Article

ETLSH-YOLO: An Edge–Real-Time Transmission Line Safety Hazard Detection Method

1
Shanxi Energy Internet Research Institute, Taiyuan 030032, China
2
College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China
3
Department of Automation, Taiyuan Institute of Technology, Taiyuan 030008, China
4
Key Laboratory of Cleaner Intelligent Control on Coal Electricity, Ministry of Education, Taiyuan 030024, China
*
Author to whom correspondence should be addressed.
Symmetry 2024, 16(10), 1378; https://doi.org/10.3390/sym16101378
Submission received: 11 September 2024 / Revised: 9 October 2024 / Accepted: 14 October 2024 / Published: 16 October 2024
(This article belongs to the Special Issue Symmetry/Asymmetry Study in Object Detection)

Abstract

Deep learning is now the mainstream approach to detecting potential safety hazards on transmission lines for power grid security monitoring. However, existing models are too complex for edge device deployment and real-time detection. Therefore, an edge–real-time transmission line safety hazard detection method (ETLSH-YOLO) is proposed to reduce model complexity and improve robustness. Firstly, a re-parameterized Ghost efficient layer aggregation network (RepGhostCSPELAN) is designed to fuse the feature information of different layers effectively while enhancing the model’s expressive ability and reducing the number of parameters and floating-point operations. Secondly, a spatial channel decoupled downsampling block (CSDovn) is designed to reduce computational redundancy and improve computational efficiency. Thirdly, coordinate attention (CA) is added to the multi-scale feature fusion process to suppress interference from complex backgrounds and improve the model’s global perception of objects. Finally, the Mish activation function is used to improve the network’s training speed, convergence, and generalization ability. The experimental results show that the mAP50 of the proposed model is 1.73% higher than that of the baseline model, while the number of parameters and floating-point operations are reduced by 33.96% and 22.22%, respectively. The model lays a foundation for solving the problem of edge device deployment.

1. Introduction

With the continuous expansion of power networks and the continuous growth of power demand, the safe and stable operation of transmission lines as the “arteries” of power transmission is crucial to ensuring the reliability and stability of the power supply [1,2]. During operation, transmission lines are often faced with a variety of threats from the natural environment and human factors. Among them, foreign objects and mountain fires on transmission lines are two major risk factors that cannot be ignored, which bring great challenges to power operation and maintenance and service life [3,4]. Therefore, the timely and accurate detection of foreign objects and mountain fires on transmission lines has become an important part of power operation and maintenance. Traditional detection methods often rely on manual inspections and simple equipment monitoring, which are not only inefficient but also make it difficult to achieve comprehensive coverage and real-time monitoring of transmission lines [5]. In recent years, with the rapid development of science and technology, deep learning has become widely used in the field of electrical engineering, such as power equipment fault diagnosis, smart grids, digital circuit vulnerability analysis, etc. [6], and it is also widely used in the field of power line safety detection. These technologies can not only realize real-time monitoring and early warning of faults in transmission lines but also improve the accuracy and efficiency of detection, providing strong support for power operation and maintenance [7,8,9].
Therefore, we developed an autonomous inspection Unmanned Aerial Vehicle (UAV) and a cable robot, whose structures are shown in Figure 1. They achieve real-time and accurate detection through deployed visual edge devices, which greatly improve the accuracy and efficiency of transmission line safety detection [10,11]. In the future, it is expected that an intelligent regional transmission line safety monitoring system will be deployed, as shown in Figure 1. The system uses cable robots to achieve static area safety monitoring and UAVs to achieve dynamic area autonomous safety inspection. At the same time, robots and UAVs use wireless interactive communication to achieve the high-precision positioning of transmission line safety detection [12,13]. The system can achieve comprehensive monitoring and provide an early warning of faults in transmission lines, can discover and eliminate potential safety hazards in a timely manner, and can ensure the stable operation of the power network. It has important practical significance and broad application prospects [14,15].
With the great achievements of deep learning algorithms in image recognition and detection, the detection of foreign objects and mountain fires on power lines can be performed by deploying deep learning-based object detection algorithms on UAVs and visual robot edge devices, thereby improving the accuracy of power line safety object detection and the timely handling of power line safety faults, effectively realizing the automation and intelligence of the monitoring system [16,17,18]. At present, the object detection methods based on deep learning are mainly divided into two categories. One is a two-stage method, such as the R-CNN series algorithm based on candidate regions [19,20]. These methods first generate regions and then classify samples through convolutional neural networks (CNNs). Feng Jun used the Faster R-CNN algorithm to build a Siamese network model, using the improved Region Proposal Network (RPN) module to generate high-quality prediction boxes, and performed correlation matching on the ROI features of the support and query images in the detection head. This model solves the problems of difficulty and low efficiency in power system inspection [21]. Xue proposed a detection method based on the improved Faster R-CNN model, which improves the feature extraction capability of the model by increasing the network depth, solving the problem of small object detection on power lines [22]. The other is a one-stage method, such as the YOLO series algorithm [23,24,25]. These methods directly extract features in the network to predict object classification and location. Yan et al. proposed an improved Single Shot MultiBox Detector (SSD) algorithm to detect small object defects in transmission line inspection images [26]. Xue et al. proposed a transmission line foreign body detection algorithm that combines a window-based self-attention network with YOLOv5. 
The algorithm of Xue et al. uses a large convolution kernel to expand the receptive field of the model, enhances its ability to extract effective information, and improves its spatial adaptability [27]. Liu et al. added a squeeze-and-excitation module to the YOLOv5s backbone network to enhance its feature extraction ability, thereby improving the performance of the algorithm [28]. Liu et al. proposed an information aggregation algorithm based on YOLOX-S that aggregates spatial and channel information in the feature map; this enhances relevant features, suppresses irrelevant ones, and improves the overall learning ability of the network and its detection accuracy [29]. Wang et al. proposed a foreign body detection method for transmission lines based on YOLOv8n, which integrates MSDA attention into the network to optimize feature fusion and capture feature information at different scales [30]. Zhang et al. proposed a mountain fire detection method for transmission line channels based on an improved DETR, which adds multi-scale feature information in the feature extraction stage, uses dilated (atrous) convolution to improve the perception of low-level features, and improves the self-attention mechanism in the Transformer module [31]. Yan et al. replaced the main feature extraction network of the original YOLOv4 model with EfficientNet and replaced the conventional convolutions in the feature pyramid structure with grouped convolutions; the resulting model has fewer parameters while maintaining detection accuracy [32].
The above methods improve the performance of safety hazard detection on transmission lines to a certain extent, but each targets a single type of object and cannot handle foreign objects and mountain fires at the same time. In addition, when targets appear against a complex background, it is difficult to extract the features of multiple targets effectively, which easily leads to missed and false detections. Moreover, the models are complex and poorly suited to deployment on edge devices. Therefore, to improve the effectiveness and accuracy of foreign object and mountain fire detection on transmission lines, enhance the environmental adaptability of the algorithm, reduce model complexity, and facilitate deployment on edge devices, this paper follows an efficiency–accuracy-driven design strategy and proposes an edge–real-time transmission line safety hazard detection method (ETLSH-YOLO). The model considers the various components of YOLO and introduces a lightweight layer aggregation network and a spatial channel decoupled downsampling module, which reduce computational redundancy, improve computational efficiency, and enhance feature expression. To further improve accuracy at a low cost, a coordinate attention module is added to the multi-scale feature fusion process. To improve detection performance and convergence speed, the Mish function is used instead of the Silu function so that the model can capture complex nonlinear relationships in the data. The main contributions of this study are as follows:
(1)
Designing a re-parameterized Ghost efficient layer aggregation network (RepGhostCSPELAN), which enhances the model’s feature extraction and gradient flow capabilities, reduces the model’s complexity, and reduces the model’s parameters and floating-point operations.
(2)
Designing a spatial channel decoupled downsampling block (CSDovn), which reduces computational redundancy, improves the model’s computational efficiency and information retention during downsampling, and obtains stronger feature expression capabilities, thus improving the model’s detection capabilities.
(3)
Adding coordinate attention to help the model extract the relationship between position information and channel information in the feature map at a lower cost, enhancing the model’s global perception capabilities and improving the model’s detection accuracy.
(4)
Using the Mish function instead of the Silu function to capture complex nonlinear relationships in the data, thereby improving the model’s stability, convergence speed, and generalization.

2. Material and Methods

The YOLOv9s model is mainly composed of five parts: input, backbone, neck, head, and auxiliary reversible branch. The input is mainly used for data enhancement. The backbone is composed of multiple Conv, Adown, and RepNCSPELAN4 modules, which are mainly used to extract features from images. The neck network adopts the PAN structure to fuse the high-level and low-level features extracted by the backbone and realizes the integration of feature map information and semantic information. The head is mainly composed of multiple detection heads, which are responsible for predicting the object position and category based on the feature information refined by the neck. The auxiliary reversible branch mainly generates reliable gradients so that the main branch can receive more complete and rich information, thereby improving the accuracy of the model.
Because edge devices have limited computing resources, each part of the model was designed with an efficiency–accuracy-driven strategy. The network structure of the ETLSH-YOLO method is shown in Figure 2. To enable the reversible branch to generate reliable gradient information during training and provide the main branch with gradients for backpropagation, ETLSH-YOLO retains the YOLOv9 reversible branch structure, which extracts reversible branch features from the input feature map rather than from the intermediate features of the backbone. This ensures a complete information flow from the data to the target, so the model can learn a more comprehensive feature representation, which helps to improve its detection accuracy and generalization ability. Because the high complexity of the model hinders deployment on edge devices, the re-parameterized Ghost efficient layer aggregation network (RepGhostCSPELAN) and the spatial channel decoupled downsampling block (CSDovn) are designed. RepGhostCSPELAN applies the lightweight GhostNet structure and uses cheap operations to generate part of the redundant feature maps, reducing the number of calculations and parameters. At the same time, to make up for the performance loss caused by discarding the residual block, RepConv is used on the gradient flow branch to enhance feature extraction and gradient flow. CSDovn separates spatial downsampling from channel adjustment: it first applies depthwise convolution and pooling and then pointwise convolution. This restores the interaction between feature map channels that spatial downsampling alone does not provide, while reducing the computational cost of the model and improving the inference speed.
In order to improve the detection accuracy of the model, coordinate attention (CA) is added to the feature fusion layer to assist the network in learning key feature information and to enhance the model’s global perception ability, thereby improving the model’s detection accuracy. The Mish function is used instead of the Silu function on the activation function to capture the complex nonlinear relationship in the data, improve the convergence and stability of the model, and enhance the generalization ability of the model, thereby improving the performance of the object detection model.

2.1. RepGhostCSPELAN

In order to effectively reduce the number of model parameters, improve model efficiency, enhance feature expression capabilities, and optimize reasoning efficiency so that the model can be better deployed on resource-constrained devices while maintaining high performance, the re-parameterized Ghost efficient layer aggregation network (RepGhostCSPELAN) is designed, and the structure is shown in Figure 3.
GhostNBottleneck adopts the lightweight Ghost structure, which generates many feature maps through group convolution and simple linear transformations. These maps reveal the information contained in the intrinsic features and enrich the feature expression ability of the model, thereby improving its detection performance. The module also significantly reduces the computational cost of the model and improves its inference speed.
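As a rough illustration of why a Ghost-style layer is cheaper, the sketch below counts the weights of an ordinary convolution and of a generic Ghost module in the sense of the GhostNet paper. It is not the exact GhostNBottleneck layout of Figure 3: the function names, the ratio `s`, the cheap-operation kernel size `d`, and the channel counts are illustrative assumptions, and biases and normalization layers are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in, c_out, k, d, s):
    """Weights of a generic Ghost module (bias ignored).

    A primary k x k convolution produces c_out / s intrinsic maps; each
    intrinsic map then yields (s - 1) extra maps through cheap d x d
    per-map linear operations.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k
    cheap = (s - 1) * intrinsic * d * d
    return primary + cheap


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(c_in=64, c_out=128, k=3)
ghost = ghost_module_params(c_in=64, c_out=128, k=3, d=3, s=2)
compression = std / ghost
print(std, ghost, round(compression, 2))
```

With these assumed sizes the Ghost module needs roughly half the weights of the ordinary convolution, which is the expected behavior when `s = 2`.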
GhostNCSP divides the input feature map into two branches for feature extraction, which helps the model capture features at different scales in the image and reduces the number of parameters in each branch, thereby reducing the computational complexity of the entire model. Since the GhostNBottleneck branch does not use a residual (shortcut) connection, re-parameterized convolution is used on the other branch to make up for the resulting performance loss. Re-parameterized convolution uses a multi-branch structure during training to add gradient feedback paths, which helps the model learn richer feature representations; during inference, it merges multiple layers (such as Conv+BN) into a single convolution operation, reducing the number of calculations and improving inference efficiency. Finally, the features of the two branches are fused so that the model obtains richer semantic information and detects foreign objects and mountain fires more accurately.
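The Conv+BN merging mentioned above can be illustrated with a minimal single-channel sketch. It folds the inference-time batch normalization statistics into the convolution weight and bias and checks that the fused layer reproduces the two-step result. The 1-D convolution, the helper names, and all numeric values are assumptions made for illustration; this is not the RepConv implementation used in the paper.

```python
import math


def conv1d(x, w, b):
    """Valid 1-D cross-correlation of list x with kernel w, plus bias b."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed running statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [(v - mean) * scale + beta for v in y]


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN scale and shift into the convolution weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_w = [wi * scale for wi in w]
    fused_b = (b - mean) * scale + beta
    return fused_w, fused_b


# Hypothetical input, kernel, and BN statistics.
x = [0.5, -1.0, 2.0, 0.0, 1.5]
w, b = [0.2, -0.4, 0.1], 0.3
gamma, beta, mean, var = 1.5, -0.2, 0.1, 0.8

two_step = batchnorm(conv1d(x, w, b), gamma, beta, mean, var)
fused_w, fused_b = fuse_conv_bn(w, b, gamma, beta, mean, var)
one_step = conv1d(x, fused_w, fused_b)
max_diff = max(abs(p - q) for p, q in zip(two_step, one_step))
print(max_diff)
```

Because batch normalization at inference time is an affine map, the fusion is exact up to floating-point rounding, so the extra training-time structure adds no inference cost.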
RepGhostCSPELAN fuses features from different levels of convolution and GhostNCSP, which can capture more contextual information and detailed features, improve the feature expression ability of the model, enable it to better capture target information of different scales, and make the model more robust, thereby improving the detection accuracy of safety-based object detection on transmission lines.

2.2. CSDovn

In order to further reduce the number of parameters and computational complexity of the model while improving the diversity and expressiveness of model feature extraction, the spatial channel decoupled downsampling block (CSDovn) is designed, and the structure is shown in Figure 4.
Spatial channel decoupling allows the model to extract features independently in the spatial and channel dimensions, so that it can process the features of each channel or spatial position in a more refined way and capture the location and semantic information of targets in transmission line images more accurately. The module first uses two branches to reduce the spatial size. One branch extracts features independently from different groups through group convolution, which reduces the spatial size of the feature map with few calculations and parameters and helps the model learn a variety of feature representations. The other branch reduces the spatial size of the feature map through max pooling, which requires no learnable parameters and little additional computation. Moreover, max pooling is robust to small translations of the input features, which helps the model extract more stable features. The feature maps produced by group convolution and max pooling are then added together so that their information is complementary, forming a more comprehensive and expressive feature representation. Finally, the number of channels of the feature map is adjusted through pointwise convolution, which further fuses features from different channels and realizes feature interaction between channels, thereby improving the accuracy of transmission line safety detection. Compared with YOLOv10 [33], the proposed spatial channel decoupled module significantly improves the interaction between the feature information of different channels and enhances the expressiveness of the features.
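When the feature map $F^{C \times H \times W}$ is downsampled to $F^{2C \times (H/2) \times (W/2)}$, the number of parameters and calculations required for the CSDovn is as follows:
$$\mathrm{params} = K \times K \times C + 2C^{2} = K^{2} \times C + 2C^{2}$$
$$\mathrm{FLOPs} = K \times K \times C \times \frac{H}{2} \times \frac{W}{2} + C \times \frac{H \times W}{4} + 1 \times 1 \times C \times 2C \times \frac{H}{2} \times \frac{W}{2} = \left(K^{2} \times C + C + 2C^{2}\right) \times \frac{H \times W}{4}$$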
where K represents the size of the convolution kernel, C represents the number of channels of the feature map, and H and W represent the height and width of the feature map.
The number of parameters and calculations required for ordinary convolution is as follows:
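$$\mathrm{params} = K \times K \times C \times 2C = K^{2} \times 2C^{2}$$
$$\mathrm{FLOPs} = K \times K \times C \times 2C \times \frac{H}{2} \times \frac{W}{2} = K^{2} \times 2C^{2} \times \frac{H \times W}{4}$$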
In terms of the comparison of parameter quantity and computational complexity, the CSDovn has fewer parameters and lower computational complexity, which can enable the model to have a faster inference speed to meet the real-time detection of transmission lines, better generalization to meet the safety object detection of transmission lines in complex environments, and lower computing resource consumption, making it easy to deploy and integrate in visual edge devices.
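To make the comparison concrete, the following sketch evaluates the parameter and FLOPs expressions given above for the CSDovn and for ordinary convolution. The function names and the example sizes (a 3 × 3 kernel, 64 channels, an 80 × 80 feature map) are assumptions chosen for illustration and do not come from the experiments.

```python
def csdovn_cost(k, c, h, w):
    """Parameters and FLOPs of the CSDovn, following the formulas above."""
    out_positions = (h // 2) * (w // 2)  # H/2 x W/2 output positions
    params = k * k * c + 2 * c * c
    flops = (k * k * c + c + 2 * c * c) * out_positions
    return params, flops


def ordinary_conv_cost(k, c, h, w):
    """Parameters and FLOPs of an ordinary stride-2 convolution (C -> 2C)."""
    out_positions = (h // 2) * (w // 2)
    params = k * k * c * 2 * c
    flops = k * k * c * 2 * c * out_positions
    return params, flops


# Hypothetical layer: K = 3, C = 64, H = W = 80.
csd_params, csd_flops = csdovn_cost(k=3, c=64, h=80, w=80)
conv_params, conv_flops = ordinary_conv_cost(k=3, c=64, h=80, w=80)

print("params:", csd_params, "vs", conv_params)
print("FLOPs: ", csd_flops, "vs", conv_flops)
print("param ratio:", round(conv_params / csd_params, 2))
```

Under these assumed sizes, the decoupled block needs roughly one-eighth of the parameters and FLOPs of the ordinary convolution, and the gap widens as the kernel size grows because the $K^{2}$ factor multiplies only $C$ rather than $2C^{2}$.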

2.3. Coordinate Attention

Attention mechanisms are widely used in computer vision tasks. Common attention modules include squeeze-and-excitation (SE), the convolutional block attention module (CBAM), and efficient channel attention (ECA). The SE module only considers the information of different channels and ignores location information. Building on the SE module, the CBAM uses convolution to obtain position information, but convolution can only focus on local information and lacks the ability to capture long-range information. ECA uses a lightweight one-dimensional convolution to capture the correlation between channels with little computation, but it does not take spatial information into account directly. The coordinate attention (CA) module overcomes these problems: it efficiently extracts the relationship between the position information and channel information in the feature map, obtains a feature map with direction-aware and position-aware information, and applies it to the input feature map in a complementary manner. The CA enhances the feature representation of the target, allowing the model to focus on the information of a specific location or region rather than relying solely on global or local feature representations, which helps to locate and identify targets accurately and improves the detection accuracy of the model. Its structure is shown in Figure 5.
The calculation process of CA is divided into two steps: coordinate information embedding and coordinate attention generation. The overall process formula is shown below:
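$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$$
$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$
$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$
$$g^h = \sigma\left(F_h\left(f^h\right)\right)$$
$$g^w = \sigma\left(F_w\left(f^w\right)\right)$$
$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$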
Here, $h$ and $w$ index positions along the height and width of the input feature map, and $H$ and $W$ are its height and width, which set the sizes of the pooling kernels. $x_c(h, i)$ and $x_c(j, w)$ are the entries of the channel-$c$ feature map $x_c$ along the row at height $h$ and the column at width $w$. $z_c^h(h)$ and $z_c^w(w)$ are the horizontal and vertical coordinate information vectors of channel $c$. $F_1$, $F_h$, and $F_w$ are convolution operations, and $\delta$ and $\sigma$ are different activation functions. $f$ is the intermediate feature that encodes spatial information, $f^h$ and $f^w$ are the two independent tensors obtained by splitting $f$ along the spatial dimension, and $g^h$ and $g^w$ are the attention vectors.
Specifically, when embedding coordinate information, each channel is encoded with $(H, 1)$ and $(1, W)$ pooling kernels along the height and width directions to obtain the coordinate information vectors $z^h$ and $z^w$ in the horizontal and vertical directions. The two vectors are then concatenated and passed through a $1 \times 1$ convolution, a BatchNorm (BN) layer, and a nonlinear activation layer. The resulting intermediate feature is split into two independent feature tensors, $f^h$ and $f^w$, and each is restored from $C/r$ channels to $C$ channels by a $1 \times 1$ convolution, where $r$ is the channel reduction ratio. The attention weights in the horizontal and vertical directions are then obtained with the Sigmoid activation function. Finally, the weights of the two directions are multiplied with the feature map at the corresponding coordinate positions to obtain the output feature map $y$, which carries channel, position, and direction attention information.
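The two steps can be traced on a tiny single-channel example. The sketch below performs the directional average pooling and the final reweighting exactly as in the formulas, but, as a loud simplification, it replaces the learned transforms $F_1$, $F_h$, and $F_w$ (with their BN and activation layers) by the identity, so the attention weights are just the sigmoid of the pooled values. The function names and the input values are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_pool(x):
    """Directional average pooling of one channel x (list of rows).

    Returns z_h (one mean per row, pooled over the width) and
    z_w (one mean per column, pooled over the height).
    """
    height, width = len(x), len(x[0])
    z_h = [sum(row) / width for row in x]
    z_w = [sum(x[j][w] for j in range(height)) / height for w in range(width)]
    return z_h, z_w


def apply_attention(x, g_h, g_w):
    """y(i, j) = x(i, j) * g_h(i) * g_w(j)."""
    return [[x[i][j] * g_h[i] * g_w[j] for j in range(len(x[0]))]
            for i in range(len(x))]


# Hypothetical 2 x 3 feature map for a single channel.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

z_h, z_w = coordinate_pool(x)

# Assumption: F_1, F_h, F_w are replaced by the identity here; in the real
# module they are learned 1 x 1 convolutions that also mix channels.
g_h = [sigmoid(v) for v in z_h]
g_w = [sigmoid(v) for v in z_w]

y = apply_attention(x, g_h, g_w)
print(z_h, z_w)
```

Each output entry is scaled by one weight tied to its row and one tied to its column, which is how the module encodes position along both directions with only two one-dimensional vectors per channel.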

2.4. Activation Function

In the safety-based object detection task on transmission lines, the selection of a suitable activation function has an important impact on the performance and training effect of the model. Therefore, the Mish function is used to replace the Silu function. Compared with the Silu function, the Mish function has outstanding nonlinear characteristics, which can improve the convergence and stability of the model and enhance the generalization ability of the model, thereby improving the performance and effect of the target detection model so as to better cope with the task of safety-based object detection on transmission lines in complex environments.
The Silu and Mish functions are calculated as shown in the formulas below.
$$\mathrm{Silu}(x) = x \cdot \frac{1}{1 + e^{-x}}$$
$$\mathrm{Mish}(x) = x \cdot \tanh\left(\ln\left(1 + e^{x}\right)\right)$$
The Mish function combines the hyperbolic tangent $\tanh$ and the logarithm $\ln$, whereas the Silu function is built on the sigmoid factor $\frac{1}{1 + e^{-x}}$ and tends to be linear in some cases. By contrast, the Mish function exhibits obvious nonlinearity over its entire domain and can better capture the complex nonlinear relationships in the data, which is particularly important for identifying and locating the complex features of targets in transmission line object detection and can improve the model’s expressiveness and prediction performance.
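The derivatives of the Silu and Mish functions are shown in the formulas below.
$$\mathrm{Silu}'(x) = \frac{e^{x}\left(x + e^{x} + 1\right)}{e^{2x} + 2e^{x} + 1}$$
$$\mathrm{Mish}'(x) = \tanh\left(\ln\left(1 + e^{x}\right)\right) + \frac{4x\,e^{x}\left(1 + e^{x}\right)}{\left(1 + \left(1 + e^{x}\right)^{2}\right)^{2}}$$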
It can be seen from the formulas that when $x$ is large or small, $\mathrm{Silu}'(x)$ may change rapidly due to the rapid growth or decay of the exponential function. Because of the nature of the $\tanh$ and $\ln$ functions, $\mathrm{Mish}'(x)$ is relatively smooth throughout the domain of $x$. In particular, when $x$ is large or small, $\mathrm{Mish}'(x)$ does not change as rapidly as $\mathrm{Silu}'(x)$. Therefore, the Mish function is analytically smoother than the Silu function. It can also be seen from Figure 6 that the Mish function is smoother than the Silu function, allowing information to propagate deeper into the neural network for better accuracy and generalization. At the same time, the Mish function has a smoother gradient, which can reduce the problem of vanishing or exploding gradients more effectively than the Silu function. Mish can make it easier for the model to reach a good local optimal solution, thereby improving the training stability and convergence speed of the model.
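The two activation functions and their first derivatives can be checked numerically with a short standard-library sketch. The derivative expressions are written in the equivalent sigmoid and tanh forms obtained by differentiating the definitions above, and a central finite difference is used to confirm them. The function names and sample points are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softplus(x):
    return math.log1p(math.exp(x))  # ln(1 + e^x)


def silu(x):
    return x * sigmoid(x)


def mish(x):
    return x * math.tanh(softplus(x))


def silu_grad(x):
    """d/dx [x * sigmoid(x)] = s + x * s * (1 - s), with s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))


def mish_grad(x):
    """d/dx [x * tanh(softplus(x))] = t + x * (1 - t^2) * sigmoid(x)."""
    t = math.tanh(softplus(x))
    return t + x * (1.0 - t * t) * sigmoid(x)


def central_difference(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Sample points chosen only for illustration.
points = [-3.0, -1.0, 0.0, 0.5, 2.0]
silu_err = max(abs(silu_grad(p) - central_difference(silu, p)) for p in points)
mish_err = max(abs(mish_grad(p) - central_difference(mish, p)) for p in points)
print(silu_err, mish_err)
```

Both functions pass through the origin and approach the identity for large positive inputs, so differences between them are concentrated around zero and in the negative range.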

3. Results and Discussion

3.1. Data Set and Parameter Settings

The data set was collected by UAV and crawled from the web and contains a total of 3500 images of power line safety hazards. The image samples were labeled with LabelImg software (V.5.4.1) in six categories: ballon, kite, nest, trash, fire, and smoke. The images of each category are divided into a train set, val set, and test set at a ratio of 7:2:1, as shown in Table 1.
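A 7:2:1 split of one category can be sketched as follows. The helper name, the fixed seed, and the file names are hypothetical; the sketch only illustrates the ratio and does not reproduce the per-category counts of Table 1.

```python
import random


def split_7_2_1(items, seed=0):
    """Shuffle items reproducibly and split them 7:2:1 into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical file names for a single category.
names = [f"kite_{i:03d}.jpg" for i in range(100)]
train, val, test = split_7_2_1(names)
print(len(train), len(val), len(test))
```

Splitting each category separately, as described above, keeps the class proportions similar across the three subsets.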
The experiments were run on Ubuntu 20.04.6 with an RTX 4090 GPU, CUDA 12.0, Python 3.8, and PyTorch 1.8. The model was trained for 300 epochs with the stochastic gradient descent (SGD) algorithm. The initial learning rate was set to 0.01, the minimum learning rate to 0.0001, and the batch size to 24. The momentum and weight decay were 0.9370 and 0.0005, respectively.
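For reference, the reported hyperparameters can be collected in a small configuration object. The class and field names are hypothetical, and the linear decay between the initial and minimum learning rates is an assumption made only to show how the two values relate; the text does not state which schedule was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the text; field names are illustrative."""
    epochs: int = 300
    batch_size: int = 24
    lr_initial: float = 0.01
    lr_min: float = 0.0001
    momentum: float = 0.937
    weight_decay: float = 0.0005


def linear_lr(cfg, epoch):
    """Assumed linear decay from lr_initial (epoch 0) to lr_min (last epoch)."""
    frac = epoch / (cfg.epochs - 1)
    return cfg.lr_initial + (cfg.lr_min - cfg.lr_initial) * frac


cfg = TrainConfig()
lr_start = linear_lr(cfg, 0)
lr_end = linear_lr(cfg, cfg.epochs - 1)
print(lr_start, lr_end)
```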

3.2. Ablation Experiment

In order to verify the detection performance of the proposed model and explore the effectiveness of the various improvements, ablation experiments were carried out based on the baseline model and the transmission line safety hazard data set. Each group of experiments used the same hyperparameters and training methods. The experimental results are shown in Table 2. The performance of different attention mechanisms is compared in Table 3. The mAP@0.5 curves of different modules are shown in Figure 7. The visualization of heat maps of different modules is shown in Figure 8.
As shown in Table 2, Experiment 0 is the experimental result of YOLOv9s, and Experiment 1 is the baseline model after pruning YOLOv9s. Because the model needs to be deployed on edge devices, the pruned model is selected as the baseline model. In Experiment 2, the addition of the RepGhostCSPELAN module reduces the model parameters and floating-point calculations by 27.30% and 16.89%, reducing the complexity of the model, while the mAP0.5 of the model is improved by 0.54%, indicating that the RepGhostCSPELAN module improves the feature expression ability of the model while reducing the complexity of the model and thereby improving the detection performance of the model. In Experiment 3, the CA module is introduced, and the model parameters and floating-point calculations do not increase much, but the mAP0.5 of the model is improved by 1.08%. This shows that the introduction of CA can increase the model’s attention to the hidden dangers of the transmission line, reduce the false detection rate and missed detection rate, explore the potential for performance improvement at a low cost, and improve the model detection capability. In Experiment 4, the addition of the CSDown module reduced the model parameters and floating-point calculations by 29.69% and 15.11%, while the mAP0.5 of the model increased by 0.33%, indicating that the CSDown module decouples the spatial channels of the feature convolution operation, which can not only effectively reduce the complexity of the model but also perform more refined processing on the channel and spatial dimensions of the features and improve the model’s feature capture capability, thus improving detection accuracy. In Experiment 5, the mAP0.5 of the model improved by 0.87% using the Mish activation function. Combined with Figure 7, the model using the Mish function has a faster convergence speed than other models. 
This shows that the Mish activation function can improve the convergence and stability of the model, thereby improving the detection performance and effect of the model. In Experiment 6, the RepGhostCSPELAN module and the CA module were added to the model together; the model parameters and floating-point calculations decreased by 21.16% and 12.89%, and the model’s mAP0.5 increased by 1.41%. In Experiment 7, the RepGhostCSPELAN module, CA module, and CSDown module were added to the model, and the model parameters and floating-point calculations decreased by 33.96% and 22.22%, respectively, and the mAP0.5 of the model increased by 1.52%. In Experiment 8, the Mish activation function was introduced on the basis of Experiment 7, and the mAP0.5 of the model increased by 1.73%.
Table 3 shows that after introducing SE, CBAM, ECA, and CA, the mAP0.5 of the model is 0.926, 0.930, 0.927, and 0.933, respectively, indicating that the attention mechanism improves the detection accuracy of the model. CA achieves better detection performance than SE, CBAM, and ECA, although it has more parameters and floating-point operations than SE and ECA. Because the differences in parameter count and floating-point operations are small while CA gives the largest gain in detection accuracy, CA has the best overall performance: it improves the detection effect without substantially increasing the complexity of the model.
As shown in Figure 8, the high-brightness area represents the key part that the network pays attention to. The RepGhostCSPELAN and CSDown models are mainly used to optimize the feature extraction process, and the changes in the heatmap are manifested as more accurate and localized high-intensity areas, which correspond to the position or edge of the target object, indicating that the subtle features and structural information in the image can be captured more effectively. The CA module is introduced in the feature fusion, and more obvious high-brightness areas can be seen in the heatmap, which reflect how the model can pay more attention to the safety hazards of the transmission line, reduce the interference of complex backgrounds, reduce the false detection rate and missed detection rate, and improve the model detection ability. The Mish activation function allows more information to go deep into the neural network. It can be seen in the heatmap that the changes are smoother and more continuous, and the representation of the boundaries and details of the target object may be more accurate and coherent. The heatmap combined with all modules shows higher resolution and accuracy, more accurate and concentrated target area marking, more obvious boundary and detail capture, and smoother and more continuous transition. Through quantitative and qualitative experimental verification, it is shown that the various components designed in our model can effectively reduce the complexity of the model and improve the model detection performance, which has obvious advantages in improving the accuracy of hidden danger object detection in transmission lines.

3.3. Comparison with State-of-the-Art Methods

To verify the advantages of the proposed model for detecting safety hazards in transmission lines, we compared it with state-of-the-art target detection models, including YOLOv5s, YOLOv7, YOLOv8s, YOLOv9s, and DETR. The results are shown in Table 4, and the visualization results are shown in Figure 9.
As Table 4 shows, the mAP50 and mAP50:95 values of the proposed model are higher than those of the current mainstream target detection models. In relative terms, the mAP50 is improved by 0.97%, 3.07%, 3.19%, 1.29%, and 2.17% compared with YOLOv5s, YOLOv7, YOLOv8s, YOLOv9s, and DETR, respectively. Among the individual categories, the detection accuracy of ballon, kite, nest, smoke, and trash reached the highest level. The detection accuracy of the ballon category is improved by 0.40%, 0.10%, and 0.30% compared with YOLOv5s, YOLOv7, and YOLOv8s; the detection accuracy of the smoke category is improved by 4.13%, 2.89%, and 2.75% compared with YOLOv8s, YOLOv9s, and DETR. This shows that ETLSH-YOLO not only performs well in hidden danger detection on transmission lines but also surpasses the current mainstream target detection models in both mAP50 and mAP50:95, giving it a significant competitive advantage in this field.
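The percentages above are relative improvements, computed as (ours − other) / other. A minimal sketch that recomputes them from the mAP0.5 column of Table 4 is given below; the dictionary and the helper name `relative_gain_pct` are illustrative assumptions, not part of the original work.

```python
# mAP0.5 values copied from Table 4.
MAP50 = {
    "YOLOv5s": 0.930,
    "YOLOv7": 0.911,
    "YOLOv8s": 0.910,
    "YOLOv9s": 0.927,
    "DETR": 0.919,
    "ETLSH-YOLO": 0.939,
}


def relative_gain_pct(ours: float, other: float) -> float:
    """Hypothetical helper: relative improvement of `ours` over `other`, in percent."""
    return 100.0 * (ours - other) / other


ours = MAP50["ETLSH-YOLO"]
for name, value in MAP50.items():
    if name != "ETLSH-YOLO":
        print(f"{name}: {relative_gain_pct(ours, value):.2f}%")
```

Running this gives values that agree with the reported figures to within about 0.01 percentage points; any remaining difference comes from rounding.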
As shown in Figure 9, the other target detection models produce missed detections and false detections, while the proposed model produces neither. In the examples shown, the proposed model also gives the best detection results in every category. In the nest detection scenario, YOLOv5s, YOLOv7, and YOLOv8s have missed detections, which may be because the nest target is small and partially occluded. The proposed model detects it accurately, indicating strong generalization ability. In the fire and smoke detection scenarios, the other models have missed detections but the proposed model does not, indicating that it can suppress the interference of background noise to a certain extent and improve detection accuracy in complex backgrounds. In the trash detection scenario, YOLOv5s, YOLOv8s, YOLOv9s, and DETR have missed or false detections, whereas the proposed model detects the target accurately. These results show that ETLSH-YOLO performs well across a variety of detection scenarios: it avoids missed and false detections in the examples shown and achieves high-precision detection despite challenges such as the small size and occlusion of nests, complex background interference around fire and smoke, and the diversity of trash targets. The model is therefore advanced, effective, and highly reliable for transmission line safety hazard detection.

3.4. Robustness Comparative Experiment

In order to demonstrate the robustness of the model in this paper, we conducted a comparative experiment under simulated natural weather conditions, including the effects of light and noise. The experimental results are shown in Table 5, and the visualization results are shown in Figure 10, Figure 11 and Figure 12.
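To make the simulated conditions concrete, the sketch below shows one simple way to darken, brighten, and add noise to an 8-bit grayscale image. The source does not state how the perturbations were generated, so the brightness factors (0.5 and 1.5), the Gaussian noise with standard deviation 20, and the function names are all assumptions chosen for illustration.

```python
import random


def clip8(value: float) -> int:
    """Round and clamp a value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def adjust_brightness(image, factor):
    """Scale every pixel; factor < 1 darkens, factor > 1 brightens."""
    return [[clip8(p * factor) for p in row] for row in image]


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise (assumed noise model) to every pixel."""
    rng = random.Random(seed)
    return [[clip8(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


# Tiny 2x3 grayscale image used only as a stand-in for a real frame.
image = [[60, 120, 180], [200, 30, 250]]
dark = adjust_brightness(image, 0.5)     # assumed darkening factor
bright = adjust_brightness(image, 1.5)   # assumed brightening factor
noisy = add_gaussian_noise(image, sigma=20.0)  # assumed noise level

print(dark)
print(bright)
print(noisy)
```

In practice, the same operations would normally be applied to full-colour images with an array library; the nested-list version is used here only to keep the example self-contained.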
As Table 5 shows, the mAP50 of all the models decreases when the brightness of the image is changed or the image is corrupted by noise. In relative terms, YOLOv8s has the largest decrease in mAP50 on darkened and brightened images, 4.51% and 8.24%, respectively; YOLOv7 has the largest decrease on noisy images, 16.79%; and the proposed model has the smallest decrease on darkened, brightened, and noisy images, 1.49%, 1.70%, and 4.05%, respectively.
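The decreases quoted from Table 5 are relative to each model's mAP50 on normal images, that is, (normal − perturbed) / normal. The following sketch recomputes them and identifies the most and least affected model under each condition; the data structure and helper names are illustrative assumptions.

```python
# mAP50 values copied from Table 5: (normal, darkened, brightened, noisy).
TABLE5 = {
    "YOLOv5s": (0.930, 0.904, 0.909, 0.873),
    "YOLOv7": (0.911, 0.891, 0.893, 0.758),
    "YOLOv8s": (0.910, 0.869, 0.835, 0.805),
    "YOLOv9s": (0.927, 0.906, 0.887, 0.811),
    "DETR": (0.919, 0.905, 0.852, 0.847),
    "ETLSH-YOLO": (0.939, 0.925, 0.923, 0.901),  # proposed model
}
CONDITIONS = ("darkened", "brightened", "noisy")


def relative_drops(row):
    """Percent decrease of each perturbed score relative to the normal score."""
    normal, *perturbed = row
    return [100.0 * (normal - value) / normal for value in perturbed]


def largest_and_smallest_drop(index):
    """Return (most affected, least affected) model for one condition."""
    ranked = sorted(TABLE5, key=lambda m: relative_drops(TABLE5[m])[index])
    return ranked[-1], ranked[0]


for i, condition in enumerate(CONDITIONS):
    worst, best = largest_and_smallest_drop(i)
    print(
        f"{condition}: largest drop {worst} "
        f"({relative_drops(TABLE5[worst])[i]:.2f}%), "
        f"smallest drop {best} ({relative_drops(TABLE5[best])[i]:.2f}%)"
    )
```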
As shown in Figure 10, Figure 11 and Figure 12, the poor image quality causes the other object detection models to produce serious missed detections and false detections. Although the accuracy of the proposed model also decreases, it still correctly identifies the types of safety hazards on power transmission lines, without serious false or missed detections. In the darkened image nest detection scene, the insufficient brightness causes the other models to miss the target, but the proposed model detects it accurately. In the brightened image trash detection scene, the increased brightness causes false detections in YOLOv5s and YOLOv7 and serious missed detections in YOLOv8s and DETR, while the proposed model accurately detects the trash target. In the noisy image fire and smoke detection scene, the noise blurs the image: YOLOv5s, YOLOv7, and DETR have multiple missed detections, and the remaining comparison models detect no targets at all. The proposed model misses only one target and detects all the others correctly. This shows that ETLSH-YOLO is more robust, stable, and accurate than other commonly used target detection models under conditions simulating different natural environments and can accurately identify safety hazards of power lines even when image quality is poor, which supports the reliability and advantages of this model in practical applications.

3.5. Comparison of Parameters, Detection Speed, and Detection Accuracy

To further verify the feasibility and advantages of the proposed model in practical application scenarios, especially edge device deployment, we compare its parameters, detection speed, and detection accuracy with those of common object detection models. The experimental results are shown in Figure 13.
Figure 13 shows the relationship among model parameters, detection speed, and detection accuracy. The proposed model achieves the highest detection accuracy with the fewest parameters and the fastest detection speed. This indicates that the proposed model can run efficiently on edge devices with limited computing resources and can process video streams or image inputs in real time. It can therefore monitor potential dangers around transmission lines in real time, discover and warn about safety hazards promptly, help prevent and reduce accidents and power outages caused by such hazards, and improve the safety and stability of the power supply system.

4. Conclusions

To realize intelligent, safety-oriented detection of hidden danger targets in transmission lines and to construct an intelligent regional transmission line safety monitoring system, this paper proposes an edge–real-time transmission line safety hazard detection method (ETLSH-YOLO). Within this method, the RepGhostCSPELAN module significantly reduces the number of model parameters and floating-point operations and effectively integrates the information of different feature layers to improve the feature expression ability of the model. The CSDown module decouples the spatial and channel operations of the feature convolution, which not only reduces the complexity of the model but also processes the channel and spatial dimensions of the features in a more refined way, thereby improving the model's feature capture ability. The CA module improves the detection accuracy of hidden danger targets amid complex backgrounds in transmission lines and enhances the model's attention to key areas. The Mish activation function better captures the complex nonlinear relationships in the data, improving model convergence, stability, and detection performance. On the power transmission line safety hazard data set, ETLSH-YOLO significantly improved detection accuracy and significantly reduced model complexity compared with the baseline model, and it achieved higher accuracy and better adaptability to complex environments than other object detection models. In the future, we will continue to optimize the model structure through further research and experiments, seeking a more efficient and accurate design that meets the needs of edge deployment.
We aim to strengthen cross-domain adaptability and train and optimize the model according to different application scenarios and weather conditions so as to improve its generalization ability in various environments, promote the application of the model in actual scenarios, and build an intelligent and high-precision protective barrier for power grid security.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; software, L.Z. and Y.D.; validation, L.Z. and Y.Z.; formal analysis, L.Z. and Y.D.; investigation, L.Z. and Y.J.; resources, Y.D.; data curation, L.Z., Y.J. and Q.L.; writing—original draft preparation, L.Z.; writing—review and editing, L.Z. and Y.D.; visualization, Q.L.; supervision, L.Z. and Y.J.; project administration, Y.D.; funding acquisition, Y.Z. and Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanxi Provincial Key Research and Development Project (Grant number 202102060301020) and Shanxi Provincial Higher Education Science and Technology Innovation Project (Grant number 2022L524).

Data Availability Statement

The data that support the findings of this study are available upon reasonable request from the reader.

Conflicts of Interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

References

  1. Wu, D.; Zhang, J.; Zhou, Q.; Zhang, L.; Gong, H. An overview of the evolution of security and stability of china’s power system. China Electr. Power. 2024, 1–12. Available online: http://kns.cnki.net/kcms/detail/11.3265.TM.20240528.1355.004.html (accessed on 10 September 2024).
  2. Lin, Y.; Zhou, T.; Wang, Z. A High-Reliable Wireless Sensor Network Coverage Scheme in Substations for the Power Internet of Things. Symmetry 2023, 15, 1020. [Google Scholar] [CrossRef]
  3. He, H.; Zhang, Z.; Jia, Q.; Huang, L.; Cheng, Y.; Chen, B. Wildfire detection for transmission line based on improved lightweight YOLO. Energy Rep. 2022, 9, 512–520. [Google Scholar] [CrossRef]
  4. Wu, Y.; Wang, Q.; Guo, N.; Tian, Y.; Li, F.; Su, X. Efficient Multi-Source Self-Attention Data Fusion for FDIA Detection in Smart Grid. Symmetry 2023, 15, 1019. [Google Scholar] [CrossRef]
  5. Ou, Z.B.; Huang, Z.H.; Qiu, S. Application of computer remote control uav in transmission line inspection. In Proceedings of the 2024 IEEE 7th International Electrical and Energy Conference (CIEEC), Harbin, China, 10–12 May 2024; pp. 1–6. [Google Scholar] [CrossRef]
  6. Rahimifar, M.; Jahanirad, H.; Fathi, M. Deep transfer learning approach for digital circuits vulnerability analysis. Expert Syst. Appl. 2023, 237, 121757. [Google Scholar] [CrossRef]
  7. Huang, X. Survey of intelligent inspection based on image perception. High Volt. Eng. 2024, 50, 1826–1841. [Google Scholar] [CrossRef]
  8. Chen, S.; Tian, Y.; Dai, Z.; Lin, J.; Huang, R.; Wang, H. Construction of distribution network fault detection model based on artificial intelligence algorithm. In Proceedings of the 2023 International Conference on Power, Electrical Engineering, Electronics and Control (PEEEC), Athens, Greece, 25–27 September 2023; pp. 781–787. [Google Scholar] [CrossRef]
  9. Bhattacharjee, J.; Kujur, N.N.; Varma, P.R.; Affijulla, S. Fault detection and classification in transmission lines using artificial intelligence. In Proceedings of the 2023 5th International Conference on Energy, Power and Environment: Towards Flexible Green Energy Technologies (ICEPE), Shillong, India, 15–17 June 2023; pp. 1–6. [Google Scholar] [CrossRef]
  10. Chen, W.; Liu, X.; Niu, B.; Zhang, X.; Liu, H.; Duan, L. A brief discussion on the application of intelligent robots in power transmission line inspection. China Equip. Eng. 2023, 33–35. [Google Scholar] [CrossRef]
  11. Ambatkar, H.P.; Dhatrak, R.K. Drone applications in transmission line. In Proceedings of the 2022 International Mobile and Embedded Technology Conference (MECON), Noida, India, 10–11 March 2022; pp. 531–533. [Google Scholar] [CrossRef]
  12. Han, J.; Li, M.; Zhao, B. Modeling and application of transmission line panoramic monitoring platform. In Proceedings of the 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; pp. 564–568. [Google Scholar] [CrossRef]
  13. Wu, J.; Du, W.; Wang, J.; Yang, G.; Zhao, Y. Application of power internet of things in online monitoring of transmission lines. In Proceedings of the 2023 Smart City Challenges & Outcomes for Urban Transformation (SCOUT), Singapore, 29–30 July 2023; pp. 19–23. [Google Scholar] [CrossRef]
  14. Singh, N.; Paliwal, P. Planning and monitoring of smart grid architecture using internet of things. In Proceedings of the 2022 IEEE 6th International Conference on Condition Assessment Techniques in Electrical Systems (CATCON), Durgapur, India, 17–19 December 2022; pp. 12–16. [Google Scholar] [CrossRef]
  15. Li, Q.; Tang, W. An anomaly detection method for smart power grid: A federated learning framework. In Proceedings of the 2023 6th International Conference on Data Science and Information Technology (DSIT), Shanghai, China, 28–30 July 2023; pp. 73–77. [Google Scholar] [CrossRef]
  16. Khan, S.; Khan, A. Ffirenet: Deep learning based forest fire classification and detection in smart cities. Symmetry 2022, 14, 2155. [Google Scholar] [CrossRef]
  17. Zou, D.M.; Liu, Y.; Lin, H.; Qiu, Z. An improved algorithm for foreign objects detection on power transmission lines. In Proceedings of the 2023 9th International Conference on Systems and Informatics (ICSAI), Changsha, China, 16–18 December 2023; pp. 1–6. [Google Scholar] [CrossRef]
  18. Zhao, M.; Barati, M. A real-time fault localization in power distribution grid for wildfire detection through deep convolutional neural networks. IEEE Trans. Ind. Appl. 2021, 57, 4316–4326. [Google Scholar] [CrossRef]
  19. Girshick, R. Fast r-cnn. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
  20. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  21. Feng, J.; Pan, S.; Zhao, S.; Peng, L.; Fan, X. Research on few-shot power detection of siamese network based on improved rpn. J. Hebei Univ. Sci. Technol. 2023, 44, 67–73. [Google Scholar] [CrossRef]
  22. Xue, Y.; Wu, H.D.; Zhang, N.; Yu, Z.C.; Ye, X.K.; Hua, X. Detection of insulation piercing connectors and bolts on the transmission line using improved faster r-cnn. Laser Optoelectron. Progress 2020, 57, 76–83. [Google Scholar] [CrossRef]
  23. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. Yolov6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  24. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 7464–7475. [Google Scholar] [CrossRef]
  25. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  26. Yan, L.; Chen, Z.; Wu, X.; Yuan, X.; Zhu, J.; Li, J. Object detection method based on improved ssd algorithm for smart grid. In Proceedings of the 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021; pp. 3020–3024. [Google Scholar] [CrossRef]
  27. Ang, X.; Enyu, J.; Wentao, Z.; Shunfu, L.; Yang, M. Detection of foreign bodies in transmission line channels based on the fusion of swin transformer and yolov5. J. Shanghai Jiaotong Univ. 2023, 1–22. [Google Scholar] [CrossRef]
  28. Liu, X.; Rao, Z.; Lin, N. Object detection method for foreign substances on high-voltage transmission lines based on deep learning. In Proceedings of the 2023 18th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Fuzhou, China, 17–19 November 2023; pp. 580–584. [Google Scholar] [CrossRef]
  29. Liu, B.; Huang, J.; Lin, S.; Yang, Y.; Qi, Y. Improved yolox-s abnormal condition detection for power transmission line corridors. In Proceedings of the 2021 IEEE 3rd International Conference on Power Data Science (ICPDS), Harbin, China, 26–26 December 2021; pp. 13–16. [Google Scholar] [CrossRef]
  30. Xu, W.; Xiwen, C.; Haibin, C.; Yi, C.; Jun, Z. Foreign object detection method in transmission lines based on improved yolov8n. In Proceedings of the 2024 10th International Symposium on System Security, Safety, and Reliability (ISSSR), Xiamen, China, 16–17 March 2024; pp. 196–200. [Google Scholar] [CrossRef]
  31. Zhang, Z.; He, H. An improved DETR power line channel fire smoke detection method. Small Microcomput. Syst. 2024, 670–675. [Google Scholar]
  32. Yan, S.; Gao, L.; Wang, W.; Cao, G.; Han, S.; Wang, S. An algorithm for power transmission line fault detection based on improved YOLOv4 model. Sci. Rep. 2024, 14, 5046. [Google Scholar] [CrossRef] [PubMed]
  33. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
Figure 1. Regional transmission line safety monitoring system and components.
Figure 2. ETLSH-YOLO network structure.
Figure 3. RepGhostCSPELAN module.
Figure 4. CSDown module.
Figure 5. Coordinate attention module.
Figure 6. Silu() and Mish() function curves.
Figure 7. mAP@0.5 data curves of different modules.
Figure 8. Heat map visual comparison results of different modules.
Figure 9. Detection effects of different methods.
Figure 10. Detection effects of different models under darkened image settings.
Figure 11. Detection effects of different models under brightened image settings.
Figure 12. Detection effects of different models under noisy image settings.
Figure 13. Parameters, detection speed, and detection accuracy.
Table 1. Detailed partitioning of data sets.
|       | Ballon | Kite | Nest | Trash | Fire | Smoke |
|-------|--------|------|------|-------|------|-------|
| train | 40     | 30   | 100  | 30    | 70   | 80    |
| val   | 80     | 60   | 200  | 60    | 140  | 160   |
| test  | 280    | 210  | 700  | 210   | 490  | 560   |
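Table 2. Comparison of ablation experiment results of different modules.
| RepGhostCSPELAN | CA | CSDown | Mish | Params (M) | FLOPs (G) | Precision | Recall | mAP0.5 |
|-----------------|----|--------|------|------------|-----------|-----------|--------|--------|
|                 |    |        |      | 7.10       | 26.4      | 0.921     | 0.905  | 0.927  |
|                 |    |        |      | 5.86       | 22.5      | 0.916     | 0.891  | 0.923  |
|                 |    |        |      | 4.26       | 18.7      | 0.921     | 0.902  | 0.928  |
|                 |    |        |      | 6.02       | 23.2      | 0.922     | 0.910  | 0.933  |
|                 |    |        |      | 4.12       | 19.1      | 0.920     | 0.909  | 0.926  |
|                 |    |        |      | 5.86       | 22.5      | 0.924     | 0.906  | 0.931  |
|                 |    |        |      | 4.62       | 19.6      | 0.927     | 0.916  | 0.936  |
|                 |    |        |      | 3.87       | 17.5      | 0.930     | 0.917  | 0.937  |
|                 |    |        |      | 3.87       | 17.5      | 0.935     | 0.920  | 0.939  |
The bold number indicates the current optimal value.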
Table 3. Comparison of different attention performances.
| Model | Params (M) | FLOPs (G) | mAP0.5 |
|-------|------------|-----------|--------|
| SE    | 5.93       | 22.5      | 0.926  |
| CBAM  | 6.15       | 25.2      | 0.930  |
| ECA   | 5.95       | 22.6      | 0.927  |
| CA    | 6.02       | 23.2      | 0.933  |
The bold number indicates the current optimal value.
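Table 4. Comparison of experimental results of different models.
| Method     | Class: Ballon | Class: Kite | Class: Nest | Class: Fire | Class: Smoke | Class: Trash | Precision | Recall | mAP0.5 | mAP0.5:0.95 |
|------------|---------------|-------------|-------------|-------------|--------------|--------------|-----------|--------|--------|-------------|
| YOLOv5s    | 0.992         | 0.981       | 0.939       | 0.973       | 0.773        | 0.917        | 0.918     | 0.910  | 0.930  | 0.572       |
| YOLOv7     | 0.995         | 0.987       | 0.962       | 0.979       | 0.735        | 0.878        | 0.917     | 0.858  | 0.911  | 0.545       |
| YOLOv8s    | 0.993         | 0.984       | 0.895       | 0.960       | 0.751        | 0.876        | 0.907     | 0.867  | 0.910  | 0.541       |
| YOLOv9s    | 0.993         | 0.980       | 0.950       | 0.970       | 0.760        | 0.916        | 0.930     | 0.916  | 0.927  | 0.578       |
| DETR       | 0.986         | 0.981       | 0.933       | 0.964       | 0.761        | 0.895        | 0.925     | 0.915  | 0.919  | 0.550       |
| ETLSH-YOLO | 0.996         | 0.987       | 0.982       | 0.978       | 0.782        | 0.955        | 0.935     | 0.920  | 0.939  | 0.595       |
The bold number indicates the current optimal value.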
Table 5. Comparison of experimental results of different models in different environments.
| Method     | Normal Image | Darkened Images | Brightened Images | Noisy Images |
|------------|--------------|-----------------|-------------------|--------------|
| YOLOv5s    | 0.930        | 0.904           | 0.909             | 0.873        |
| YOLOv7     | 0.911        | 0.891           | 0.893             | 0.758        |
| YOLOv8s    | 0.910        | 0.869           | 0.835             | 0.805        |
| YOLOv9s    | 0.927        | 0.906           | 0.887             | 0.811        |
| DETR       | 0.919        | 0.905           | 0.852             | 0.847        |
| ETLSH-YOLO | 0.939        | 0.925           | 0.923             | 0.901        |
The bold number indicates the current optimal value.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, L.; Zhang, Y.; Dou, Y.; Jiao, Y.; Liu, Q. ETLSH-YOLO: An Edge–Real-Time Transmission Line Safety Hazard Detection Method. Symmetry 2024, 16, 1378. https://doi.org/10.3390/sym16101378