1. Introduction
Electricity underpins modern society: lighting, heating, industrial manufacturing, and digital communication all depend on it. With advances in science and technology and accelerating industrialization, the demand for electricity keeps increasing [1], which places great pressure on the power system and makes the safe and stable operation of large-scale power grids an urgent problem. Detecting key high-voltage transmission accessories, in particular suspension clamps, grading rings, vibration dampers, link fittings, and insulators, is critical to keeping a power system in operation. Together, these accessories ensure the mechanical stability, electrical insulation, and operational safety of transmission lines. Their failure can lead to cascading blackouts, so accurately identifying them and evaluating their state is essential for improving grid reliability and preventing catastrophic accidents. However, transmission accessory detection faces several difficulties. On the one hand, existing detection methods are susceptible to environmental influences; in harsh environments in particular, the slightest carelessness by workers carrying out detection and maintenance can lead to personal injury. On the other hand, traditional detection often relies on manual inspection [2,3], which demands a high level of professional skill, and inspectors become extremely fatigued during the process, so misjudgments and missed detections occur easily. For this reason, and given the difficulty of inspecting transmission networks at high altitude and high voltage, research on deep learning-based transmission accessory detection algorithms has become an inevitable trend [4,5,6].
In recent years, domestic and international researchers have carried out extensive work on power object detection in visible-light images. As the technology has developed, the detection frameworks they adopt have advanced as well, yielding steadily higher detection accuracy. Most early methods were traditional image processing algorithms, for example in the detection of transmission conductors [7]. Guo et al. [8] employed a combined matching algorithm based on SURF and FLANN. SURF features capture details well, and computing extrema with the Hessian matrix speeds up feature extraction, making the approach simple and efficient. The FLANN algorithm uses a tree-based structure for storage and search, which effectively addresses the slow matching of high-dimensional features. For insulator detection, Zhao et al. [9] proposed extracting the features of each insulator with SIFT (Scale-Invariant Feature Transform) and then removing outliers with RANSAC (Random Sample Consensus), so that the matched inliers of each template are accurate points; this achieves high accuracy in identifying and locating insulators against complex backgrounds. For the detection of spacer bars [10], a single sliding time window was used to modify the micro-meteorological data, and the gray correlation analysis method and specific gravity method were then used to obtain the influence weights, constructing a GA-BP-SVM combination model. Texture, gray distribution, and other features were used to segment the sky area; a threading method and rotating projection were then combined to position the transmission lines exactly. A sliding window was used to traverse the transmission lines and detect the spacers according to Laws' texture energy values of the projection curves in the window.
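To make the outlier-removal step in the SIFT and RANSAC pipeline above more concrete, the following is a minimal sketch of RANSAC applied to keypoint matches. It is not the implementation of Zhao et al. [9]: the function name `ransac_translation`, the pure-translation motion model, and the tolerance and iteration values are all assumptions chosen to keep the example short, whereas practical pipelines usually fit an affine transform or a homography.

```python
import random


def ransac_translation(matches, tol=2.0, iters=200, seed=0):
    """Estimate a 2-D translation from point matches with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) pairs, e.g. template keypoint
             -> image keypoint. Assumed non-empty.
    Returns (translation, inliers). The translation-only model is an
    illustrative simplification, not the model used in the cited work.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        # Minimal sample: a single match fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Consensus set: matches that agree with this hypothesis within tol.
        inliers = [
            (p, q) for p, q in matches
            if abs(p[0] + tx - q[0]) <= tol and abs(p[1] + ty - q[1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on the largest consensus set (mean displacement).
    n = len(best_inliers)
    tx = sum(q[0] - p[0] for p, q in best_inliers) / n
    ty = sum(q[1] - p[1] for p, q in best_inliers) / n
    return (tx, ty), best_inliers
```

Matches that fall outside the consensus set are treated as mismatches and discarded, which is what leaves only reliable inlier points for locating the object.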
Nowadays, more and more researchers adopt deep learning algorithms for power object detection [11], because these algorithms adapt to more complex detection scenarios and achieve better detection accuracy than other methods. Deep learning-based object detection algorithms fall mainly into two categories. The first is the two-stage object detection model, which mainly includes Mask R-CNN [12], Faster R-CNN [13], and Cascade R-CNN [14]. Two-stage models generally have high detection accuracy, but they are too large to meet real-time detection requirements. The second is the one-stage object detection model, including RetinaNet [15], SSD [16], the YOLO series [17,18,19], and EfficientDet [20]. Because one-stage models are relatively small and have low hardware requirements, they are currently iterating very quickly, and the detection accuracy of the latest one-stage models is comparable to, or even better than, that of two-stage models. Zhang et al. [21] put forward an MS-COCO pre-training strategy to improve accuracy and improved Cascade R-CNN based on the ResNeXt-101 network and an FPN module to detect insulators on transmission lines accurately, but its inference speed is unsatisfactory. Zhao et al. [22] designed an automatic visual shape clustering network (AVSCNet), proposed an unsupervised visual shape clustering method for bolts, and used three deep convolutional neural network optimization methods in the model. These methods achieve excellent detection accuracy and are well suited to transmission line defect detection on cloud servers. However, these network models have large numbers of parameters and leave room for optimization. Hence, some studies have explored lightweight models that optimize the network structure to achieve faster inference. Ning et al. [23] combined MobileNet with YOLOv3 and improved the IoU-Kmeans algorithm for target location prediction, which increased the number of network layers and enriched the feature mapping module while keeping the amount of computation small. Peng et al. [24] constructed the EDF-YOLOv5 model for detecting transmission line defects in inspection images from unmanned aerial vehicles. Based on YOLOv5, this model introduced an advanced semantic feature extraction network, DCNv3C3, and a bounding box loss function, Focal-CIoU, which improved its generalization to defects of different shapes. Lightweight networks are designed to speed up processing, but they also reduce model accuracy. In complex environments that require high-precision detection, such networks may not meet the needs of practical applications.
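As background for the IoU-Kmeans step mentioned above, the sketch below shows the commonly used form of anchor clustering for YOLO-style detectors, in which ground-truth box sizes are clustered with IoU as the similarity measure instead of Euclidean distance. It illustrates the baseline idea only and is not the improved variant of Ning et al. [23]; the function names `wh_iou` and `iou_kmeans`, the mean-based centre update, and the default parameters are assumptions.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def iou_kmeans(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor sizes.

    Each box is assigned to the centre with the highest IoU, which is
    equivalent to the smallest distance d = 1 - IoU. Centres are updated
    as the per-cluster mean (an assumption; the median is also common).
    """
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            clusters[best].append(box)
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]  # keep the old centre if a cluster is empty
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    # Sort by area so the smallest anchor comes first.
    return sorted(centers, key=lambda c: c[0] * c[1])
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when most targets are small.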
As shown in Figure 1, most transmission accessories are dense small targets with low contrast against the surrounding environment, which makes it difficult for object detection algorithms to identify and locate them accurately. Moreover, the camera's field of view is limited, and small targets may be affected by factors such as viewing angle and occlusion, which further reduces detection accuracy. Consequently, this paper proposes WCANet, an efficient and lightweight model for high-voltage transmission accessory detection. The main contributions of this research are summarized below.
(1) A plug-and-play WCA module is designed. It adaptively adjusts the number of operation channels to balance feature extraction and representation capability against computational efficiency, enabling the network to focus on the key features of small targets and suppress interference from complex backgrounds.
(2) A novel network, WCANet, is proposed for high-voltage transmission accessory detection. Its Sim-AFPN extracts features carrying different detail and semantic information from layers at various scales with reduced computational overhead, which makes the feature representation richer and more comprehensive and thus achieves better detection results.
(3) The proposed model is evaluated on a self-built high-voltage transmission accessories dataset and on the public VisDrone2019 dataset. Compared with other mainstream object detection models, it significantly improves detection accuracy while being more lightweight in both parameter count and model size.
This paper is organized as follows:
Section 2 describes the methodology, including the proposed WCANet architecture and specific innovations.
Section 3 describes the experimental details and training results, including a comparison of the innovations, ablation experiments, and comparisons between different models.
Section 4 concludes this paper.
5. Conclusions
In this paper, we propose WCANet, a new efficient and lightweight detection framework that improves detection efficiency and accuracy and reduces cost in high-voltage transmission environments. In this network, the plug-and-play WCA module can be easily incorporated into different computer vision tasks to significantly enhance the feature representation of small targets. The proposed Sim-AFPN structure integrates key-area features from different layers by connecting and adding initial information layer by layer. WCANet also uses the WIoU loss function, whose dynamic non-monotonic focusing mechanism (FM) reduces the competitiveness of high-quality anchor boxes and masks the impact of low-quality examples. Experiments show that, on our self-constructed HVTA dataset, WCANet significantly reduces computational complexity: it has only 2.2 M parameters, about 26% fewer than YOLOv8s, and its model weight of 5.2 MB is the best result among the compared models. At the same time, it maintains a competitive accuracy of 75.67%. In addition, on the VisDrone2019 benchmark, Sim-AFPN raises the small-target detection AP to 34.38%, which also verifies the effectiveness of the WCA module in enhancing the feature representation of small targets. WCANet outperforms the other models in number of parameters, model size, and mAP, making it a competitive object detection network.
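The effect of a dynamic non-monotonic FM can be illustrated with a small sketch of the gradient gain used in the published WIoU v3 formulation, r = β / (δ · α^(β − δ)), where β is the outlier degree of an anchor box (its IoU loss divided by the running mean IoU loss). Whether WCANet uses exactly this version and these hyperparameters is an assumption, and the function name `wiou_v3_gain` and the defaults `alpha=1.9` and `delta=3.0` are illustrative.

```python
def wiou_v3_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gradient gain r = beta / (delta * alpha ** (beta - delta)).

    beta:  outlier degree of an anchor box (small = high quality,
           large = low quality).
    alpha, delta: hyperparameters; the defaults here are illustrative
           and not confirmed settings of WCANet.
    """
    return beta / (delta * alpha ** (beta - delta))


# High-quality (small beta) and low-quality (large beta) boxes both get a
# smaller gain than ordinary-quality boxes near the peak of the curve.
gains = {beta: wiou_v3_gain(beta) for beta in (0.2, 1.5, 3.0, 8.0)}
```

The gain rises from zero, peaks at β = 1 / ln(α), and then decays, so the loss concentrates on ordinary-quality anchor boxes, which matches the behavior described above.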