Lightweight Oriented Detector for Insulators in Drone Aerial Images

Qu, Fengrui; Lin, Yu; Tian, Lianfang; Du, Qiliang; Wu, Huangyuan; Liao, Wenzhi

doi:10.3390/drones8070294


by Fengrui Qu ¹, Yu Lin ¹, Lianfang Tian ¹,²,*, Qiliang Du ¹,³,*, Huangyuan Wu ¹ and Wenzhi Liao ⁴

¹ School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China
² Guangdong Engineering Research Center of Cloud-Edge-End Collaboration Technology for Smart City, Guangzhou 510641, China
³ The Key Laboratory of Autonomous Systems and Network Control of Ministry of Education, Guangzhou 510641, China
⁴ Department of Telecommunications and Information Processing, Ghent University, St-Pietersnieuwstraat 41, B-9000 Gent, Belgium
* Authors to whom correspondence should be addressed.

Drones 2024, 8(7), 294; https://doi.org/10.3390/drones8070294

Submission received: 30 April 2024 / Revised: 19 June 2024 / Accepted: 25 June 2024 / Published: 28 June 2024


Abstract

Due to long-term outdoor exposure, insulators are prone to various defects that affect the safe operation of the power system. In recent years, the combination of drones and deep learning has provided a more intelligent solution for automatic insulator defect inspection. Locating insulators is an important prerequisite for defect detection, and the accuracy of insulator localization greatly affects defect detection. However, traditional horizontal detectors lose directional information and struggle to accurately locate tilted insulators. Although oriented detectors can solve this problem by predicting detection boxes with rotation angles, these models are complex and difficult to apply to edge devices with limited computing power, which greatly limits the practical application of deep learning methods in insulator detection. To address these issues, we propose a lightweight oriented insulator detector. First, we designed a lightweight insulator feature pyramid network (LIFPN), which fuses features more efficiently while reducing the number of parameters. Second, we designed a lighter insulator oriented detection head (LIHead), which has lower computational complexity and can predict rotated detection boxes. Third, we deployed the detector on edge devices and further improved its inference speed through TensorRT. Finally, a series of experiments demonstrated that our method reduces the computational complexity of the detector by approximately 49 G and the number of parameters by approximately 30 M with almost no decrease in detection accuracy. It can be easily deployed to edge devices and achieves a detection speed of 41.89 frames per second (FPS).

Keywords:

drone insulator inspection; oriented detector; lightweight; deep learning; computer vision

1. Introduction

Insulators are important components in power systems and play a crucial role in ensuring their safe operation. Due to long-term outdoor exposure, transmission line insulators are prone to various defects caused by external forces, such as string dropping [1] and bursting [2]. Insulator defect detection is therefore one of the issues that a power department needs to address. Traditional insulator defect detection relies on manual on-site inspection, which is not only time consuming and labor intensive but also relatively dangerous. In recent years, with the development of drone technology and deep learning, a new inspection method has emerged that uses drones carrying cameras and other detection instruments to take photos of insulators and uses deep learning methods to automatically detect insulator defects in the images [3]. This approach greatly reduces the labor cost and danger of insulator inspection, enabling more intelligent inspections.

Insulator defect detection usually requires first detecting the insulator in the image and then detecting the defect based on the insulator position. The accuracy of insulator positioning therefore greatly affects the subsequent defect detection. Following the great success of convolutional neural network-based methods in vehicle [4], pedestrian [5], ship [6], and other detection tasks [7], many recent works have proposed insulator detection methods based on such networks. However, most of these methods use horizontal object detectors. An insulator string has a distinctive shape: it is a slender object with a large aspect ratio. As shown in Figure 1a, when detecting inclined insulator strings, the horizontal detector includes too many background pixels in its detection box, making it difficult to accurately locate the insulators. In addition, as shown in Figure 1c, when detecting two adjacent inclined insulators, the overlapping area between the two horizontal boxes is so large that the non-maximum suppression (NMS) algorithm treats them as detection boxes of the same insulator string, thereby removing one of the detection boxes and causing a missed detection.

The oriented object detector used in remote sensing and scene text detection can generate rotated detection boxes with rotation angles. Interestingly, using it for insulator detection can effectively solve the problems of horizontal detectors. Figure 1 shows a comparison of the detection effects of a horizontal object detector and oriented object detector on insulators. The green horizontal and rotating boxes in the figure represent the detection boxes for insulators generated by the horizontal and oriented object detectors, respectively. The rotation detection box for insulators generated by the oriented detector has fewer background interference pixels and can more accurately locate tilted insulators. In addition, when detecting two adjacent insulator strings, there is no overlapping area between the two rotating detection boxes, which will not cause the NMS algorithm to misjudge them as detection boxes for the same insulator, thus avoiding missed detections. Therefore, oriented detectors can improve the accuracy of insulator positioning, reduce adjacent insulator missed detections, and are more suitable for insulator detection tasks.
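The NMS argument above can be checked numerically. The following is a minimal sketch, assuming two hypothetical parallel insulator strings of 200 × 20 pixels tilted by 45°; the box sizes, positions, and function names are illustrative and are not taken from the paper. It compares the IoU of the enclosing horizontal boxes with a separating-axis test on the rotated boxes themselves.

```python
import math

def box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a box rotated by angle_deg about its centre."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in half]

def enclosing_hbox(pts):
    """Smallest horizontal box (x1, y1, x2, y2) containing all points."""
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return min(xs), min(ys), max(xs), max(ys)

def hbox_iou(a, b):
    """IoU of two horizontal boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def rotated_boxes_overlap(p, q):
    """Separating-axis test: True if two convex quadrilaterals intersect."""
    for poly in (p, q):
        for i in range(len(poly)):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
            nx, ny = y1 - y2, x2 - x1  # normal of the current edge
            proj_p = [nx * x + ny * y for x, y in p]
            proj_q = [nx * x + ny * y for x, y in q]
            if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
                return False  # a gap exists along this axis
    return True

# Two hypothetical parallel insulator strings, 200 x 20 px, tilted by 45 degrees.
string_a = box_corners(100, 100, 200, 20, 45)
string_b = box_corners(120, 80, 200, 20, 45)
iou_h = hbox_iou(enclosing_hbox(string_a), enclosing_hbox(string_b))
print(f"horizontal-box IoU: {iou_h:.2f}")
print("rotated boxes overlap:", rotated_boxes_overlap(string_a, string_b))
```

In this toy configuration the horizontal boxes overlap heavily (IoU above a typical NMS threshold of 0.5), while the rotated boxes do not intersect at all, which is the situation illustrated in Figure 1.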

Because they need to generate prediction boxes with rotation angles, current general-purpose oriented detectors, such as Oriented RCNN [8] and ROI Transformer [9], typically have a large number of parameters and a high computational cost, which edge devices with limited computing power can hardly bear. However, power departments hope to deploy detectors on edge devices, which are easily portable. This creates a conflict between detection accuracy and practical application.

Therefore, this study aimed to design a lightweight oriented insulator detector that reduces the number of parameters and the computational complexity while ensuring high insulator detection accuracy, so that it can be deployed on edge devices and has practical application value. The main contributions of this study were as follows:

We designed a lightweight insulator feature pyramid network (LIFPN) that effectively reduces the number of feature fusion paths and convolutions, greatly reducing the number of model parameters while ensuring a high detection accuracy.
We designed a lightweight insulator oriented detection head (LIHead) that can not only generate rotation boxes with rotation angles but also has fewer parameters and a lower computational complexity.
We selected the suitable lightweight backbone through experiments and combined it with LIFPN and LIHead to form a lightweight oriented detector. We deployed it on the edge device Nvidia AGX Orin and verified the real-time performance of the model.

The remaining parts of this paper are as follows: Section 2 introduces the relevant work in the field of insulator detection. Section 3 introduces our proposed lightweight insulator oriented detector. Section 4 conducts ablation and comparative experiments on the proposed method. Section 5 summarizes the methods proposed in this paper and provides prospects for future research directions. The code will be available at https://github.com/insulator123-code/lightweight_insulator_detector (accessed on 10 June 2024).

2. Related Work

2.1. Object Detection

Following the tremendous success of the region convolutional neural network (RCNN) in object detection in 2014, many classic detectors based on convolutional neural networks (CNNs) have been proposed. These detectors can be divided into two categories based on whether there is a proposal box generation stage: two-stage detectors and single-stage detectors. The most representative two-stage detectors are those of the Faster RCNN series. To solve the computational redundancy of the RCNN, the creators of Fast RCNN [10] proposed region of interest (ROI) pooling, which performs feature extraction once for the entire image and uses ROI pooling to map different candidate regions to fixed-size feature maps, significantly improving the detection speed. On this basis, the creators of Faster RCNN [11] further proposed the region proposal network (RPN) for generating candidate regions, thereby achieving end-to-end training and inference. In recent years, many more advanced two-stage detectors have been proposed. The creators of Cascade RCNN introduced cascaded detectors to gradually improve the detection performance through multi-stage detection. The creators of Libra RCNN [12] proposed intersection over union (IoU) balanced sampling, a balanced feature pyramid, and balanced L1 loss to address three types of imbalance in object detection. These two-stage detectors have a high detection accuracy, but their detection speed is limited.

To address this issue, many single-stage detectors have been proposed. The most representative ones are the You Only Look Once (YOLO) [13] series and RetinaNet. The YOLO series eliminated the proposal generation stage and considered the object detection task as a regression problem. The models of the YOLO series, such as YOLO v2 [14], YOLO v3 [15], YOLO v4 [16], and YOLO v5, continuously improved the detection accuracy and speed of detectors by introducing anchors and feature fusion networks, as well as improving the loss functions and network structures. RetinaNet [17] is a very popular single-stage network structure, where its creators proposed focal loss to solve the problem of positive and negative sample imbalance in single-stage object detectors.

In contrast, two-stage detectors have a higher detection accuracy, while single-stage detectors have a faster detection speed and smaller computational complexity. However, regardless of the type of detector mentioned above, the detection boxes they generate are all horizontal boxes. For some special detection scenarios, such as remote sensing images object detection, the target in the image usually has directional information because it is shot from an overlooking angle. Using a horizontal detection box will undoubtedly lose this directional information. Therefore, some oriented detectors have been proposed for these special scenarios.

RRPN [18] is an improved oriented detector based on Faster R-CNN. It regresses prediction boxes from rotated anchors to achieve oriented detection. To address the time-consuming generation of oriented proposals, ReDet [19] combines rotation-equivariant networks with the detector to extract rotation-equivariant features. Its creators proposed the rotation-invariant RoI Align, which adaptively extracts rotation-invariant features from the rotation-equivariant features. R³Det [20] is based on the single-stage detector RetinaNet. Its creators designed a feature refinement module (FRM) that achieves feature reconstruction and alignment through feature interpolation.

These general-purpose oriented detectors can generate rotated boxes with rotation angles. Compared with traditional horizontal detectors, they can locate objects more accurately, especially tilted objects with large aspect ratios, such as insulators. However, because they need to predict rotated boxes, their number of parameters and computational complexity are also higher, so they are not suitable for edge devices with limited computing power. This limits the practical application of general-purpose oriented detectors in insulator inspection. It is therefore necessary to design a new lightweight oriented detector for insulator detection that can be deployed to edge devices and improves the portability of oriented detectors.

2.2. Insulator Detection

Insulators are important components of a power system. Due to their long-term exposure to the outdoors, they are prone to defects caused by external forces. Therefore, in order to ensure the safe operation of the power system, the regular inspection of insulators is an important task for the power department. The traditional manual inspection method is not only laborious but also dangerous. Recently, drone inspections have become increasingly popular. However, the amount of aerial image data captured by drones is enormous, and the manual detection of insulators in the images is time-consuming and labor-intensive. The use of computer vision methods for insulator detection has become a hot research topic (Table 1).

As shown in Table 1, traditional insulator detection methods for aerial images usually use color [21], texture [22], shape [23], and other features to locate the insulators in the image through edge detection, threshold segmentation, and similar techniques. These methods, which rely on manually designed features, achieve good detection accuracy when there are few images, a single shooting angle, and little background interference. However, because the drone shooting distance, lighting, and other factors vary, they generalize poorly when the images change significantly or the background is complex, and they are prone to missed and false detections.

With CNN-based object detection methods achieving good results in various fields, many recent works have proposed CNN-based insulator detection methods. Work [24] proposes a cascaded RCNN network, which first uses a detection network with VGG as the backbone to detect insulators in the image and crop the insulator area, and then uses a second-stage network to detect defects within the insulator area. It also proposes a data enhancement method based on insulator segmentation and background fusion to address the scarcity of insulator defect images. With these improvements, it achieved a precision of 0.91 and a recall of 0.96. Work [2] proposes a multi-granularity fusion network, MGFNet, which obtains multi-granularity features of insulators through a multi-granularity learning strategy and combines them to produce more resilient features. By utilizing non-local interaction, its region relationship attention module learns the difference between normal and defect features, thereby improving the accuracy of insulator defect diagnosis. Work [25] proposes an improved insulator detection method based on YOLO v5. Its authors designed an adaptive neighborhood-weighted median filtering (NW-AMF) method to reduce image noise, used RepVGG [27] as the backbone to balance detection accuracy and speed, and proposed the focal EIoU loss function to address the imbalance between positive and negative samples for small targets, achieving advanced detection accuracy and speed. Work [26] proposes an insulator detection method based on oriented detectors to address the missed detections caused by the NMS algorithm for closely spaced tilted insulators. It predicts boxes with a rotation angle, generates rotated anchors using RRPN, and uses the K-means++ method to generate anchors better suited to the aspect ratio and scale of insulators, significantly improving the detection accuracy and avoiding missed detections of closely spaced tilted insulators.

Traditional insulator detection methods are based on prior knowledge and manually designed features, which have limited generalization and low detection accuracy when facing complex backgrounds. The insulator detection method based on deep learning can use a CNN to extract deeper semantic features, thereby improving the generalization and accuracy of the detection method. However, these methods typically use horizontal detectors, and when detecting tilted insulators with large aspect ratios, there are too many background interference pixels in the horizontal detection box, making it difficult to accurately locate the insulators. The insulator detection method based on oriented detectors can generate rotated detection boxes, effectively improving the positioning accuracy of the insulators. However, a power department typically conducts power inspections outdoors, and thus, it hopes to deploy algorithms on portable edge devices. The large parameters of oriented detectors make them difficult to apply to edge devices with limited computing power, which limits their practical application in insulator inspection. Therefore, this paper proposes a lightweight oriented detector that utilizes the characteristics of insulators to make corresponding improvements to reduce the number of parameters and computational complexity while ensuring detection accuracy, making it more suitable for edge devices and enhancing the practical application value of oriented detectors in insulator detection.

3. Methods

Figure 2 is the structural diagram of our proposed lightweight oriented detector. It mainly consists of three parts. The first part is a lightweight backbone for extracting features from drone aerial images of insulators for insulator detection. The second part is our designed lightweight feature fusion network LIFPN, which utilizes fewer feature fusion paths and convolutions to achieve more efficient feature fusion, thereby improving the multi-scale detection ability and accuracy of the model. The third part is our proposed lightweight oriented detection head, which can generate rotation detection boxes with rotation angles while having a small number of parameters. This reduces the computational complexity of the model while accurately locating the insulator. In the following content of this section, we provide a detailed introduction to the various parts of our proposed lightweight oriented detector.

3.1. LIFPN

To enable our lightweight oriented detector to run more quickly (here, lightweight means that the detector has fewer parameters and a lower computational complexity), we adopted a single-stage detector structure. A single-stage detector has no proposal box generation stage, and thus, its detection speed is faster than that of a two-stage detector. RetinaNet is a classic and effective single-stage detector with a concise structure that comprises three parts: the backbone, neck, and head. This is a very popular design paradigm on which many advanced detectors are based, and we adopted it as well. RetinaNet uses a feature pyramid network (FPN) as the neck, which is a very popular feature fusion network. An FPN fuses the feature layers output by the backbone along a top-down path, upsampling higher-level features and merging them with lower-level ones, thereby improving the multi-scale detection capability of the model. However, because an FPN only has a top-down feature fusion path, it is difficult to transmit positioning information from lower layers to higher layers, which limits the detection accuracy of the model. In recent years, many feature fusion networks, such as PAFPN [28] and BiFPN [29], have fully integrated low-level positioning information and high-level semantic information by adding bottom-up feature fusion paths and re-fusing features across levels. This has improved the multi-scale detection ability of the network and further improved the detection accuracy. However, these feature fusion networks also require many parameters and have a high computational complexity, which makes them unsuitable for edge devices with limited computing power.

To address this issue, we designed a lightweight insulator feature fusion network called LIFPN. First, our experiments showed that 3 × 3 convolutions (Convs) account for a significant part of the computational complexity of the model, because many feature fusion networks use a large number of 3 × 3 Convs to improve the fitting ability of the model. We found through experiments that for relatively simple detection tasks, such as detecting only insulators, the fitting ability of the model was still sufficient even if these 3 × 3 Convs were removed. Therefore, we removed these redundant 3 × 3 Convs. Second, although the two feature fusion paths in the PAFPN integrate features from different layers more fully, they also bring significant computational complexity. In addition, longer upsampling and downsampling paths result in more information loss. To solve these problems, we borrowed the design concept of the balanced FPN [12] and designed the feature fusion network shown in Figure 3. We fused features on the intermediate feature layer so that C3 and C5 each need only one downsampling or upsampling step to match the feature dimensions of C4, thereby minimizing information loss. Third, feature fusion methods are usually of two types: direct addition or concatenation (concat). The concat method stacks feature maps together and changes the number of channels in the feature layer after fusion. It therefore requires a 1 × 1 Conv to fuse the information between channels and transform the number of channels back to 256. This adds computation, while the detection accuracy of the two feature fusion methods is almost the same. Therefore, we adopted the direct addition feature fusion method.

Fourth, to improve the response of feature maps to targets, many attention mechanisms have been proposed in recent years, such as SENet [30] and CBAM [31]. They use spatial and channel attention to focus the model on the spatial position and channels of the target so as to improve the detection accuracy. We introduced SimAM [32], a parameter-free attention mechanism that can improve the response to targets with only a small increase in computational complexity, into the LIFPN. We added the SimAM after the feature fusion, where it affects all three feature layers simultaneously, further reducing the computational cost of using the attention mechanism. Of course, these simplifications also inevitably lead to a decrease in the detection accuracy of the model. Finally, because the feature fusion paths are simplified, the upsampling and downsampling processes become more important. To minimize the information loss during these processes, we replaced bilinear-interpolation upsampling with transposed convolution and conventional-convolution downsampling with dilated convolution. Because its weights are learnable, a transposed convolution can achieve better upsampling performance than bilinear interpolation after training, thereby reducing information loss during sampling. A dilated convolution increases the receptive field while keeping the same number of parameters and the same convolution kernel size, which helps with detecting large-scale insulators. Therefore, using dilated convolution for downsampling is beneficial for insulator detection at different scales.
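The resampling and attention steps above can be illustrated with a minimal standard-library sketch. It uses the usual output-size formulas for convolution and transposed convolution to show how C3 and C5 can each reach the resolution of C4 in one step, and a single-channel version of the SimAM weighting as commonly implemented. The input size (1024), the kernel, stride, padding, and dilation values, the `e_lambda` constant, and all function names are assumptions made for illustration; the paper's actual hyperparameters are not stated in this section.

```python
import math

def conv_out(size, kernel, stride, padding, dilation=1):
    """Output length of a (possibly dilated) convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def tconv_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

def effective_kernel(kernel, dilation):
    """Spatial extent covered by a dilated kernel (parameter count unchanged)."""
    return dilation * (kernel - 1) + 1

def simam(fmap, e_lambda=1e-4):
    """SimAM-style weighting of one channel, given as a list of rows.

    Each value is scaled by sigmoid(d / (4 * (var + e_lambda)) + 0.5), where d
    is its squared deviation from the channel mean. No learnable parameters.
    """
    values = [v for row in fmap for v in row]
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in fmap]
    var = sum(sum(row) for row in sq_dev) / (len(values) - 1)
    return [[v / (1.0 + math.exp(-(d / (4 * (var + e_lambda)) + 0.5)))
             for v, d in zip(row, d_row)]
            for row, d_row in zip(fmap, sq_dev)]

# Assumed feature-map sizes for a 1024-pixel input: strides 8, 16, 32.
c3, c4, c5 = 1024 // 8, 1024 // 16, 1024 // 32
c3_down = conv_out(c3, kernel=3, stride=2, padding=2, dilation=2)  # dilated conv
c5_up = tconv_out(c5, kernel=2, stride=2)                          # transposed conv
print(c3_down, c4, c5_up)          # all three sizes match, so they can be added
print(effective_kernel(3, 2))      # a 3 x 3 kernel with dilation 2 spans 5 pixels

weighted = simam([[1.0, 1.0], [1.0, 5.0]])
print(weighted)
```

In the toy SimAM input, the value that deviates most from the channel mean receives the largest weight, which is the sense in which the mechanism raises the response to distinctive targets without adding parameters.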

3.2. LIHead

In order to make the detector run faster, our detection head design drew on the structure of the single-stage detector RetinaNet. The detection head structure of RetinaNet is very classic and effective. Recently, many advanced universal oriented detectors, such as R³Det and S²A-Net [33], also adopted this structure. However, the detection head of RetinaNet can only generate horizontal detection boxes, which makes it difficult to accurately locate tilted insulators. In order to enable our detector to predict rotation boxes with a rotation angle, we modified the last 3 × 3 Conv of the regression branch. We changed the number of output channels from 4A to 5A, where A represents the number of anchors generated at each pixel. The regression branch needs to predict five parameters (x, y, w, h, θ), where x is the horizontal coordinate of the center point of the rotation box, y is the vertical coordinate of the center point, w is the width, h is the height, and θ is the rotation angle. The definition of the rotation angle adopts the long-side definition method, which uses the angle between the longer side of the rotation box and the horizontal axis as the rotation angle. The clockwise direction is positive, and the counterclockwise direction is negative. At present, universal oriented detectors usually generate anchors with a rotation angle, such as generating six types of anchors at 30° intervals within the range of 0° to 180°. Anchors typically have three scales and three aspect ratios. For each pixel, there are 3 × 3 × 6 = 54 anchors, which undoubtedly brings huge computational complexity. To reduce the computational complexity of the model, we directly regressed the rotation prediction boxes from the horizontal anchors. In this way, only 3 × 3 = 9 anchors need to be generated for each point. In addition, the vast majority of anchors are negative samples, which can affect the training due to extreme sample imbalance. Therefore, for insulator detection, the accuracy of using our method is almost the same as that of generating rotating anchors.
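The anchor arithmetic and the long-side angle convention can be made concrete with a short sketch. The anchor counts follow the text directly; the helper `to_long_side` and its angle range of [-90°, 90°) are assumptions for illustration, since the exact range used by the authors is not stated here.

```python
def anchors_per_location(num_scales, num_ratios, num_angles=1):
    """Number of anchors generated at each feature-map location."""
    return num_scales * num_ratios * num_angles

def regression_channels(num_anchors, params_per_box=5):
    """Output channels of the last regression Conv: (x, y, w, h, theta) per anchor."""
    return params_per_box * num_anchors

def to_long_side(cx, cy, w, h, theta_deg):
    """Re-express a rotated box so that w is the longer side.

    The angle is measured between the longer side and the horizontal axis and
    wrapped into [-90, 90) degrees (assumed range).
    """
    if w < h:
        w, h = h, w
        theta_deg += 90.0
    theta_deg = (theta_deg + 90.0) % 180.0 - 90.0
    return cx, cy, w, h, theta_deg

rotated = anchors_per_location(3, 3, num_angles=6)   # rotated anchors: 54
horizontal = anchors_per_location(3, 3)              # horizontal anchors: 9
print(rotated, regression_channels(rotated))
print(horizontal, regression_channels(horizontal))
print(to_long_side(0.0, 0.0, 10.0, 40.0, 30.0))
```

Regressing from horizontal anchors therefore shrinks the last regression layer from 5 × 54 = 270 to 5 × 9 = 45 output channels, and reduces the number of anchors that must be matched and scored by the same factor of six.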

The classification and regression branches of the RetinaNet head use eight conventional 3 × 3 Convs in total (as shown in Figure 4) to further extract features and improve the fitting ability of the model. Although this improves the detection accuracy, it also brings a significant number of parameters: the detection head accounts for approximately 13% of the model parameters. To reduce the number of parameters and the computational complexity of the head, we studied two lightweight insulator detection head structures: LIHead based on depthwise separable convolution (DS Conv) and LIHead based on group convolution. We demonstrated through experiments that the LIHead based on group convolution can significantly reduce the computational complexity of the model while ensuring detection accuracy, achieving a faster detection speed. In the following, we introduce the two lightweight head structures in detail and analyze why we chose the LIHead based on group convolution.

3.2.1. LIHead Based on Depthwise Separable Convolution

As shown in Figure 5, depthwise separable convolution is a special type of convolution composed of a depthwise Conv and a pointwise Conv. A depthwise Conv is a special group convolution in which the number of groups equals the number of input channels and the number of output channels. Because it convolves each channel separately and lacks cross-channel information exchange, a pointwise Conv is needed to fuse the information of the channels. A pointwise Conv is a 1 × 1 convolution whose number of input channels equals the number of depthwise Conv groups. It integrates information across channels to reduce the information loss caused by the depthwise Conv. Replacing a 3 × 3 convolution with a depthwise Conv combined with a pointwise Conv effectively reduces the computational complexity of the convolution. As shown in Figure 5, assuming that the numbers of input and output channels are both 256, using DS Conv reduces the number of parameters by a factor of about 8.69 compared with a conventional 3 × 3 Conv. Therefore, many advanced lightweight classification networks, such as MobileNet v2 [34] and Xception [35], use DS Conv, and we also designed a lightweight head structure using DS Conv.
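The factor of 8.69 can be reproduced with a few lines of arithmetic. The sketch below counts weights only (bias terms are ignored, which is an assumption), and the function names are illustrative.

```python
def conv_params(c_in, c_out, kernel=3, groups=1):
    """Weight count of a convolution: kernel^2 * (c_in / groups) * c_out."""
    return kernel * kernel * (c_in // groups) * c_out

def ds_conv_params(c_in, c_out, kernel=3):
    """Depthwise Conv (groups = c_in) followed by a 1 x 1 pointwise Conv."""
    depthwise = conv_params(c_in, c_in, kernel, groups=c_in)
    pointwise = conv_params(c_in, c_out, kernel=1)
    return depthwise + pointwise

standard = conv_params(256, 256)        # 3 * 3 * 256 * 256
separable = ds_conv_params(256, 256)    # 3 * 3 * 256 + 256 * 256
print(standard, separable, round(standard / separable, 2))
```

With 256 input and output channels, the conventional 3 × 3 Conv has 589,824 weights and the DS Conv has 67,840, a ratio of about 8.69.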

We first attempted to replace all eight 3 × 3 Convs in the RetinaNet head (as shown in Figure 6) with DS Convs, which effectively reduced the number of parameters. However, because the first stage of a DS Conv lacks cross-channel information exchange, some information loss is inevitable even though the second stage re-fuses the channels. As a result, replacing all 3 × 3 convolutions with DS Convs led to a final localization accuracy of 0, i.e., the training failed. We inspected the weights of each DS Conv and found that most of them were 0. This produced zero gradients during backpropagation, leading to vanishing gradients. To solve this problem, we borrowed design ideas from MobileNet v2. As shown in Figure 7, we first used a 1 × 1 Conv to increase the dimensionality of the input channels. The higher-dimensional representation carries richer information and is more tolerant of information loss. Second, we added a shortcut across each DS Conv. As shown in Figure 7, even if the weights of a DS Conv become 0 during training, the shortcut path still allows the gradient to propagate to the earlier Convs, preventing the gradient from vanishing through the chain rule. Finally, we used a 1 × 1 Conv to reduce the dimensionality of the output channels to 256. With this structure, we could minimize the information loss caused by using DS Conv, improve the fitting ability of the model, and thus improve its detection accuracy.
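The role of the shortcut can be seen in a scalar toy model. The sketch below is only an illustration of the chain-rule argument, not the authors' head: the DS Conv is replaced by a hypothetical one-weight branch `w * x`, and gradients are estimated by central differences.

```python
def branch(x, w):
    """Stand-in for a DS Conv: a single multiplicative weight."""
    return w * x

def plain_block(x, w):
    """Block without a shortcut: the output depends on x only through the branch."""
    return branch(x, w)

def residual_block(x, w):
    """Block with a shortcut: the input is added back to the branch output."""
    return x + branch(x, w)

def input_gradient(block, x, w, eps=1e-6):
    """Central-difference estimate of d(block)/dx."""
    return (block(x + eps, w) - block(x - eps, w)) / (2 * eps)

dead_weight = 0.0  # the situation described in the text: the weight has collapsed to 0
print(input_gradient(plain_block, 2.0, dead_weight))     # gradient is blocked
print(input_gradient(residual_block, 2.0, dead_weight))  # gradient still flows
```

For y = x + F(x), the derivative is 1 + F'(x), so the gradient reaching earlier layers stays close to 1 even when F'(x) is 0, whereas without the shortcut it is exactly F'(x) = 0.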

3.2.2. LIHead Based on Group Convolution

Another approach to designing a lightweight model is to use group convolutions, as in the advanced ShuffleNet series of classification networks. We also attempted to use group convolutions to design a lightweight head. As shown in Figure 8, a group Conv is a special type of convolution that divides the channels of the input feature map into multiple groups and convolves each group separately. Compared with a conventional 3 × 3 Conv, a group Conv has fewer parameters. We first replaced the eight 3 × 3 Convs in the head with group Convs. However, a group Conv only exchanges information between channels within each group and lacks cross-group information exchange, which leads to a decrease in detection accuracy. To address this problem, we borrowed the idea of ShuffleNet and added a shuffle operation after each group Conv. As shown in Figure 9, the shuffle operation rearranges the channels so that the channels of each group can interact with the channels of the other groups, effectively preventing the performance degradation caused by a lack of cross-group information interaction. In addition, the shuffle operation improves the detection accuracy of the model with almost no increase in computation or parameters.
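Both ideas in this subsection can be sketched in a few lines. The parameter count below ignores bias terms, and the group numbers (4 and 2) and function names are assumptions for illustration; the group number used in the LIHead is not given in this section. The shuffle follows the usual reshape, transpose, flatten pattern from ShuffleNet, applied here to a plain list of channel indices.

```python
def group_conv_params(c_in, c_out, kernel=3, groups=1):
    """Weight count of a group Conv: each group maps c_in/groups to c_out/groups."""
    return kernel * kernel * (c_in // groups) * (c_out // groups) * groups

def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to reshaping to (groups, n), transposing to (n, groups),
    and flattening, where n is the number of channels per group.
    """
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

print(group_conv_params(256, 256, groups=1))   # conventional 3 x 3 Conv
print(group_conv_params(256, 256, groups=4))   # four groups: a quarter of the weights
print(channel_shuffle(list(range(8)), groups=2))
```

After the shuffle, every group of the next group Conv receives channels that originated in different groups, and the operation itself is only a re-indexing, which is why it adds almost no computation and no parameters.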

3.2.3. Comparison of Two LIHead Structures

The LIHead based on depthwise separable convolution can effectively reduce the computational complexity. However, because the first stage of a DS Conv lacks cross-channel information exchange, it easily causes information loss. Although the dimensionality increase and the shortcut connection effectively reduce this loss and improve the detection accuracy of the model, they also add computation, which limits how lightweight the model can become. In addition, DS Conv involves more fragmented operations, which limits the improvement in detection speed. Especially on edge devices, excessive fragmentation has a significant impact on the detection speed. The dimensionality increase and reduction also add some inference time. The LIHead based on group convolution can also effectively reduce the computational complexity. Compared with the LIHead based on depthwise separable convolution, it can further reduce the number of model parameters by adjusting the number of groups. In addition, because it uses only group convolution and shuffle operations, it avoids excessive fragmentation. It therefore achieves a faster detection speed than the LIHead based on depthwise separable convolution and is more suitable for edge devices. For these reasons, we ultimately chose the LIHead based on group convolution as the lightweight head of the model.

3.3. Lightweight Backbone Selection

In recent years, many lightweight backbones have been proposed, of which the most representative are the MobileNet series and the ShuffleNet series. MobileNet v1 [36] replaces the conventional convolutions in VGG [37] with depthwise separable convolutions, which reduces the number of model parameters by approximately 32 times at the cost of a 1% decrease in accuracy. However, many kernel weights of the DS Convs become 0 and do not contribute to the actual calculations, resulting in vanishing gradients. To address this issue, MobileNet v2 proposed an inverted residual structure and a linear bottleneck, which significantly improved the accuracy compared with MobileNet v1. The creators of MobileNet v3 [38] used neural architecture search (NAS) to search for the optimal network structure and introduced an attention mechanism similar to SENet. They also proposed the h-swish activation function, which is more suitable for mobile devices, and improved the tail design of MobileNet v2. During our experiments, we found that although MobileNet v3 reduces some computational complexity compared with MobileNet v2, its detection accuracy and speed still need to be verified experimentally for each detection task.

The ShuffleNet series is another representative family of lightweight backbones. ShuffleNet v1 [39] uses group convolution and channel shuffle to reduce the parameters while preventing the loss of representational ability caused by the lack of cross-group information flow in group convolution. The creators of ShuffleNet v2 [40] proposed four design guidelines for lightweight networks, which further improve the speed of the model by minimizing the memory access cost (MAC) and reducing network fragmentation and the number of element-wise operations.

As shown in Section 4.3.3, we conducted comparative experiments on the most popular lightweight backbones for the insulator detection task. The experimental results show that these lightweight backbones all effectively reduce the parameters of the model, with no significant difference among them. In terms of detection accuracy, MobileNet v2 and MobileNet v3 are slightly higher. Because of some fragmented operations, the detection speed of MobileNet v3 is slower. Taking into account the detection accuracy, model parameters, and detection speed, we ultimately decided to use MobileNet v2 as the backbone of our lightweight oriented detector.

3.4. Edge Device Deployment and Model Acceleration

To verify that our proposed lightweight oriented detector effectively reduces the computational complexity, and is thus more suitable for edge devices with limited computing power, we deployed it on the Nvidia AGX Orin. The AGX Orin is an edge device manufactured by Nvidia that measures only 100 mm × 87 mm and can be easily mounted on a drone, which greatly improves the portability of deep learning models in practical applications. The small size of an edge device also limits its computing power, making it difficult for models with large amounts of computation to run on it. Although our lightweight oriented detector can run on edge devices, its detection speed is still limited by the available computing power and only reaches approximately 14 frames per second (FPS). To make the model run faster, we used TensorRT to accelerate it. TensorRT is an acceleration technology developed for model inference. Models usually require a large number of parameters and calculations during training, resulting in a relatively large model, and these parameters limit the inference speed when the model is applied. TensorRT optimizes the model through layer fusion, operator optimization, and quantization. It reduces memory access requirements through tensor optimization and improves the running speed of the model through dynamic shape support and multi-stream inference. We first converted the model trained with PyTorch into the ONNX format and then converted it into a TensorRT engine. During inference, TensorRT allocates and optimizes memory for the model and transfers the input data to the GPU. It then executes the model according to its optimized representation, which includes layer fusion, tensor optimization, and automatic mixed precision. Finally, it copies the inference results from the GPU back to the host memory and releases the relevant resources.
With TensorRT acceleration, the running speed of our lightweight oriented detector on the Nvidia AGX Orin increased to approximately 40 FPS, enabling real-time detection on edge devices.
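The PyTorch → ONNX → TensorRT conversion path could look roughly like the sketch below. This is not the authors' deployment script: the input resolution, the file names, the opset version, and the choice of the `trtexec` command-line tool are all assumptions made for illustration, and rotated-box post-processing operators may need extra handling that is not shown.

```python
def export_to_onnx(model, onnx_path, input_shape=(1, 3, 512, 512)):
    """Export a trained PyTorch model to ONNX (requires PyTorch to be installed).

    input_shape is a hypothetical (batch, channels, height, width); the paper
    does not state the deployment resolution.
    """
    import torch  # imported lazily so the helper below also works without PyTorch

    model.eval()
    dummy = torch.randn(*input_shape)
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["images"],
        opset_version=13,  # assumed opset
    )


def trtexec_command(onnx_path, engine_path, fp16=False):
    """Build the argument list for trtexec, the CLI tool shipped with TensorRT."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        # Reduced precision is optional and is an assumption, not the paper's setting.
        cmd.append("--fp16")
    return cmd


# Hypothetical file names; on the device the list would be passed to subprocess.run.
print(" ".join(trtexec_command("detector.onnx", "detector.engine")))
```

Building the engine on the target device itself is the usual practice, because a serialized TensorRT engine is specific to the GPU and TensorRT version it was built with.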

4. Experiments

4.1. Experimental Setup and Dataset

The experiments were run on a computer equipped with an Intel Core i9-10940X CPU and an Nvidia GeForce RTX 3080 GPU with 10 GB of memory. The system environment was Ubuntu 20.04, and PyTorch was used to build the model. The backbone was initialized with weights pre-trained on ImageNet. The optimizer was SGD with momentum, with the momentum set to 0.9 and the weight decay set to 0.0001. The initial learning rate was set to 0.0025, and the learning rate schedule was CosineRestart. The classification loss was focal loss, with γ set to 2 and α set to 0.25. The regression loss was L1 loss. The batch size was set to 4. The edge device used for deployment was the 32 GB version of the Nvidia AGX Orin, equipped with a 1792-core NVIDIA GPU with 56 tensor cores and an 8-core 64-bit Arm CPU. Because insulator infrared inspection data are usually kept confidential by the power department, no relevant open source dataset is currently available. Our experimental dataset consisted of infrared images of insulators captured by a drone during the power department's insulator overheating inspections. Because such images are scarce, we used data augmentation to expand the dataset. The augmentation methods included random scaling, random rotation, brightness adjustment, and random flipping, resulting in a total of 3800 images. The dataset was randomly split into a train-val set and a test set in an 8:2 ratio.
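For reference, the effect of the focal loss settings γ = 2 and α = 0.25 can be seen in a small sketch. It assumes the standard binary focal loss of Lin et al., FL(p_t) = −α_t (1 − p_t)^γ log(p_t), evaluated for a single predicted probability; it is not the training code used in the experiments.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p of the positive class."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, target=1)  # well-classified positive
hard = focal_loss(0.1, target=1)  # badly misclassified positive
print(f"easy: {easy:.6f}  hard: {hard:.6f}  ratio: {hard / easy:.0f}")
```

With γ = 2, the modulating factor (1 − p_t)^γ shrinks the loss of the easy example by a factor of 100, so the hard example dominates the gradient; this is the property that makes focal loss suited to dense detectors with many easy background anchors.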

4.2. Evaluation Metrics

The most commonly used evaluation metric in object detection is average precision (AP) or mean average precision (mAP), where mAP is the mean of the AP values over all categories. Because insulator detection involves only one category, mAP and AP are identical. Therefore, we used AP as the evaluation metric for insulator detection. Calculating AP requires precision and recall, which are defined as follows:

Precision = \frac{TP}{TP + FP} \qquad (1)

Recall = \frac{TP}{TP + FN} \qquad (2)

where TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively.

Precision is the proportion of true positives among all samples predicted as positive, and recall is the proportion of true positives among all actual positive samples. Combining the two better reflects the overall performance of the model. First, a series of precision and recall values was obtained by varying the confidence threshold. We then drew the P-R curve with precision on the vertical axis and recall on the horizontal axis. The area enclosed by the P-R curve and the coordinate axes is the value of AP. AP₅₀ denotes the AP when the intersection over union (IoU) threshold was set to 0.5, and AP₇₅ denotes the AP when the IoU threshold was set to 0.75. AP without a subscript denotes the average of the 10 AP values obtained with IoU thresholds from 0.5 to 0.95 in steps of 0.05.
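The procedure can be sketched as follows. The sketch assumes that each detection has already been marked as a true or false positive at a given IoU threshold, and it uses all-point interpolation of the P-R curve, which is a common convention but is not stated in the text; the function names and the toy data are illustrative.

```python
def precision_recall_points(scores, is_true_positive, num_ground_truth):
    """Sweep the confidence threshold and return (recall, precision) pairs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_ground_truth, tp / (tp + fp)))
    return points


def average_precision(points):
    """Area under the P-R curve with all-point interpolation (assumed convention)."""
    recalls = [0.0] + [r for r, _ in points] + [1.0]
    precisions = [0.0] + [p for _, p in points] + [0.0]
    # Make precision non-increasing as recall grows (upper envelope of the curve).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum((recalls[i] - recalls[i - 1]) * precisions[i] for i in range(1, len(recalls)))


# Toy example: four detections, four ground-truth insulators.
scores = [0.9, 0.8, 0.7, 0.6]
matched = [True, False, True, True]
ap = average_precision(precision_recall_points(scores, matched, num_ground_truth=4))
print(ap)  # 0.625

# IoU thresholds used for the averaged AP: 0.50, 0.55, ..., 0.95.
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

The averaged AP would repeat the matching and the AP computation once per entry of `iou_thresholds` and take the mean of the ten results.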

The metrics commonly used to measure how lightweight a model is are the number of parameters (Params) and the number of floating point operations (FLOPs). Params is the number of parameters in the model, and FLOPs is the number of floating-point operations during one model run, which indicates the model complexity. The smaller the Params and FLOPs, the fewer parameters and the lower the computational complexity the model has, and the more lightweight it is. Frames per second (FPS) is usually used as the metric for detection speed; it is the number of image frames processed in one second, so a larger FPS indicates a faster model.
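As an illustration of the FPS metric, a simple timing loop is sketched below. The helper `measure_fps`, the warm-up count, and the stand-in workload are assumptions; the measurement protocol actually used for the reported speeds is not described in the text.

```python
import time


def measure_fps(infer, num_frames=100, warmup=10):
    """Return frames per second for a callable that processes one frame per call."""
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        infer()
    start = time.perf_counter()
    for _ in range(num_frames):
        infer()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed


def fake_infer():
    """Stand-in workload; a real test would run the detector on one image."""
    sum(i * i for i in range(1000))


fps = measure_fps(fake_infer, num_frames=50, warmup=5)
print(f"{fps:.1f} FPS")
```

When the model runs on a GPU, the device should be synchronized before reading the clock; otherwise asynchronous kernel launches make the measured FPS look higher than it is.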

4.3. Ablation Study

4.3.1. LIFPN Experiment

To verify the effectiveness of our designed LIFPN, we conducted a series of ablation experiments. For brevity, we numbered the different ablated LIFPN structures and refer to them below by the numbers in the table. As shown in Table 2, compared with the FPN, LIFPN1 eliminates redundant 3 × 3 Convs and optimizes the feature fusion paths. It reduced the FLOPs by 4.84 G and the Params by 7.08 M while keeping almost the same detection accuracy. This indicates that these 3 × 3 Convs were redundant for insulator detection and that removing them had little impact on the detection accuracy. When transposed convolution and dilated convolution were used separately to replace the original upsampling and downsampling methods, the AP improved to different degrees. When both were used, LIFPN4 increased the AP by 1.37% compared with the FPN, while the FLOPs decreased by 3.47 G and the Params decreased by 5.38 M. This was because the weights of the transposed convolution are learned by minimizing the loss function during training, and it can thus achieve better upsampling performance than bilinear interpolation. Dilated convolution enlarges the receptive field without changing the convolution kernel size, which improves the detection accuracy for large-scale insulators. After further incorporating an attention mechanism, LIFPN5 reduced the FLOPs by 3.47 G and the Params by 5.38 M compared with the FPN, while increasing the AP by 1.77%. This was because SimAM is a parameter-free attention mechanism that strengthens the response to insulators with almost no increase in parameters, thereby improving the detection accuracy.
This series of ablation experiments demonstrated that our proposed LIFPN eliminates redundant 3 × 3 Convs, optimizes the feature fusion paths, reduces information loss during sampling, and integrates an attention mechanism to achieve more efficient feature fusion and a stronger response to insulators. Therefore, compared with the FPN, LIFPN improves the detection accuracy while reducing the number of model parameters and the computational complexity, making it better suited to edge devices with limited computing power.
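Why SimAM adds no parameters can be seen from a minimal single-channel sketch. It follows the formulation of the SimAM paper [32] as we understand it (an inverse-energy term passed through a sigmoid) and works on a small 2-D list instead of a feature tensor; `e_lambda` is the regularization constant of that formulation, and the exact placement of SimAM inside LIFPN is not reproduced here.

```python
import math


def simam(channel, e_lambda=1e-4):
    """Apply SimAM-style attention to one channel given as a 2-D list of floats."""
    values = [v for row in channel for v in row]
    n = len(values) - 1  # normalizer used in the variance estimate
    mean = sum(values) / len(values)
    sq_dev = [(v - mean) ** 2 for v in values]
    variance = sum(sq_dev) / n
    width = len(channel[0])
    out = []
    for v, d in zip(values, sq_dev):
        # Positions that deviate strongly from the channel mean get a larger weight.
        inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
        weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
        out.append(v * weight)
    return [out[r * width:(r + 1) * width] for r in range(len(channel))]


# Toy feature map: one strong response (5.0) on a flat background (1.0).
feature = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
attended = simam(feature)
print(attended[1][1] / 5.0, attended[0][0] / 1.0)  # weight at the peak vs. background
```

The weights are computed purely from the statistics of the feature map itself, so the module contains no learnable tensors; only the fixed constant `e_lambda` is involved.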

4.3.2. LIHead Experiment

As shown in Table 3, we conducted ablation experiments on the LIHead structure based on depthwise separable convolution and the LIHead structure based on group convolution. Since our LIHead was derived from the head of RetinaNet, we used that head as the baseline. Simply replacing the eight conventional Convs in the head of RetinaNet with DS Convs resulted in a training failure with an AP of 0. This was because DS Conv may lose some information owing to the lack of cross-channel information interaction in its first stage. Stacking DS Convs consecutively exacerbated this effect, so that most of the DS Conv weights became 0 and the gradients vanished during backpropagation. To address this issue, we added shortcut connections across the DS Convs to reduce the information loss. During backpropagation, even if the gradient through the DS Conv path is small, the gradient can still propagate through the shortcut path, which effectively avoids vanishing gradients. As a result, the AP increased to 53.38%, which was still 9.51% lower than the baseline. To further improve the detection accuracy, we used a 1 × 1 Conv to increase the channel dimensionality, which reduced the information loss of DS Conv more effectively in the higher-dimensional space and increased the AP to 60.34%. Compared with the baseline, the AP of this DS Conv-based LIHead decreased by 2.55%, the FLOPs decreased by 17.61 G, and the Params decreased by 2.59 M. The LIHead based on group Conv reached an AP of 62.84%, which was 2.5% higher than that of the LIHead based on DS Conv, and its FLOPs and Params were also slightly lower.
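The role of the shortcut in backpropagation can be written out in one line. Here F denotes the stacked DS Convs, x the input of the block, y its output, and 𝓛 the training loss; these symbols are introduced only for this illustration and do not appear in the original derivation.

```latex
y = x + F(x), \qquad
\frac{\partial \mathcal{L}}{\partial x}
  = \left( I + \frac{\partial F}{\partial x} \right)^{\top}
    \frac{\partial \mathcal{L}}{\partial y}
  \;\approx\; \frac{\partial \mathcal{L}}{\partial y}
  \quad \text{when } \frac{\partial F}{\partial x} \approx 0 .
```

Without the shortcut, the identity term I is absent and the gradient is multiplied only by ∂F/∂x, so it vanishes when most DS Conv weights are 0; with the shortcut, the gradient of the output is passed on unchanged even in that case.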

Although the LIHead based on DS Conv reduced the FLOPs and Params compared with the baseline, the increase in feature dimensionality limited how lightweight it could become. On the one hand, minimizing the information loss caused by DS Conv required increasing the feature dimension as much as possible. On the other hand, increasing the feature dimension inevitably increased the model parameters. These two requirements constrained each other and limited the performance of the LIHead based on DS Conv. In addition, DS Conv significantly reduced the detection speed because of its extensive fragmented operations. During the edge device deployment, we found that the detection speed of the LIHead based on DS Conv was approximately 6.2 FPS lower than that of the LIHead based on group Conv. Therefore, we ultimately chose the LIHead based on group Conv as the head of our lightweight oriented detector. Compared with the baseline, it reduced the FLOPs by 20.12 G and the Params by 2.95 M while keeping the detection accuracy almost unchanged.

4.3.3. Backbone Experiment

To select the most suitable lightweight backbone, we conducted comparative experiments on different backbones. As shown in Table 4, compared with the ResNet50 originally used by RetinaNet, the lightweight backbones reduced the FLOPs by approximately 27 G and the Params by approximately 26 M while maintaining a high detection accuracy. The detection accuracies of MobileNet v2 and MobileNet v3 were relatively high, and the difference between them was not obvious. When running on edge devices, MobileNet v2 was 2.29 FPS faster than MobileNet v3. Therefore, we finally chose MobileNet v2 as the backbone of our lightweight detector.

4.3.4. All Ablation Experiments

We conducted an overall ablation experiment covering all improvements. As shown in Table 5, each improvement reduced the FLOPs and Params to a different degree while maintaining a high detection accuracy. Among them, LIFPN reduced the FLOPs by 3.47 G and the Params by 5.38 M while increasing the AP by 1.54%. This indicates that our designed LIFPN fuses features more efficiently, improving the detection accuracy while reducing the model parameters. LIHead reduced the FLOPs by 20.12 G and the Params by 2.95 M with almost the same AP, because group Conv effectively reduces the model parameters while the shuffle operation ensures information interaction between different groups. With all improvements applied, the detection accuracy of our final lightweight oriented detector was only 0.41% lower than that of the original detector, while the FLOPs were reduced by 49.11 G and the Params by 29.94 M. This indicates that our lightweight oriented detector significantly reduces the parameters and computational complexity of the model while maintaining a high detection accuracy, making it more suitable for edge devices with limited computing power.

4.4. Comparative Experiment

To verify the effectiveness of our lightweight oriented detector, we conducted comparative experiments with current advanced horizontal detectors and universal oriented detectors. As shown in Table 6, the horizontal detector RetinaNet had the lowest detection accuracy in the insulator detection task, with an AP of only 58.3%. Note that this value was calculated using the horizontal IoU; if the AP were calculated using the rotated IoU, it would be even lower. This is because, when a horizontal detector detects tilted objects with large aspect ratios, such as insulators, its detection box contains too many background pixels, making it difficult to locate the insulators accurately. In addition, as shown in Figure 10, when two insulator strings are close together, the overlapping area between their horizontal boxes is so large that the NMS algorithm can easily suppress one of them, causing a missed detection.
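The amount of background inside a horizontal box can be estimated with simple geometry. The sketch below computes the smallest axis-aligned box that encloses a rotated rectangle and reports the fraction of that box not covered by the object; the 200 × 20 pixel size is a hypothetical insulator, not a measurement from the dataset.

```python
import math


def horizontal_box_of(width, height, angle_deg):
    """Width and height of the smallest axis-aligned box enclosing a rotated rectangle."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return width * c + height * s, width * s + height * c


def background_fraction(width, height, angle_deg):
    """Fraction of the horizontal box area that is not covered by the rotated object."""
    box_w, box_h = horizontal_box_of(width, height, angle_deg)
    return 1.0 - (width * height) / (box_w * box_h)


# Hypothetical slender insulator, 200 x 20 pixels, at several tilt angles.
for angle in (0, 15, 30, 45):
    print(angle, round(background_fraction(200, 20, angle), 3))
```

For this hypothetical 10:1 object tilted by 45°, roughly 83% of the horizontal box is background, whereas a rotated box aligned with the object contains essentially none.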

The AP of the universal oriented detectors is calculated using the rotated IoU, which better reflects the positioning accuracy for insulators. Unlike horizontal detectors, oriented detectors predict rotated boxes with rotation angles, so their detection boxes contain very few background pixels and locate insulators more accurately. When two insulator strings are close together, there is almost no overlapping area between the two rotated boxes. The AP of the universal oriented detectors reached approximately 62% to 65%, but their numbers of parameters and computational complexity were also relatively high. The FLOPs of universal oriented detectors are mostly around 64 G, and the Params are above 30 M. The FLOPs of R³Det reached 139.94 G and the Params reached 47.04 M, which is difficult for edge devices with limited computing power to bear. Although the FLOPs of ReDet decreased to 44.15 G and the Params to 31.54 M, its detection speed on the computer was only 16.95 FPS. Our lightweight oriented detector effectively reduced the model parameters and computational complexity while maintaining a high detection accuracy, with the FLOPs reduced to 16.38 G and the Params reduced to 6.19 M. In addition, the detection speed on the computer increased to 27.98 FPS, and the detection speed on the AGX Orin edge device was accelerated to 41.89 FPS. This was because our lightweight feature fusion network LIFPN achieved more efficient feature fusion in the intermediate feature layers, our lightweight LIHead preserved the detection accuracy while significantly reducing the number of parameters by replacing conventional convolutions with group convolutions and channel shuffle operations, and our selected lightweight backbone was more suitable for insulator detection. Because group convolution and channel shuffle involve little fragmentation, the detection speed of the model also improved.

As shown in Figure 10, an interesting phenomenon in the first row of the comparative results is that only our detector and Oriented RetinaNet detected the small-scale insulators in the distance. Of the two, the rotated box of Oriented RetinaNet deviated somewhat, whereas our detector positioned the small-scale insulators more accurately. This may be because the reduction in model complexity improved the generalization of the model and reduced overfitting. In addition, LIFPN further enhanced the multi-scale detection capability of our detector: in some images, our detector found distant small-scale insulators that even human annotators had missed when labeling. In the other images, our detector also had the lowest missed detection rate and positioned the insulators accurately. Compared with the horizontal detector, our lightweight oriented detector located the insulators more accurately and improved the detection accuracy. Compared with universal oriented detectors, it significantly reduced the computational complexity while maintaining a high detection accuracy, thereby improving the detection speed. Therefore, when deployed on edge devices with limited computing power, it can effectively reduce the consumption of computational resources and improve the detection efficiency.

4.5. Edge Device Experiment

We deployed our lightweight oriented detector on the Nvidia AGX Orin edge device and conducted ablation experiments. As shown in Table 7, because the same deep learning framework was used on the edge device and our model is lightweight, no pruning or quantization was needed, and the detection accuracy on the edge device was therefore almost the same as that on the computer. In terms of detection speed, with TensorRT acceleration, our lightweight oriented detector achieved 41.89 FPS, which was 8.52 FPS higher than the original RetinaNet. In addition, because of the large number of parameters of RetinaNet, it is difficult to run multiple models in parallel on edge devices with scarcer computing resources, such as the TX2, and long-term operation can cause system lag. Our lightweight oriented detector effectively alleviates these problems: it reduces the computational overhead and improves the detection speed on edge devices while maintaining a high detection accuracy.

5. Conclusions

The automatic detection of insulators in drone aerial images can help power departments reduce the time and manpower invested in insulator inspections, achieving more automated and intelligent power inspections. In recent years, many studies have proposed automatic insulator detection methods based on convolutional neural networks. However, most of them used horizontal detectors, and for tilted slender insulators, the detection box contains too many background pixels, making it difficult to locate the insulators accurately. Although universal oriented detectors can generate rotated detection boxes to solve this problem, they require a large amount of computation and are not suitable for edge devices with limited computing power. To address these issues, we designed a lightweight oriented detector. First, we designed a lightweight feature fusion network, LIFPN, which achieves more efficient feature fusion by reducing the number of redundant convolutions, optimizing the feature fusion paths, and introducing an attention mechanism. Second, we proposed a lightweight oriented detection head, LIHead, which uses group Conv and channel shuffle to generate rotated detection boxes while reducing the parameters and computational complexity. Third, we selected a suitable lightweight backbone through experiments and combined it with LIFPN and LIHead to form the lightweight insulator oriented detector. We deployed it on edge devices and used TensorRT to improve the running speed of the model. Finally, we demonstrated through a series of experiments that our proposed lightweight oriented detector effectively reduces the number of model parameters while maintaining a high detection accuracy, and achieves a detection speed of 41.89 FPS on edge devices. This provides a more effective solution for the application of oriented detectors in drone-based insulator inspections.

In the future, we will continue our work in the following directions. First, we will explore how to reduce fragmented and element-wise operations to further improve the detection speed of the model. In addition, techniques such as model pruning and model quantization can be used to reduce the number of model parameters. Second, although our lightweight oriented detector locates insulators more accurately than the horizontal detector, its detection accuracy can still be improved. We will explore how to use model distillation, in which a teacher model with a high detection accuracy guides the training of the lightweight model, to improve the detection accuracy of lightweight models. Third, owing to the lack of relevant insulator defect data, this article only studied insulator detection. We will collect more defect data and apply the lightweight insulator oriented detector to insulator defect detection tasks. With its precise insulator positioning and lightweight model structure, we will attempt to achieve faster defect detection on edge devices.

Author Contributions

Conceptualization, F.Q. and Y.L.; methodology, F.Q. and Y.L.; software, F.Q. and Y.L.; validation, F.Q.; formal analysis, F.Q. and Y.L.; investigation, F.Q., Y.L. and L.T.; resources, F.Q., L.T. and Q.D.; data curation, F.Q., Y.L. and H.W.; writing—original draft preparation, F.Q. and Y.L.; writing—review and editing, F.Q., Y.L., L.T., Q.D., H.W. and W.L.; visualization, F.Q., Y.L. and H.W.; supervision, L.T., F.Q., Q.D., H.W. and W.L.; project administration, F.Q., Y.L., L.T. and W.L.; funding acquisition, F.Q., L.T. and Q.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2023YFB4704900), Zhuhai Industry–University–Institute Cooperation Project (2220004002460), and Key-Area Research and Development Program of Guangdong Province (2020B1111010002).

Data Availability Statement

Data are unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zheng, J.; Wu, H.; Zhang, H.; Wang, Z.; Xu, W. Insulator-defect detection algorithm based on improved YOLOv7. Sensors 2022, 22, 8801. [Google Scholar] [CrossRef] [PubMed]
Lu, Z.; Li, Y.; Shuang, F. MGFNet: A Progressive Multi-Granularity Learning Strategy-Based Insulator Defect Recognition Algorithm for UAV Images. Drones 2023, 7, 333. [Google Scholar] [CrossRef]
Shuang, F.; Han, S.; Li, Y.; Lu, T. RSIn-dataset: An UAV-based insulator detection aerial images dataset and benchmark. Drones 2023, 7, 125. [Google Scholar] [CrossRef]
Zhao, X.; Zhang, W.; Zhang, H.; Zheng, C.; Ma, J.; Zhang, Z. ITD-YOLOv8: An Infrared Target Detection Model Based on YOLOv8 for Unmanned Aerial Vehicles. Drones 2024, 8, 161. [Google Scholar] [CrossRef]
Ma, X.; Zhang, Y.; Zhang, W.; Zhou, H.; Yu, H. SDWBF algorithm: A novel pedestrian detection algorithm in the aerial scene. Drones 2022, 6, 76. [Google Scholar] [CrossRef]
Han, Y.; Guo, J.; Yang, H.; Guan, R.; Zhang, T. SSMA-YOLO: A Lightweight YOLO Model with Enhanced Feature Extraction and Fusion Capabilities for Drone-Aerial Ship Image Detection. Drones 2024, 8, 145. [Google Scholar] [CrossRef]
Lu, G.; He, X.; Wang, Q.; Shao, F.; Wang, H.; Wang, J. A novel multi-scale transformer for object detection in aerial scenes. Drones 2022, 6, 188. [Google Scholar] [CrossRef]
Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 3520–3529. [Google Scholar]
Ding, J.; Xue, N.; Long, Y.; Xia, G.S.; Lu, Q. Learning RoI transformer for oriented object detection in aerial images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 2849–2858. [Google Scholar]
Girshick, R. Fast r-cnn. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef]
Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra r-cnn: Towards balanced learning for object detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 2018, 20, 3111–3122. [Google Scholar] [CrossRef]
Han, J.; Ding, J.; Xue, N.; Xia, G. ReDet: A Rotation-equivariant Detector for Aerial Object Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 2785–2794. [Google Scholar]
Yang, X.; Yan, J.; Feng, Z.; He, T. R3det: Refined single-stage detector with feature refinement for rotating object. In Proceedings of the 2021 AAAI Conference on Artificial Intelligence (AAAI), Virtually, 2–9 February 2021; Volume 35, pp. 3163–3171. [Google Scholar]
Zhai, Y.; Wang, D.; Zhang, M.; Wang, J.; Guo, F. Fault detection of insulator based on saliency and adaptive morphology. Multimed. Tools Appl. 2017, 76, 12051–12064. [Google Scholar] [CrossRef]
Zhang, K.; Qian, S.; Zhou, J.; Xie, C.; Du, J.; Yin, T. ARFNet: Adaptive receptive field network for detecting insulator self-explosion defects. Signal Image Video Process. 2022, 16, 2211–2219. [Google Scholar] [CrossRef]
Zhai, Y.; Chen, R.; Yang, Q.; Li, X.; Zhao, Z. Insulator fault detection based on spatial morphological features of aerial images. IEEE Access 2018, 6, 35316–35326. [Google Scholar] [CrossRef]
Tao, X.; Zhang, D.; Wang, Z.; Liu, X.; Zhang, H.; Xu, D. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 1486–1498. [Google Scholar] [CrossRef]
Yu, Z.; Lei, Y.; Shen, F.; Zhou, S.; Yuan, Y. Research on Identification and Detection of Transmission Line Insulator Defects Based on a Lightweight YOLOv5 Network. Remote Sens. 2023, 15, 4552. [Google Scholar] [CrossRef]
Zheng, H.; Liu, Y.; Sun, Y.; Li, J.; Shi, Z.; Zhang, C.; Lai, C.S.; Lai, L.L. Arbitrary-Oriented Detection of Insulators in Thermal Imagery via Rotation Region Network. IEEE Trans. Ind. Inform. 2022, 18, 5242–5252. [Google Scholar] [CrossRef]
Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style ConvNets Great Again. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13728–13737. [Google Scholar]
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 13–23 June 2018; pp. 7132–7141. [Google Scholar]
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521.
Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 2021 International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021.
Han, J.; Ding, J.; Li, J.; Xia, G. Align Deep Features for Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2020, 60, 1–11.
Sandler, M.; Howard, A.G.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
Howard, A.G.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
Ma, N.; Zhang, X.; Zheng, H.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. arXiv 2018, arXiv:1807.11164.
Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9626–9635.
Li, W.; Chen, Y.; Hu, K.; Zhu, J. Oriented reppoints for aerial object detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1829–1838.
Hou, L.; Lu, K.; Xue, J.; Li, Y. Shape-adaptive selection and measurement for oriented object detection. In Proceedings of the 2022 AAAI Conference on Artificial Intelligence (AAAI), Virtual, 22 February–1 March 2022; Volume 36, pp. 923–932.

Figure 1. Comparison of horizontal and oriented detector effects. The green boxes are the detection boxes for insulators. (a) Horizontal detectors find it difficult to accurately locate insulators. (b) Oriented detectors can accurately locate insulators. (c) Horizontal detector missed detection. (d) Correct detection by oriented detector.

Figure 2. Structure diagram of lightweight oriented detector.

Figure 3. Structure diagram of LIFPN.

Figure 4. Conventional convolution.

Figure 5. Depthwise separable convolution.

Figure 6. RetinaNet head.

Figure 7. LIHead based on depthwise separable convolution.

Figure 8. Group convolution.

Figure 9. LIHead based on group convolution.

Figure 10. Comparison of different detectors. Each column from left to right displays the detection results of RetinaNet, Oriented RetinaNet, Oriented FCOS, R³Det, ReDet, and our lightweight oriented detector.

Table 1. Literature review of related work.

Detection Methods	Detection Accuracy	Detection Speed
[21]	Achieved an accuracy of 92% on a self-built dataset (100 aerial images).	Achieved 0.5 frames per second (FPS) on the computer.
[22]	On a self-built dataset (2303 aerial images), the combination of ARFNet and YOLO v5 achieved an 84.4% average precision (AP). Note that this was the AP for the horizontal detection box.	No relevant data available.
[23]	Achieved a detection accuracy of 90.6% on a self-built dataset (74 aerial images).	Due to the use of multiple image-processing steps, the detection speed was slow, reaching only about 1.5 FPS on the computer.
[24]	Achieved 91% precision and 96% recall in detecting dropped string defects on a self-built dataset (1956 aerial images).	Due to the use of cascaded networks, the detection speed was slow and could only reach 2.79 FPS on the computer.
[25]	Achieved a 70.5% $A P_{50}$ and 50.3% $A P_{50 : 95}$ on a self-built dataset (1627 aerial images). Note that this was the AP for the horizontal detection box.	Related data mismatch.
[26]	Achieved 95.08% $A P_{50}$ on a self-built dataset (2760 aerial images), with no relevant $A P_{50 : 95}$ data available. Note that this was the AP for the rotated detection box.	Due to the use of an oriented object detector, the detection speed was slow and could only reach 6.3 FPS on the computer.

Table 2. LIFPN ablation experiment.

Neck	AP (%)	AP₅₀ (%)	AP₇₅ (%)	FLOPs (G)	Params (M)
FPN	62.89	90.38	75.96	65.49	36.13
LIFPN1 (Integrated feature fusion path)	62.85	90.63	75.38	60.65	29.05
LIFPN2 (Replace with transposed Conv)	63.96	90.51	75.02	61.08	29.57
LIFPN3 (Replace with dilation Conv)	63.02	90.60	75.99	61.60	30.23
LIFPN4 (Replace with transposed and dilation Conv)	64.26	90.63	76.00	62.02	30.75
LIFPN5 (Integrating the SimAM attention mechanism)	64.66	90.68	76.53	62.02	30.75

Table 3. LIHead ablation experiment.

Head	AP (%)	AP₅₀ (%)	AP₇₅ (%)	FLOPs (G)	Params (M)
Baseline	62.89	90.38	75.96	65.49	36.13
LIHead (DS Conv)	0	0	0	37.04	31.95
LIHead (DS Conv and shortcut)	53.38	89.40	54.06	37.04	31.95
LIHead (DS Conv and high-dimensional)	60.34	90.25	66.76	47.88	33.54
LIHead (group Conv and shuffle)	62.84	90.40	76.19	45.37	33.18

Table 4. Lightweight backbone comparison experiment.

Backbone	AP (%)	AP₅₀ (%)	AP₇₅ (%)	FLOPs (G)	Params (M)
ResNet50	62.89	90.38	75.96	65.49	36.13
Mobilenet v3	60.84	90.20	67.40	37.49	8.4
ShuffleNetv2	59.49	90.05	65.44	38.36	9.74
Mobilenet v2	60.72	90.58	67.54	39.53	12.74

Table 5. Overall ablation experiment.

Mobilenet v2	LIFPN	LIHead	AP (%)	AP₅₀ (%)	AP₇₅ (%)	FLOPs (G)	Params (M)
			62.89	90.38	75.96	65.49	36.13
🗸			60.72	90.58	67.54	39.53	12.74
	🗸		64.43	90.60	74.91	62.02	30.75
		🗸	62.84	90.40	76.19	45.37	33.18
🗸	🗸	🗸	62.48	90.39	75.38	16.38	6.19
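
As a quick check on the overall ablation, the short Python sketch below recomputes the savings of the full lightweight detector relative to the baseline row, using only the AP, FLOPs, and parameter counts listed in Table 5. The helper name `reduction` and the dictionary layout are illustrative choices and are not taken from the authors' code.

```python
# Values copied from Table 5 (AP in %, FLOPs in G, parameters in M).
baseline = {"ap": 62.89, "flops_g": 65.49, "params_m": 36.13}    # no lightweight module enabled
lightweight = {"ap": 62.48, "flops_g": 16.38, "params_m": 6.19}  # Mobilenet v2 + LIFPN + LIHead


def reduction(before, after):
    """Return the absolute reduction and the reduction as a percentage of `before`."""
    absolute = before - after
    return round(absolute, 2), round(100.0 * absolute / before, 1)


flops_saved, flops_pct = reduction(baseline["flops_g"], lightweight["flops_g"])
params_saved, params_pct = reduction(baseline["params_m"], lightweight["params_m"])
ap_drop = round(baseline["ap"] - lightweight["ap"], 2)

print(f"FLOPs:  -{flops_saved} G ({flops_pct}%)")
print(f"Params: -{params_saved} M ({params_pct}%)")
print(f"AP:     -{ap_drop} points")
```

With the Table 5 values, this gives a saving of 49.11 G FLOPs (about 75%) and 29.94 M parameters (about 83%) for an AP drop of 0.41 points, in line with the approximately 49 G and 30 M quoted in the abstract.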

Table 6. Comparative experiment.

Detector	AP (%)	AP₅₀ (%)	AP₇₅ (%)	FLOPs (G)	Params (M)
RetinaNet [17]	58.30	87.70	66.0	65.35	36.10
Oriented RetinaNet	62.89	90.38	75.96	65.49	36.13
Oriented FCOS [41]	64.06	90.68	77.72	64.44	31.89
R³Det [20]	62.81	90.63	78.47	139.94	47.04
Oriented Reppoints [42]	59.76	88.01	69.88	60.71	36.62
ReDet [19]	65.55	90.82	78.85	44.15	31.54
SASM [43]	64.01	86.70	77.30	60.70	36.60
Lightweight oriented detector (ours)	62.48	90.39	75.38	16.38	6.19

Table 7. Ablation experiment on edge device.

Mobilenet v2	LIFPN	LIHead	AP (%)	AP₅₀ (%)	AP₇₅ (%)	FPS ¹	FPS ²
			62.887	90.381	75.965	11.43	33.37
🗸			60.720	90.586	67.544	11.88	38.23
	🗸		64.266	90.604	74.911	12.12	39.52
		🗸	62.836	90.403	76.187	12.06	38.71
🗸	🗸	🗸	62.477	90.393	75.380	13.65	41.89

¹ Not using TensorRT acceleration. ² Using TensorRT acceleration.
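
To make the effect of TensorRT in Table 7 easier to read, the following Python sketch derives the speed-up factor and an approximate per-frame time for each configuration from the two FPS columns. The row labels, `speedup`, and `latency_ms` are hypothetical names introduced here, and the per-frame time assumes frames are processed one at a time, so that it is simply the reciprocal of the FPS.

```python
# (FPS without TensorRT, FPS with TensorRT), copied from Table 7.
fps = {
    "baseline": (11.43, 33.37),
    "mobilenet_v2": (11.88, 38.23),
    "lifpn": (12.12, 39.52),
    "lihead": (12.06, 38.71),
    "full_lightweight": (13.65, 41.89),
}


def speedup(plain, accelerated):
    """Ratio of accelerated FPS to non-accelerated FPS."""
    return accelerated / plain


def latency_ms(fps_value):
    """Approximate time per frame in milliseconds, assuming one frame at a time."""
    return 1000.0 / fps_value


for name, (plain, trt) in fps.items():
    print(f"{name:17s} x{speedup(plain, trt):.2f}  "
          f"{latency_ms(trt):.1f} ms/frame with TensorRT")
```

For the baseline, the factor is about 2.92, and for the full lightweight detector it is about 3.07, which corresponds to roughly 23.9 ms per frame at 41.89 FPS.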

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Qu, F.; Lin, Y.; Tian, L.; Du, Q.; Wu, H.; Liao, W. Lightweight Oriented Detector for Insulators in Drone Aerial Images. Drones 2024, 8, 294. https://doi.org/10.3390/drones8070294

