#### *5.1. Experiment Description*

#### 5.1.1. Dataset

A dataset of vibration dampers on overhead transmission lines is required for the proposed theoretical validation and experimental analysis. Although there is substantial research on vibration dampers, no fully public vibration damper detection dataset exists, and most of the damper data used in published work were obtained by geometric transformations such as flipping, cropping, and scaling. An insufficient number of real vibration damper samples would make it difficult to verify the correctness of the proposed theory. Therefore, we built a dataset for vibration damper detection, named DamperDetSet, from real UAV inspection video of overhead transmission lines. During construction of DamperDetSet, LabelMe was used as the annotation tool to label the positions of all vibration dampers present in each original image, keeping each bounding box as close as possible to the minimum enclosing rectangle of the target.
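For reference, LabelMe rectangle annotations can be converted to (xmin, ymin, xmax, ymax) boxes as in the sketch below. The field names follow LabelMe's JSON export format; the coordinates and the `damper` label are illustrative, not taken from DamperDetSet.

```python
import json

def labelme_to_bboxes(annotation: dict):
    """Extract axis-aligned bounding boxes from a LabelMe JSON annotation.

    LabelMe stores a rectangle as two corner points; we normalize them to
    (xmin, ymin, xmax, ymax) regardless of which corner was drawn first.
    """
    boxes = []
    for shape in annotation.get("shapes", []):
        if shape.get("shape_type") != "rectangle":
            continue
        (x1, y1), (x2, y2) = shape["points"]
        boxes.append((min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)))
    return boxes

# Example annotation in LabelMe's export format (coordinates are illustrative).
ann = {"shapes": [{"label": "damper", "shape_type": "rectangle",
                   "points": [[120.0, 80.0], [40.0, 30.0]]}]}
print(labelme_to_bboxes(ann))  # [(40.0, 30.0, 120.0, 80.0)]
```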

DamperDetSet contains 3000 images in total, each containing at least one vibration damper; several damper types appear, such as hippocampus anti-slip dampers and hook-wire dampers. We randomly divided the 3000 images into a training set of 2500 images and a test set of 500 images, a ratio of 5:1. In addition, because the dataset was captured by UAVs, the viewing angle of the dampers varies across images, which places higher demands on the robustness of the model.
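The random split described above can be sketched as follows; the seed and the integer ID scheme are illustrative, not the ones used in our experiments.

```python
import random

def split_dataset(image_ids, n_train=2500, seed=0):
    """Randomly split image IDs into training and test sets (here 2500/500, i.e. 5:1)."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    return ids[:n_train], ids[n_train:]

train, test = split_dataset(range(3000))
print(len(train), len(test))  # 2500 500
```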

#### 5.1.2. Experiment Configuration

In terms of hyperparameter settings, we trained DamperYOLO for a total of 200 epochs: the learning rate remained constant for the first 100 epochs and then decreased gradually to 0 over the last 100 epochs. All programs were written in Python on the PyTorch 1.4 platform. The experimental platform ran Ubuntu 18.04 as its operating system, with an NVIDIA RTX 2080 GPU as the main training device, an AMD R5-3600X CPU, and 32 GB of RAM.
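The schedule, constant and then linearly decaying, can be written as a simple function of the epoch index (in PyTorch it could be wired up via `torch.optim.lr_scheduler.LambdaLR`). The base learning rate of 1e-3 here is an assumption for illustration, not the value used in the paper.

```python
def learning_rate(epoch, base_lr=1e-3, total_epochs=200):
    """Constant LR for the first half of training, then linear decay toward 0."""
    half = total_epochs // 2
    if epoch < half:
        return base_lr
    # Decays from base_lr at epoch `half` down to near 0 at the final epoch.
    return base_lr * (total_epochs - epoch) / half

print(learning_rate(0), learning_rate(150))  # full LR, then half way down the decay
```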

#### *5.2. The Baselines*

In the following experiments, we chose one-stage, two-stage, and anchor-free methods as baselines for comparison.

YOLOv4 [37]: This method is the latest achievement of the YOLO series. Building on the strengths of its predecessors, it introduces an FPN + PAN structure, which improves how features propagate through the network; it also serves as the basis of our proposed model.

Cascade R-CNN [49]: This framework is the latest achievement of the R-CNN series. It creatively introduces a cascade structure. The detection accuracy is state-of-the-art, but its excellent performance consumes a lot of computational resources.

CenterNet [50]: This method is heatmap-based rather than anchor-based, which gives it fast inference and a small memory footprint.

SSD [47]: SSD is another classic one-stage object detection method; it was among the first to attach detectors to multiple feature scales.

RetinaNet [51]: RetinaNet is based on FPN [43], and its contribution is to propose focal loss to solve the problem of category imbalance.

#### *5.3. Qualitative Evaluation*

To visually compare DamperYOLO with the other baselines, we conducted qualitative comparison experiments on the DamperDetSet dataset; the results are shown in Figure 5. Under the same test images, the detections of CenterNet are not stable, which suggests that heatmap computation in current anchor-free methods is easily disturbed by complex scenes such as transmission lines. The two-stage Cascade R-CNN performs very well: as the latest framework in the R-CNN series, its second, proposal-based refinement yields more accurate results. The single-stage SSD leaves room for improvement, and its VGG16 backbone may be weaker at feature extraction than ResNet-style networks. RetinaNet and YOLOv4 perform better, both benefiting from recent advances in one-stage detection: they achieve high performance in a single pass, although their detection of damper edges could still be improved. Finally, DamperYOLO outperforms the other one-stage methods. Its detection results show that the proposed improvement strategy is effective, and its performance is no worse than that of Cascade R-CNN.

**Figure 5.** Test examples of each model on the DamperDetSet dataset. The results show that the performance of DamperYOLO is similar to Cascade R-CNN and better than the one-stage SSD, RetinaNet, and YOLOv4, as well as CenterNet.

#### *5.4. Quantitative Evaluation*

We compared DamperYOLO with the other baselines and performed the quantitative analysis shown in Table 2, using AP as defined for the COCO [52] benchmark as the evaluation metric. AP is computed from the IoU between predictions and the ground truth; the calculation formula is shown in Equation (3). AP was evaluated at IoU thresholds of 0.5, 0.7, and 0.9 so that the models' performance under different levels of strictness could be assessed more comprehensively.
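As a reference for the metric, the IoU between a predicted box and a ground-truth box (both in (xmin, ymin, xmax, ymax) form) can be computed as follows:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.142857
```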


**Table 2.** APs of the different models.

As can be seen from Table 2, on the same test set, the two-stage detection strategy kept the performance of Cascade R-CNN stable, and it was at the forefront under every AP standard; however, its good scores came at the cost of considerable computation time.

The one-stage RetinaNet and YOLOv4 performed similarly, with YOLOv4 slightly ahead. Both held a clear lead over SSD on every metric, which suggests that the newer training tricks they adopt confer an accuracy advantage. In addition, all three methods compute faster than Cascade R-CNN: omitting the intermediate proposal step shortens computation time considerably.

The anchor-free CenterNet had the lowest scores, from which we conclude that heatmap computation is very susceptible to interference from background objects that resemble the target. However, anchor-free methods compute much faster than the other baselines, a major advantage in scenarios with strict real-time requirements.

Our proposed DamperYOLO takes the lead on AP. Since the baseline YOLOv4 scores lower than Cascade R-CNN, this lead indicates that the edge extraction, attention mechanism, and feature fusion structure proposed in this paper are what close the gap and overtake Cascade R-CNN. Meanwhile, the computation speed of DamperYOLO is similar to the other one-stage methods. DamperYOLO is therefore a model that balances speed and accuracy.

#### *5.5. Sensitivity Analysis*

In this section, sensitivity analyses are performed on each component of DamperYOLO, covering the choice of backbone, edge extraction, the attention mechanism, the number of training epochs, and the minimum amount of training data.

#### 5.5.1. Backbone

We conducted a sensitivity analysis on the backbone used by DamperYOLO while retaining the other improvements. As shown in Table 3, the CSPDarknet53 used by YOLOv4 performed better than ResNet50. In addition, dampers are the only objects to be detected, so we believed that deepening the network to improve the backbone's feature abstraction ability might be more effective. The performance of ResNet101 supports this idea; however, adding further layers, such as moving to ResNet152, brings only limited improvement, so ResNet101 is used as the backbone.

**Table 3.** APs of different backbones.


#### 5.5.2. Edge Extraction

To verify the effectiveness of preprocessing, a sensitivity analysis was performed on image denoising and edge detection while keeping the other improvements constant. Table 4 shows that, compared with using no preprocessing, using image denoising or edge extraction alone each brings a certain improvement in detection. When both are used, AP50 increases by about five percentage points, which shows that the image augmentation method in this paper is effective.
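As an illustration of the edge-extraction step, the sketch below computes a Sobel gradient magnitude on a toy grayscale image. It is a simplified, pure-Python stand-in for the Canny extractor actually used in our pipeline; in practice one would combine denoising (e.g., OpenCV's `cv2.fastNlMeansDenoising`) with `cv2.Canny`.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude of a 2D grayscale image (list of lists).

    Simplified stand-in for the Canny edge extractor used in preprocessing.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge: gradient magnitude peaks at the boundary columns.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mag = sobel_magnitude(img)
```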

**Table 4.** APs of different preprocessing methods.


#### 5.5.3. Attention Mechanism

The attention mechanism was pioneered in the field of NLP and has been adopted in object detection in recent years. To verify the effect of adding an attention mechanism to different layers of ResNet101, we conducted a sensitivity analysis on the number of layers receiving attention blocks while keeping the other conditions constant. As shown in Table 5, adding the attention mechanism to the first three layers of ResNet101 improved the detection effect to a certain extent. However, continuing to introduce attention blocks containing edge information into the 4th and 5th layers caused a drop in detection accuracy. This is because the feature maps extracted by the fourth and fifth layers of ResNet101 carry more abstract information, whereas edge information is a basic, low-level feature; injecting it there is counterproductive and reduces detection performance.
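Schematically, edge-guided attention can be viewed as reweighting backbone features with a sigmoid-activated edge response. The element-wise sketch below is only a toy illustration; the fusion rule and shapes are assumptions, not the actual attention block used in DamperYOLO.

```python
import math

def edge_attention(features, edge_map):
    """Reweight a 2D feature map by an edge-derived attention mask.

    attention = sigmoid(edge); output = feature * (1 + attention), so regions
    with strong edge responses are emphasized without suppressing the rest.
    """
    out = []
    for f_row, e_row in zip(features, edge_map):
        out.append([f * (1.0 + 1.0 / (1.0 + math.exp(-e)))
                    for f, e in zip(f_row, e_row)])
    return out

feats = [[1.0, 1.0], [1.0, 1.0]]
edges = [[10.0, -10.0], [0.0, 0.0]]  # strong edge, anti-edge, neutral
out = edge_attention(feats, edges)
```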


**Table 5.** APs of different introduction times of the attention mechanism.

#### 5.5.4. Number of Epochs

The number of training epochs affects the performance of the model. Too few epochs leave the model under-fitted, not yet able to identify all the objects to be detected. Too many epochs reduce the robustness of the model: the parameters overfit the existing training data, and performance on unfamiliar data in the test set degrades. We therefore evaluated model performance against the number of training epochs; the results are shown in Table 6. The table shows that the model is most balanced when trained for 200 epochs.


**Table 6.** APs of different epoch numbers.

#### 5.5.5. Minimum Training Data Experiment

Changes in the amount of training data also affect the final performance of the model, and comparing detection accuracy under different data volumes gauges the model's feature extraction ability. As shown in Table 7, we conducted experiments with reduced amounts of training data. The results show that performance weakens only slightly as the data volume decreases, indicating that our data is sufficient; performance did not drop significantly until the training set shrank to 1750 images. Moreover, DamperYOLO showed strong robustness and could still learn the key feature information from small-scale data, which overcomes to a certain extent the poor generalization ability of previous models.

**Table 7.** Results of the minimum training data experiment.


#### *5.6. Ablation Analysis*

To analyze the functions of the different components of DamperYOLO, an ablation analysis was performed on DamperDetSet. As shown in Table 8, Model B had better indicators than Model A, which indicates that using ResNet101 as the backbone extracts image features better. Model C adds image augmentation in preprocessing, which improves the quality of the input images and provides the model with better training data. Compared with the other stages, Model D shows the largest improvement in detection effect, indicating that the attention mechanism plays a substantial role: aided by the image enhancement, it lets the model focus on the edge information of the damper while converging. The other comparative experiments also show that the additional overhead it brings is very low, so adding an attention mechanism to the backbone is worthwhile for our task.

**Table 8.** The results of the ablation analysis.


#### *5.7. Computational Complexity*

The network parameters and training time were recorded to evaluate the space and time complexity of the networks. As shown in Table 9, compared with Cascade R-CNN, DamperYOLO achieves a similar detection effect while its parameter count and training time are greatly reduced. Compared with YOLOv4, its space complexity and training time are basically unchanged, because we only replaced the backbone and added the attention mechanism, yet a higher detection effect was achieved. In addition, CenterNet still consumes the fewest resources. The computational complexity of SSD is slightly higher than that of RetinaNet, but its detection effect is slightly worse.



#### **6. Conclusions**

We propose a power line vibration damper detection model named DamperYOLO, based on a deep neural network, that can detect the position of vibration dampers in UAV inspection images. DamperYOLO first uses the Canny algorithm to obtain edge information from the original image, then uses an attention mechanism to inject this edge information into ResNet101 to guide feature extraction, and finally outputs, through the FPN structure, feature maps better suited to small-target detection. Qualitative and quantitative experiments on the power line vibration damper detection dataset built in this paper support the following conclusions. Compared with the current baselines in the object detection field, DamperYOLO delivers state-of-the-art detection accuracy. The sensitivity analyses show that edge detection, the attention mechanism, and the feature pyramid network all significantly improve detection accuracy. The ablation analysis shows that the attention mechanism and the feature pyramid network improve the accuracy of the output detections. In addition, DamperYOLO consumes computational resources similar to other one-stage baselines while reaching the detection accuracy of Cascade R-CNN, which shows the superiority of our model. In the future, we will continue to introduce appropriate training tricks to further improve the detection accuracy of DamperYOLO, and explore applying the model to other power line components.

**Author Contributions:** Conceptualization, W.C. and Y.L.; methodology, W.C., Y.L. and Z.Z.; validation, W.C.; formal analysis, W.C. and Y.L.; investigation, W.C., Y.L. and Z.Z.; resources, W.C., Y.L. and Z.Z.; writing—original draft preparation, W.C.; writing—review and editing, W.C., Y.L. and Z.Z.; visualization, W.C.; supervision, Y.L.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China, grant number 61962031, the National Natural Science Foundation of China, grant number 51667011, and the Applied Basic Research Project of Yunnan province, grant number 2018FB095.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data in this paper are undisclosed due to the confidentiality requirements of the data supplier.

**Acknowledgments:** We thank the Yunnan Electric Power Research Institute for collecting the transmission line inspection data, which provided a solid foundation for the verification of the model proposed in this paper. The authors thank the reviewers and editors for their constructive comments to improve the quality of this article.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**

