4.3. Evaluation Metrics
To evaluate the effectiveness of the proposed detection algorithm, we use common deep learning metrics: Precision (P), Recall (R), mean Average Precision (mAP), and Frames Per Second (FPS). In addition, GFLOPs and the total number of parameters are employed to measure computational complexity and model size.
Precision and Recall are commonly used metrics in detection and classification tasks, and their calculation formulas are as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

where TP indicates the correctly identified positive samples, FP refers to negative samples that were incorrectly classified as positive, and FN denotes the positive samples that were overlooked.
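As a concrete sketch, both metrics follow directly from the confusion counts (a minimal Python illustration; the function name is ours):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# 90 correct detections, 10 false alarms, 30 missed objects:
# precision_recall(90, 10, 30) -> (0.9, 0.75)
```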
AP and mAP serve as key indicators for assessing model performance. AP calculates the Average Precision for an individual object category, whereas mAP measures the Mean Average Precision across all categories. The calculation formulas are as follows:

AP = ∫_0^1 P(R) dR

mAP = (1/M) Σ_{i=1}^{M} AP_i

where P(R) is the precision at recall R, AP_i is the AP of the i-th category, and M indicates the total number of object categories.
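For illustration, AP can be approximated as the area under the precision-recall curve and mAP as the mean of per-class AP values (a sketch using all-point interpolation; function names are ours, not the authors' code):

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve with all-point interpolation.
    `recalls` must be sorted in ascending order."""
    prec = list(precisions)
    # Make the precision envelope monotonically non-increasing.
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, prec):
        ap += (r - prev_r) * p  # rectangle between successive recall points
        prev_r = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP: mean of AP over all M object categories."""
    return sum(ap_per_class) / len(ap_per_class)

# Two recall points at precision 1.0 and 0.5:
# average_precision([0.5, 1.0], [1.0, 0.5]) -> 0.75
# mean_average_precision([0.9, 0.6]) -> 0.75
```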
The following formulas outline how FLOPs and Parameters are calculated for detection algorithms with L convolutional layers:

FLOPs = Σ_{l=1}^{L} (K_l^2 · H_l · W_l · C_{l-1} · C_l) / S_l^2

Params = Σ_{l=1}^{L} K_l^2 · C_{l-1} · C_l

where K_l indicates the kernel size of the l-th convolutional layer; H_l and W_l represent the height and width of the input feature map, respectively; C_{l-1} and C_l denote the number of input and output feature channels of the l-th convolutional layer; and S_l signifies the stride of the convolutional kernel.
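These per-layer sums can be sketched in a few lines (an illustrative helper under the formulas above, not the authors' code; the layer field names are our choice):

```python
def conv_complexity(layers):
    """Sum FLOPs and parameters over a list of conv layers.
    Each layer dict holds: K (kernel size), H, W (input map size),
    C_in, C_out (channels), S (stride)."""
    flops, params = 0, 0
    for l in layers:
        kernel_ops = l["K"] ** 2 * l["C_in"] * l["C_out"]
        params += kernel_ops                 # K^2 * C_in * C_out weights
        out_positions = (l["H"] // l["S"]) * (l["W"] // l["S"])
        flops += kernel_ops * out_positions  # weights applied at every output position
    return flops, params

# One 3x3 conv, 8x8 input, 1 -> 1 channels, stride 1:
# conv_complexity([{"K": 3, "H": 8, "W": 8, "C_in": 1, "C_out": 1, "S": 1}])
# -> (576, 9)
```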
FPS measures the performance of a model by indicating how many images are processed per second. The formula for calculating FPS is as follows:

FPS = N / T

where T stands for the model's total inference time, and N represents the number of images processed during this period.
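A minimal timing sketch of this measurement (illustrative only; `model` and `images` are placeholders for a real network and dataset):

```python
import time

def measure_fps(model, images):
    """FPS = N / T: number of images divided by total inference time."""
    start = time.perf_counter()
    for img in images:
        model(img)
    total_time = time.perf_counter() - start
    return len(images) / total_time

# With a dummy "model", the returned value is simply images per second.
fps = measure_fps(lambda x: x, range(1000))
```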
4.4. Ablation Experiment
To assess the impact of the enhancements in the DLCH-YOLO algorithm for detecting circuit breaker operation statuses, we performed ablation experiments on the power equipment dataset. These experiments evaluate how each individual improvement, and their combinations, affect model performance. We used mAP@50, mAP@50-95, GFLOPs, and Parameters as evaluation metrics. The table below shows the average value of each metric under the different configurations.
According to the ablation experiment results presented in Table 1, for Model 1 we introduced GSConv into the backbone network of the baseline model YOLOv8n. This not only improved detection accuracy but also reduced the number of parameters and the computational load, yielding a 1.1% increase in mAP and a 0.3 G reduction in computational load. For Model 2, the improved C2f_DLKA module, with its deformable large kernel attention, demonstrated strong adaptability to multi-scale features, achieving a 3% increase in mAP, although GFLOPs increased by 0.6 G. For Model 3, introducing the improved Semantic Screening Feature Pyramid Network (SSFPN) in the neck improved detection accuracy by 3.8%; although the computational load increased by 1.4 G, the number of parameters decreased by 0.8 M. For Model 4, with both the GSConv and C2f_DLKA modules introduced, mAP increased by 2.8%, and the computational load was reduced by 0.4 G compared to Model 2, which used only the C2f_DLKA module. For Model 5, with both the GSConv and SSFPN modules introduced, mAP reached 91.2%, a 4.1% improvement over the baseline model, and the computational load was reduced by 0.3 G compared to Model 3, which used only the SSFPN module. For Model 6, with both the C2f_DLKA and SSFPN modules introduced, mAP increased by 3.6%. The final model proposed in this paper incorporates all three improvements, GSConv, C2f_DLKA, and SSFPN, achieving the highest mAP of 91.8%, a 4.7% increase over the baseline model, with only a 1.5 G increase in computational load. The results show that the algorithm achieves more precise and comprehensive object identification, with fewer false positives and missed detections.
C2f_DLKA expands the receptive field through large convolutional kernels, enhancing the model's understanding of contextual information, which helps it better distinguish targets from backgrounds in complex scenarios. SSFPN, through its feature-filtering mechanism, enables precise target localization against challenging backgrounds, significantly reducing the false positives and missed detections caused by background interference. GSConv, which combines standard and depthwise separable convolutions, maintains performance close to that of dense convolutions while improving the model's inference speed. The large convolutional kernels in both C2f_DLKA and SSFPN are implemented with an equivalent structure, expanding the receptive field substantially with minimal additional computation and parameters. Introducing GSConv further enhances the model's detection performance with only a slight increase in parameters. In high-risk power operation scenarios, detection accuracy is often more critical than speed.
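To give a sense of why mixing in depthwise separable convolutions reduces cost, the parameter counts of a standard convolution and a depthwise separable one can be compared directly (an illustrative calculation of the general principle, not the GSConv implementation itself):

```python
def standard_conv_params(k, c_in, c_out):
    """k x k dense convolution: every output channel sees every input channel."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) + 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# 3x3 kernel, 64 -> 128 channels:
# standard: 73,728 parameters; depthwise separable: 8,768 (~8.4x fewer)
```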
4.5. Comparison Experiments
To comprehensively evaluate our improvements, this study conducts comparison experiments from several perspectives. First, we compare the effects of different attention modules and their application at various positions, with results presented in Table 2; this reveals how each attention module enhances model performance. Next, we compare improved backbone networks, with results shown in Table 3, to analyze the impact of different backbone networks on detection performance. We then analyze mainstream feature fusion methods, with results provided in Table 4, to understand the advantages and disadvantages of the various fusion strategies. Finally, to assess the overall effectiveness of the proposed method, we evaluate it against several existing approaches; the comparison results are shown in Table 5. Together, these experiments thoroughly assess the advantages and practical effectiveness of the proposed improvements.
Table 2 presents the results of the attention mechanism comparison experiments. The experiments indicate that, on the electrical equipment dataset, the model with "DLKA improving the C2f module" outperforms the other configurations in terms of mAP@0.5 and mAP@0.5:0.95. Specifically, compared to the model without the DLKA module, this improvement increases the two metrics by 3% and 3.5%, respectively; compared to the model with the DLKA module inserted at the end of the backbone (network layer 10), it improves them by 8.3% and 5.7%, respectively. Although "DLKA improving the C2f module" causes a minor rise in computational demands and parameter count, and a minor reduction in FPS, it significantly enhances detection accuracy, reflecting a high cost-effectiveness ratio.
To evaluate the DLKA mechanism in detection tasks, we compared its performance with that of four mainstream attention modules (CA [47], SE [48], CBAM [49], EMA [50]). All four modules were used to improve the C2f module in the same manner, with the experimental results shown in Table 2. The results demonstrate that the DLKA module excels at improving detection performance. Specifically, compared to the CA, SE, CBAM, and EMA modules, the DLKA module achieves mAP@0.5 values higher by 5.4%, 9.6%, 1.6%, and 3.4%, and mAP@0.5:0.95 values higher by 5.8%, 7.6%, 3.5%, and 4.7%, respectively. Additionally, the DLKA mechanism increases computational load by only 0.5 G and parameter count by 0.7 M while maintaining strong FPS performance. This indicates that the DLKA mechanism not only improves detection accuracy in electrical equipment tasks but also maintains a good balance of computational efficiency.
Table 2. Results of the attention mechanism comparison experiment.
Method | P (%) | R (%) | mAP@50 (%) | mAP@50-95 (%) | GFLOPs (G) | Params (M)
---|---|---|---|---|---|---
YOLOv8n | 97.7 | 80.9 | 87.1 | 56.9 | 8.1 | 3.00
+DLKA attention (C2f) | 98.4 | 80.1 | 90.1 | 60.4 | 8.6 | 3.70
+DLKA attention (layer 10) | 98.1 | 80.7 | 81.8 | 54.7 | 9.4 | 4.60
+DAttention (C2f) | 97.8 | 79.8 | 84.8 | 54.8 | 8.1 | 3.07
+CA (C2f) | 95.8 | 79.9 | 84.7 | 54.6 | 8.1 | 3.00
+SE (C2f) | 95.8 | 79.0 | 80.5 | 52.8 | 8.1 | 3.00
+CBAM (C2f) | 94.2 | 78.0 | 88.5 | 56.9 | 8.1 | 3.02
+EMA (C2f) | 93.1 | 79.2 | 86.7 | 55.7 | 8.1 | 3.00
The backbone network, which combines GSConv and the DLKA module, effectively reduces redundant computations in feature maps while enhancing dashboard localization and detection through contextual information.
Table 3 presents the performance comparison of different backbone networks, including GSConv+DLKA, YOLOv8n, FasterNet [51], MobileNetV4 [52], and HGNetV2 [53], on the custom dataset. The results show that although our improved backbone network slightly increases the number of parameters, it demonstrates excellent detection accuracy, only 0.4% lower than MobileNetV4, while reducing the computational load by 14.2 G, showcasing superior overall performance.
Table 3. Results of the backbone network comparison experiment.
Method | P (%) | R (%) | mAP@50 (%) | mAP@50-95 (%) | GFLOPs (G) | Params (M)
---|---|---|---|---|---|---
Base | 97.7 | 80.9 | 87.1 | 56.9 | 8.1 | 3.00
FasterNet | 97.0 | 80.4 | 81.3 | 53.7 | 10.7 | 4.17
MobileNetV4 | 96.1 | 79.6 | 90.3 | 59.1 | 22.5 | 5.70
HGNetV2 | 94.0 | 78.1 | 85.7 | 52.7 | 6.9 | 2.35
GSConv + DLKA | 96.6 | 80.2 | 89.9 | 60.0 | 8.3 | 3.50
Additionally, we compared SSFPN with other feature fusion networks such as PAN [30], AFPN [54], BiFPN [31], Slimneck [41], and HSFPN [42], as shown in Table 4. Under similar parameter and computation conditions, SSFPN achieved the highest detection accuracy in the circuit breaker operational status detection task. Through an efficient network structure design, SSFPN not only reduced computational complexity and resource consumption but also better preserved the complex edges and detailed information of defect objects, further enhancing detection performance.
Table 4. Results of the feature fusion evaluation.
Method | P (%) | R (%) | mAP@50 (%) | mAP@50-95 (%) | GFLOPs (G) | Params (M)
---|---|---|---|---|---|---
Base + PAN | 97.7 | 80.9 | 87.1 | 56.9 | 8.1 | 3.00
Base + AFPN | 87.9 | 75.7 | 85.6 | 53.5 | 8.4 | 2.59
Base + BiFPN | 96.7 | 80.2 | 81.6 | 54.4 | 7.1 | 1.99
Base + Slimneck | 96.5 | 79.6 | 80.8 | 55.3 | 7.3 | 2.79
Base + HSFPN | 97.4 | 80.8 | 88.0 | 59.2 | 10.2 | 2.43
SSFPN | 98.3 | 81.1 | 90.9 | 59.9 | 9.5 | 2.20
The results of the ablation study validate the effectiveness of the various network components in DLCH-YOLO. Furthermore, we conducted a comprehensive comparison of DLCH-YOLO with other algorithms, including classical models such as Faster-RCNN [10] and Cascade-RCNN [55], as well as widely adopted models such as YOLOv5, YOLOv8, YOLOv10, and RT-DETR-L [53]. Compared to the classical algorithms (Faster-RCNN, Cascade-RCNN), DLCH-YOLO not only significantly reduces parameters and computational complexity but also achieves superior mAP@0.5, mAP@0.5:0.95, and FPS. Compared with lightweight models such as YOLOv5n, YOLOv8n, and YOLOv10n, DLCH-YOLO shows notable improvements in mAP@0.5 and mAP@0.5:0.95, despite increases in computational load of 5.4 G, 1.5 G, and 3.1 G, respectively. Specifically, compared to the model with the lowest mAP, YOLOv5n, DLCH-YOLO achieves an 8.6% increase in mAP. Compared to YOLOv8s, YOLOv10s, and RT-DETR-L, DLCH-YOLO maintains a lower parameter count and computational load while still outperforming them in mAP@0.5, mAP@0.5:0.95, and FPS.
Table 5. Experimental comparison between DLCH-YOLO and SOTA models.
Method | mAP@50 (%) | mAP@50-95 (%) | GFLOPs (G) | Params (M) | FPS
---|---|---|---|---|---
Faster-RCNN | 71.6 | 39.2 | 174 | 41.37 | 16.0
Cascade-RCNN | 70.5 | 43.1 | 201 | 69.23 | 12.9
YOLOv5n | 83.2 | 55.0 | 4.2 | 1.90 | 47.2
YOLOv8n | 87.1 | 56.9 | 8.1 | 3.01 | 146.0
YOLOv8s | 89.8 | 60.5 | 28.4 | 22.50 | 127.4
YOLOv10n | 88.8 | 55.3 | 6.5 | 2.26 | 115.3
YOLOv10s | 90.0 | 60.3 | 21.6 | 7.20 | 108.8
RT-DETR-L | 79.0 | 51.5 | 105 | 29.70 | 17.6
DLCH-YOLO | 86.7 | 55.7 | 9.6 | 2.60 | 72.8
Notably, although DLCH-YOLO's large kernel convolutions add complexity to the model, it achieves 72.8 FPS, substantially higher than the two-stage algorithms and above the 60 FPS real-time detection threshold. This demonstrates that DLCH-YOLO improves detection accuracy while maintaining high speed and efficiency. In power system monitoring, algorithms must process images and make decisions within a very short time; excessive latency may prevent the system from responding in time and thus degrade decision quality. The actual performance of a deployed model also depends heavily on the hardware platform: the computational power and memory constraints of different devices directly affect the model's inference speed and load-handling capability.