This section focuses on the YOLOv8 architecture, enhanced by three optimization strategies, as the core algorithm investigated in this study. The aero-engine rotating machinery wear dataset serves as the input data source. The analysis includes the division of training and test sets, data labeling, and fault-mode distribution, followed by evaluation using performance metrics. Finally, the significance of the proposed improvements is validated through ablation testing.
4.4. Comparative Analysis
Table 4 presents the results of the ablation study, in which the baseline YOLOv8 architecture is used as the original model. The three innovative components proposed in this study are integrated step by step, and the evaluation metrics—precision, recall, mAP50, and mAP50-95—are reported. The analysis compares the performance of the proposed methods on the rotating machinery wear dataset. The results show a significant improvement over the original YOLOv8 architecture, with mAP50 increasing from 85.4% to 91%.
To verify the statistical significance of the performance improvements achieved by each of the improved modules (DWR-DRB, SPD-Conv, Focaler-MPDIoU) in Table 4, we conducted paired t-tests (p < 0.05) and effect size analyses (Cohen's d). The results show that introducing the DWR-DRB module alone increased mAP50 from 85.4% to 87.0% (p = 0.008, d = 1.12), indicating its significant effect on sparse feature extraction; the addition of SPD-Conv further improved mAP50 to 87.9% (p = 0.012, d = 0.98), validating its role in preserving fine-grained features; and the mAP50 of the complete model (integrating all modules) reached 91.0%, a highly significant difference from the baseline YOLOv8's 85.4% (p = 0.002, d = 2.05), with an effect size well above 1.5, indicating that the improvement is practically meaningful. Additionally, the synergistic effects of module combinations were significantly stronger than those of individual modules (e.g., DWR-DRB + SPD-Conv achieved an mAP50 of 88.3%, p = 0.005, d = 1.53), indicating that the components collectively optimize model performance through complementary mechanisms. These statistical tests validate the effectiveness and necessity of the proposed improvement strategy.
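The following minimal sketch shows how the paired t-tests and Cohen's d values above could be computed; the per-run mAP50 scores are hypothetical placeholders, since the individual run results are not listed here.

```python
# Illustrative significance test for one module comparison (hypothetical per-run scores).
import numpy as np
from scipy import stats

baseline = np.array([85.2, 85.5, 85.3, 85.6, 85.4])   # baseline YOLOv8 mAP50 (%), per run
improved = np.array([86.8, 87.1, 86.9, 87.2, 87.0])   # +DWR-DRB mAP50 (%), same runs

# Paired t-test: the same seeds/data splits are reused for both models
t_stat, p_value = stats.ttest_rel(improved, baseline)

# Cohen's d for paired samples: mean of the per-run differences over their std. dev.
diff = improved - baseline
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```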
Table 5 compares the computational performance of the different algorithms on the rotating machinery wear dataset. In terms of computational complexity, YOLOv5 combined with the DWR-DRB module requires 16.2 GFLOPs, significantly lower than Cascade R-CNN's 35.6 GFLOPs, indicating higher computational efficiency, while the EMSCP variant of YOLOv8 requires 17.6 GFLOPs, slightly higher than YOLOv5 but well below Cascade R-CNN's 45.3 GFLOPs, highlighting the advantage of its lightweight design. In terms of memory usage, YOLOv8 with RFCAConv requires only 8.0 GB of RAM, far below Cascade R-CNN's 19.4 GB, making it more suitable for resource-constrained deployment environments. In terms of time efficiency, YOLOv8 with RFAConv requires only 12.9 ms, whereas the iRMB-Cascaded variant of Cascade R-CNN requires 46.7 ms, underscoring the real-time advantage of single-stage detectors. Overall, the YOLOv8 series outperforms Cascade R-CNN in FLOPs, RAM, and inference time. In particular, the DWR-DRB module strikes a balance between accuracy and efficiency, maintaining high accuracy (mAP50 of 87.0%) while requiring only 15.8 GFLOPs, 8.5 GB of RAM, and 14.9 ms of inference time.
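As an illustration of how the per-model inference times in Table 5 could be measured, the sketch below times repeated forward passes on the GPU after a warm-up phase; the 640 × 640 input size, the run counts, and the generic model argument are assumptions rather than details taken from the experiments.

```python
# Hedged sketch of GPU latency measurement; "model" stands for any of the compared detectors.
import time
import torch

def measure_latency_ms(model, img_size=640, n_warmup=20, n_runs=200, device="cuda"):
    model = model.to(device).eval()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):        # warm-up so CUDA kernel launch costs stabilize
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs * 1000.0   # average ms per forward pass
```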
All ablation experiments were performed on a standardized hardware platform: an NVIDIA RTX 3090 GPU (24 GB VRAM), an Intel Xeon Gold 6226R CPU @ 2.90 GHz, 128 GB of RAM, and Ubuntu 20.04 LTS. Training used PyTorch 2.0.1 with CUDA 11.8 and a fixed batch size of 16. To eliminate the influence of randomness: (1) a fixed random seed (seed = 2025) was used to control parameter initialization, data augmentation, and data loading order; (2) the training schedule was uniformly set to 150 epochs, with the learning rate adjusted by a cosine annealing strategy (initial value 0.01, minimum value 0.001); and (3) the training time of each module variant was recorded precisely: the baseline YOLOv8 took 4.2 h; DWR-DRB increased the computational load by 23% due to its 13 × 13 large convolutional kernels (5.1 h); SPD-Conv optimized memory access through spatial–depth conversion, reducing the time to 3.9 h; and the full model took 4.3 h owing to module synergy effects.
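A minimal sketch of these randomness controls and the learning-rate schedule (seed = 2025, cosine annealing from 0.01 to 0.001 over 150 epochs) is given below; the placeholder model and the choice of SGD are assumptions made only for illustration.

```python
# Reproducibility controls and cosine-annealing schedule, as described above.
import random
import numpy as np
import torch

def set_seed(seed: int = 2025) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # trade some speed for reproducibility
    torch.backends.cudnn.benchmark = False

set_seed(2025)

model = torch.nn.Conv2d(3, 16, 3)               # placeholder standing in for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=150, eta_min=0.001)        # anneal 0.01 -> 0.001 over 150 epochs
```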
The improved YOLOv8 architecture significantly enhances detection accuracy and robustness for three fault types—notches, abrasions, and scratches—on bearing surfaces by incorporating three core modules: DWR-DRB, SPD-Conv, and Focaler-MPDIoU.
DWR-DRB contributes through the integration of a dynamic weighting mechanism with depthwise-separable convolution. This enhances sensitivity to minor faults (e.g., shallow scratches) by adaptively adjusting the weights of feature channels and significantly reduces computational complexity. Moreover, its residual structure mitigates the vanishing gradient problem in deep networks and ensures deep semantic extraction of complex textures such as abrasions.
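The sketch below is one possible reading of such a block: a depthwise-separable large-kernel convolution, a channel-wise dynamic weighting gate, and a residual connection. It is an interpretation of the description above, not the authors' exact implementation.

```python
# Interpretive sketch of a DWR-DRB-style block (assumed structure, for illustration only).
import torch
import torch.nn as nn

class DWRDRBSketch(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 13, reduction: int = 16):
        super().__init__()
        # Depthwise large-kernel conv captures sparse, large-receptive-field context
        self.depthwise = nn.Conv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels, bias=False)
        # Pointwise conv mixes channels, completing the depthwise-separable pair
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()
        # Dynamic channel weighting (squeeze-and-excitation-style gate)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.act(self.bn(self.pointwise(self.depthwise(x))))
        y = y * self.gate(y)     # adaptively re-weight feature channels
        return x + y             # residual path mitigates vanishing gradients
```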
SPD-Conv improves detection performance for small targets by combining a multi-scale spatial pyramid structure with depthwise-separable convolution. It captures morphological features of defects at various scales (e.g., localized geometric distortions in notches, diffuse textures in abrasions) while minimizing redundant computation.
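The following sketch illustrates the space-to-depth idea underlying SPD-Conv: the spatial grid is folded into the channel dimension instead of being discarded by a strided convolution, so fine detail of small defects is preserved. The exact layer configuration here is an assumption.

```python
# Hedged sketch of an SPD-Conv-style downsampling layer.
import torch
import torch.nn as nn

class SPDConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Non-strided conv applied after the space-to-depth rearrangement
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch * scale * scale, out_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.scale
        # Fold each s x s neighborhood into channels: (B, C, H, W) -> (B, C*s*s, H/s, W/s)
        x = torch.cat([x[..., i::s, j::s] for i in range(s) for j in range(s)], dim=1)
        return self.conv(x)
```

With scale = 2, an H × W feature map with C channels becomes an (H/2) × (W/2) map with 4C channels before the non-strided convolution, so no pixel information is dropped during downsampling.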
Focaler-MPDIoU introduces a fusion of a focus modulation mechanism with the minimum point distance IoU (MPDIoU). This addresses class imbalance by dynamically adjusting loss weights for difficult samples (e.g., blurred or occluded notches) and enhances bounding-box regression accuracy, especially for elongated or fine defects such as scratches.
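A hedged sketch of how such a loss could be assembled is shown below: a minimum-point-distance penalty is combined with the IoU term, and Focaler's linear remapping of IoU onto an interval [d, u] concentrates the regression signal on difficult samples. The box format, normalization by image size, and the (d, u) values are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative Focaler-MPDIoU-style loss (assumed formulation, for explanation only).
import torch

def focaler_mpdiou_loss(pred, target, img_w, img_h, d=0.0, u=0.95, eps=1e-7):
    # pred, target: (N, 4) boxes in (x1, y1, x2, y2) format
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # MPDIoU: penalize squared distances between matching corners, scaled by image size
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    mpdiou = iou - d1 / (img_w ** 2 + img_h ** 2) - d2 / (img_w ** 2 + img_h ** 2)

    # Focaler remapping of IoU onto [d, u] to emphasize hard (low-IoU) samples
    iou_focaler = ((iou - d) / (u - d + eps)).clamp(0.0, 1.0)

    # Focaler-adjusted MPDIoU loss: L = 1 - MPDIoU + (IoU - IoU_focaler)
    return 1.0 - mpdiou + (iou - iou_focaler)
```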
The synergistic effect of these three components is as follows: DWR-DRB provides a lightweight, high-resolution feature base; SPD-Conv precisely localizes multi-scale fault features; and Focaler-MPDIoU further enhances classification and localization performance through dynamic loss optimization.
Table 5 also compares detection performance across different baseline architectures, including YOLOv5, Cascade R-CNN, and YOLOv8. The results show that the optimized YOLOv8 integrated with DWR-DRB outperforms the other variants, validating the effectiveness of the proposed enhancement strategy.
Compared with methods such as Faster-EMA, EMSCP, and RFCBAMConv, the DWR-DRB module achieves notably better recall (78.3%, versus 72.7% for Faster-EMA), while maintaining high precision (91.2%, comparable to EMSCP’s 92.2% but with lower computational cost). Its key innovations include the dynamic feature weighting mechanism, which improves sensitivity to tiny scratches, and the lightweight residual structure using depthwise-separable convolution. This yields an mAP50-95 of 47.5%, matching EMSCP while offering faster inference. The improvements are evident in three key areas:
(1) Multi-scale defect compatibility: DWR-DRB achieves an mAP50 of 87.0% for abrasions, comparable to RFCAConv (87.1%) while reducing the number of parameters by 15% (15.8 M vs. 18.6 M for RFCAConv).
(2) Real-time performance: Compared with iRMB-Cascaded (recall of 70.3%), DWR-DRB improves recall by 8 percentage points (78.3% vs. 70.3%, p = 0.012) and reduces FLOPs by 12% (15.8 GFLOPs vs. 18.0 GFLOPs).
(3) Robustness: The dynamic weighting mechanism enhances precision under variable lighting conditions, surpassing RFAConv (precision of 90.7%) in stability.
In comparative evaluations, the DWR-DRB module significantly improves small-defect detection (e.g., shallow scratches) through the fusion of dynamic weighting and residual structures. This yields a recall of 78.3%, outperforming traditional YOLOv5 (77.9%) and Cascade R-CNN (72.1%). Furthermore, it achieves an mAP50 of 87.0%, surpassing Faster-EMA (86.8%) and EMSCP (86.9%) while maintaining model lightweightness and computational efficiency. The architecture also demonstrates robust performance in complex industrial scenarios, with mAP50-95 reaching 47.5%, significantly outperforming iRMB-Cascaded (42.8%).
Table 6 explores improvements to the head of the YOLOv8 architecture by comparing different convolutional structures, using DWR-DRB as the backbone. The results indicate that the SPD-Conv structure provides substantial performance gains, affirming the superiority of this configuration.
SPD-Conv addresses the feature loss challenge in traditional convolutions when detecting tiny defects by leveraging spatial-pyramid and depthwise-separable structures. It achieves an mAP50 of 87.9%, 0.9 percentage points higher than KernelWarehouse (87.0%), with significant gains in localizing fine features such as notches.
The DWR-DRB structure proposed in this paper strikes a remarkable balance between accuracy and computational overhead: its mAP50 reaches 87.0% (a 1.6-percentage-point improvement over the baseline YOLOv8), while it requires only 15.8 GFLOPs, 27.4% more than the original YOLOv8 (12.4 GFLOPs) but much less than EMSCP (17.6 GFLOPs) and iRMB-Cascaded (20.1 GFLOPs). DWR-DRB enhances sparse feature capture with large-kernel (13 × 13) convolution, which increases computation, yet keeps RAM usage at 8.5 GB (22.7% lower than EMSCP) by fusing the multiple training-time branches into a single inference-time branch through structural reparameterization. Compared with the two-stage model (e.g., 35.6 GFLOPs for Cascade R-CNN + DWR-DRB), this method improves computational efficiency by 55.6% while achieving higher recall (78.3% vs. 72.1%).
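The core of the structural reparameterization mentioned above is the algebraic fusion of training-time branches into a single inference-time operator. The simplified sketch below shows the standard Conv+BN fusion step on which such schemes rely; it is not the authors' full multi-branch merging code.

```python
# Minimal Conv+BN fusion used in structural reparameterization (simplified illustration).
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # Fold BN statistics into the conv weights: w' = w * gamma / sqrt(var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```

At inference time, each training-time branch can be fused in this way and the resulting kernels combined into one convolution, which is why the block's memory footprint drops at deployment.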
The combined effect of DWR-DRB and SPD-Conv optimizes multi-scale feature fusion: DWR-DRB delivers a high-resolution feature base, while SPD-Conv enhances fine-grained feature extraction. This leads to an mAP50-95 of 47.9%, outperforming all other comparative methods (best among them: 47.6%), while also exceeding the computational efficiency of lightweight methods such as Slim.
Table 7 presents a comparative analysis of several loss functions: the mainstream IoU loss, Inner-MPDIoU, Focaler-IoU, and the proposed Focaler-MPDIoU. The results clearly indicate that Focaler-MPDIoU outperforms the other methods, highlighting its effectiveness in enhancing detection accuracy.
Table 8 shows the comparative results of Inner-MPDIoU under different ratio values. The analysis reveals that a ratio of 1 yields the best fault detection performance, with performance degrading roughly symmetrically as the ratio deviates from this value, in an approximately bell-shaped (normal-distribution-like) pattern.
Additionally, to verify the robustness of the results under different random-seed conditions, the outcomes are shown in Table 9.
In five independent experiments with random seeds 2023, 2024, 2025, 2026, and 2027, the proposed method showed excellent stability: a mean mAP50 of 90.9% with a standard deviation of only ±0.18% (range: 90.7–91.2%); a mean recall of 85.4% ± 0.22% (85.1–85.7%); and a mean mAP50-95 of 52.9% ± 0.16% (52.7–53.1%). The standard deviations of all key metrics are below 0.25%, demonstrating that the model is insensitive to parameter initialization and data ordering. This robustness stems from three mechanisms: (1) the multi-branch structure of DWR-DRB suppresses stochastic fluctuations through gradient-path redundancy; (2) the spatial–depth transformation of SPD-Conv preserves structured features and reduces noise sensitivity; and (3) the difficult-sample-focusing mechanism of Focaler-MPDIoU reduces the dependence of the loss function on initialization. Compared with the baseline YOLOv8's mAP50 fluctuation of about ±0.5% in historical experiments, the proposed method improves stability by 62%.
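A small sketch of the aggregation over seeds is given below; the per-seed mAP50 values are hypothetical placeholders roughly consistent with the reported mean and range, not the actual experimental logs.

```python
# Aggregating a metric over the five seeds (per-seed values are hypothetical).
import numpy as np

map50_per_seed = {2023: 90.7, 2024: 90.8, 2025: 91.2, 2026: 90.9, 2027: 91.0}
vals = np.array(list(map50_per_seed.values()))
print(f"mAP50: {vals.mean():.1f}% +/- {vals.std(ddof=1):.2f}%  "
      f"(range {vals.min():.1f}-{vals.max():.1f}%)")
```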