YOLO-MAD: Multi-Scale Geometric Structure Feature Extraction and Fusion for Steel Surface Defect Detection
Abstract
1. Introduction
- AKConv (adaptive kernel convolution) is integrated into the backbone to replace standard convolution modules. By introducing dynamic sampling locations and a learnable mask mechanism, AKConv enables the convolutional kernels to adaptively focus on fine-grained and irregular defects (e.g., micro-cracks), substantially boosting the network’s sensitivity to minute structural discontinuities.
- BiFPN (bidirectional feature pyramid network) is adopted to replace the original PANet in the neck. By incorporating a learnable weighted feature fusion strategy, BiFPN strengthens the information flow across multi-scale features and facilitates effective interaction among features at different levels. This design improves the model’s expressive power in handling defects of varying scales.
- Detect_DyHead (Detect_DynamicHead) is employed as the detection head. It adds task-specific, spatial–contextual, and scale-sensitive attention layers on top of the existing branch separation, which strengthens the response to critical regions and improves robustness and accuracy against complex textured backgrounds relative to the baseline head.
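
As a rough illustration of how scale-sensitive and task-specific attention can be stacked in a dynamic head, the sketch below follows the general formulation of Dynamic Head [29] on a toy tensor indexed as level × position × channel. It is a simplified example under stated assumptions, not the Detect_DyHead implementation: the gate parameters are fixed constants rather than learned, and the spatial attention (which relies on deformable convolution) is omitted. The names `hard_sigmoid`, `scale_attention`, and `task_attention` are hypothetical.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [level][position][channel]


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear gate: max(0, min(1, (x + 1) / 2))."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def scale_attention(feats: Tensor3D, w: float = 1.0, b: float = 0.0) -> Tensor3D:
    """Reweight each pyramid level by a gate computed from its global mean.

    w and b stand in for a learned 1x1 convolution (assumed constants here).
    """
    out = []
    for level in feats:
        count = sum(len(pos) for pos in level)
        mean = sum(v for pos in level for v in pos) / count
        gate = hard_sigmoid(w * mean + b)
        out.append([[gate * v for v in pos] for pos in level])
    return out


def task_attention(feats: Tensor3D, a1: float = 1.0, b1: float = 0.0,
                   a2: float = 0.0, b2: float = 0.0) -> Tensor3D:
    """Dynamic-ReLU style activation: max(a1*x + b1, a2*x + b2) per element.

    With the defaults this reduces to a plain ReLU; in a real head the
    coefficients would be predicted from the features (assumption here).
    """
    return [[[max(a1 * v + b1, a2 * v + b2) for v in pos] for pos in level]
            for level in feats]


# Toy input: 2 levels, 2 positions, 2 channels.
feats = [
    [[1.0, -1.0], [1.0, 1.0]],     # level 0: mean 0.5  -> gate 0.75
    [[-3.0, -1.0], [-2.0, -2.0]],  # level 1: mean -2.0 -> gate 0.0
]
refined = task_attention(scale_attention(feats))
print(refined)
```

In this toy case the second level is gated to zero because its mean activation is strongly negative, while the first level is kept at 75% strength before the ReLU-like task activation clips its negative entry.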
2. Related Work
2.1. YOLO
2.2. Industrial Defect Detection
3. Proposed Method
3.1. Enhancing Feature Extraction with AKConv
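
The adaptive sampling idea behind AKConv (dynamic sampling locations combined with a learnable mask) can be sketched for a single output position as follows. This is a minimal illustration under assumptions, not the authors' implementation: the offsets, mask values, and kernel weights are hard-coded here, whereas in the network they would be learned, and the function names `bilinear_sample` and `adaptive_kernel_response` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Point = Tuple[float, float]


def bilinear_sample(fmap: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate fmap at a fractional (y, x); zero outside."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * fmap[yy][xx]
    return value


def adaptive_kernel_response(
    fmap: Grid,
    center: Point,
    base_points: Sequence[Point],
    offsets: Sequence[Point],
    mask: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Sum of weight * mask * feature sampled at (center + base + offset)."""
    cy, cx = center
    total = 0.0
    for (py, px), (dy, dx), m, w in zip(base_points, offsets, mask, weights):
        total += w * m * bilinear_sample(fmap, cy + py + dy, cx + px + dx)
    return total


# 3x3 toy feature map with value 3*y + x at each position.
fmap = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]

# A 3-point kernel: the number of sampling points need not be a square.
base_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
offsets = [(0.0, 0.0), (0.0, -0.5), (-0.5, 0.0)]  # assumed "learned" shifts
mask = [1.0, 1.0, 0.5]                            # assumed "learned" mask
weights = [1.0, 1.0, 1.0]

response = adaptive_kernel_response(
    fmap, (1.0, 1.0), base_points, offsets, mask, weights
)
print(response)
```

The three sampled values are 4.0, 4.5, and 5.5; the last is halved by the mask, so the printed response is 11.25. Shifting the offsets moves the sampling points toward or away from a structure such as a thin crack without changing the number of kernel parameters.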
3.2. Improving Multi-Scale Feature Fusion with BiFPN
- Removal of single-input nodes: BiFPN removes nodes that have only one input edge, since these non-fusion nodes contribute negligibly to the network’s feature integration objective. This pruning yields a more efficient bidirectional topology without compromising fusion performance.
- Same-level skip connections: An extra edge connects each input node to the output node at the same level, so that more features are fused without a substantial increase in computational cost.
- Repeated multi-layer fusion: Unlike PANet, which has only a single top-down and a single bottom-up pathway, BiFPN treats each bidirectional (top-down and bottom-up) pathway as one feature network layer and repeats that layer multiple times to enable higher-level feature fusion.
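
The learnable weighted fusion used at each BiFPN node can be illustrated with the fast normalized fusion rule from EfficientDet [28], O = Σ wᵢ·Iᵢ / (ε + Σ wⱼ), where each weight is kept non-negative by a ReLU. The sketch below is an assumption-based example on flat feature vectors, with hard-coded weights in place of learned ones; the function name `fast_normalized_fusion` and the variable names are hypothetical.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    inputs: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-shaped features as sum(w_i * I_i) / (eps + sum(w_j)).

    Weights pass through a ReLU first, so each normalized weight lies in
    [0, 1) and no softmax is needed.
    """
    w = [max(0.0, v) for v in weights]
    norm = eps + sum(w)
    length = len(inputs[0])
    return [
        sum(wi * feat[k] for wi, feat in zip(w, inputs)) / norm
        for k in range(length)
    ]


# A middle-level output node with three inputs (already resized to match):
p_in = [1.0, 2.0]     # same-level input, via the skip connection
p_td = [3.0, 4.0]     # intermediate feature from the top-down pathway
p_lower = [5.0, 6.0]  # output of the level below, from the bottom-up pathway

# Assumed "learned" weights; the negative one is clipped to zero by the ReLU.
fused = fast_normalized_fusion([p_in, p_td, p_lower], [1.0, 1.0, -0.5])
print(fused)
```

Here the third input is suppressed entirely and the result is approximately the average of the first two, which shows how a node can learn to favor some scales over others.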
3.3. Detect_DyHead: Enhancing Detection with 3D Decoupled Attention
4. Experiment
4.1. Dataset and Experimental Configuration
4.2. Performance Comparison with SOTA Approaches
4.3. The Result of Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Chen, Y.; Ding, Y.; Zhao, F.; Zhang, E.; Wu, Z.; Shao, L. Surface Defect Detection Methods for Industrial Products: A Review. Appl. Sci. 2021, 11, 7657.
2. Qiao, Q.; Hu, H.; Ahmad, A.; Wang, K. A Review of Metal Surface Defect Detection Technologies in Industrial Applications. IEEE Access 2025, 13, 48380–48400.
3. Lee, S.; Chang, L.M.; Skibniewski, M. Automated recognition of surface defects using digital color image processing. Autom. Constr. 2006, 15, 540–549.
4. Rattanaphan, S.; Briassouli, A. Evaluating Generalization, Bias, and Fairness in Deep Learning for Metal Surface Defect Detection: A Comparative Study. Processes 2024, 12, 456.
5. Zhao, B.; Chen, Y.; Jia, X.; Ma, T. Steel surface defect detection algorithm in complex background scenarios. Measurement 2024, 237, 115189.
6. Zheng, X.; Zheng, S.; Kong, Y.; Chen, J. Recent advances in surface defect inspection of industrial products using deep learning techniques. Int. J. Adv. Manuf. Technol. 2021, 113, 35–58.
7. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073.
8. Xu, H.; Zhang, Z.; Ye, H.; Song, J.; Chen, Y. Efficient Steel Surface Defect Detection via a Lightweight YOLO Framework with Task-Specific Knowledge-Guided Optimization. Electronics 2025, 14, 2029.
9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
10. Ultralytics. Ultralytics YOLO. Available online: https://docs.ultralytics.com/ (accessed on 10 May 2025).
11. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018.
12. Jocher, G. YOLOv5 by Ultralytics. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 10 May 2025).
13. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
14. Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6.
15. Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 May 2025).
16. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 764–773.
17. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
18. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
19. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
20. He, L.; Zheng, L.; Xiong, J. FMV-YOLO: A Steel Surface Defect Detection Algorithm for Real-World Scenarios. Electronics 2025, 14, 1143.
21. Lu, M.; Sheng, W.; Zou, Y.; Chen, Y.; Chen, Z. WSS-YOLO: An improved industrial defect detection network for steel surface defects. Measurement 2024, 236, 115060.
22. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 39, pp. 1137–1149.
23. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969.
24. Zhang, G.; Cui, K.; Hung, T.Y.; Lu, S. Defect-GAN: High-Fidelity Defect Synthesis for Automated Defect Inspection. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 2524–2534.
25. Zabin, M.; Kabir, A.N.B.; Kabir, M.K.; Choi, H.J.; Uddin, J. Contrastive self-supervised representation learning framework for metal surface defect detection. J. Big Data 2023, 10, 145.
26. Lu, H.; Zhu, Y.; Yin, M.; Yin, G.; Xie, L. Multimodal Fusion Convolutional Neural Network with Cross-Attention Mechanism for Internal Defect Detection of Magnetic Tile. IEEE Access 2022, 10, 60876–60886.
27. Zhang, X.; Song, Y.; Song, T.; Yang, D.; Ye, Y.; Zhou, J.; Zhang, L. AKConv: Convolutional Kernel with Arbitrary Sampled Shapes and Arbitrary Number of Parameters. arXiv 2023.
28. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
29. Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic Head: Unifying Object Detection Heads with Attentions. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 7373–7382.
30. Wang, L.; Liu, X.; Ma, J.; Su, W.; Li, H. Real-Time Steel Surface Defect Detection with Improved Multi-Scale YOLO-v5. Processes 2023, 11, 1357.
31. You, C.; Kong, H. Improved Steel Surface Defect Detection Algorithm Based on YOLOv8. IEEE Access 2024, 12, 99570–99577.
32. Zhang, H.; Li, S.; Miao, Q.; Fang, R.; Xue, S.; Hu, Q.; Hu, J.; Chan, S. Surface defect detection of hot rolled steel based on multi-scale feature fusion and attention mechanism residual block. Sci. Rep. 2024, 14, 7671.
33. Yang, Y.; Feng, Z.; Jin, W.; Miao, P. ADD-YOLO: A new model for object detection in aerial images. Multimed. Syst. 2025, 31, 120.
34. Bao, Y.; Song, K.; Liu, J.; Wang, Y.; Yan, Y.; Yu, H.; Li, X. Triplet-graph reasoning network for few-shot metal generic surface defect segmentation. IEEE Trans. Instrum. Meas. 2021, 70, 5011111.
35. Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.
36. He, Y.; Song, K.; Meng, Q.; Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE Trans. Instrum. Meas. 2019, 69, 1493–1504.
37. Lv, X.; Duan, F.; Jiang, J.j.; Fu, X.; Gan, L. Deep metallic surface defect detection: The new benchmark and detection network. Sensors 2020, 20, 1562.
38. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Cham, Switzerland, 2016; pp. 21–37.
39. Ma, X.; Deng, X.; Kuang, H.; Liu, X. YOLOv7-BA: A Metal Surface Defect Detection Model Based On Dynamic Sparse Sampling And Adaptive Spatial Feature Fusion. In Proceedings of the 2024 IEEE 6th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 24–26 May 2024; Volume 6, pp. 292–296.
40. Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. MSFT-YOLO: Improved YOLOv5 Based on Transformer for Detecting Defects of Steel Surface. Sensors 2022, 22, 3467.
41. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
42. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
43. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
44. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
45. Lipton, Z.C. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 2018, 16, 31–57.
46. Shu, X.; Xu, L.; He, Z.; Sheng, L.; Ye, G.; Lu, X. Wafer Defect Detection Based on YOLO-BA. In Proceedings of the 2024 International Conference on Sensing, Measurement & Data Analytics in the Era of Artificial Intelligence (ICSMD), Huangshan, China, 31 October–3 November 2024; pp. 1–7.
47. Jezek, S.; Jonak, M.; Burget, R.; Dvorak, P.; Skotak, M. Deep learning-based defect detection of metal parts: Evaluating current methods in complex conditions. In Proceedings of the 2021 13th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), Brno, Czech Republic, 25–27 October 2021; pp. 66–71.
Dataset | Image Count | Defect Categories | Resolution | Data Split (Train:Test) | Annotation Type |
---|---|---|---|---|---|
NEU-DET [35] | 1800 | 6 | | 4:1 (80%:20%) | Bounding Box |
GC10-DET [37] | 2300 | 10 | | 4:1 (80%:20%) | Bounding Box |
Method | mAP | CR | IN | PA | PS | RS | SC | GFLOPs |
---|---|---|---|---|---|---|---|---|
SSD [38] | 63.8 | 47.3 | 68.5 | 88.6 | 68.4 | 54.7 | 55.0 | 281.9 |
YOLO-BA [39] | 74.8 | 36.3 | 67.8 | 91.0 | 96.6 | 70.6 | 86.4 | - |
MSFT-YOLO [40] | 75.2 | 56.9 | 80.8 | 93.5 | 82.1 | 52.7 | 83.5 | - |
WSS-YOLO [21] | 82.3 | 58.1 | 80.9 | 93.9 | 94.2 | 73.1 | 93.9 | 7.7 |
YOLOv8n | 71.2 | 43.3 | 78.8 | 92.4 | 83.6 | 48.8 | 80.1 | 8.1 |
YOLOv11n | 71.1 | 49.3 | 79.5 | 92.2 | 80.2 | 60.6 | 65.0 | 6.3 |
YOLOv8l | 72.5 | 52.0 | 75.1 | 90.7 | 84.4 | 55.0 | 77.6 | 164.8 |
YOLOv11l | 71.6 | 41.2 | 80.4 | 92.4 | 81.0 | 56.4 | 78.2 | 86.6 |
YOLO-MAD | 76.6 | 38.5 | 85.0 | 94.2 | 83.3 | 65.5 | 93.0 | 9.4 |
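
As a quick consistency check on the table above, the mAP column can be compared with the mean of the per-class columns. The sketch assumes that CR, IN, PA, PS, RS, and SC hold per-class AP values in percent and that mAP is their unweighted mean; the variable names are hypothetical, and only two rows are copied in for brevity.

```python
from typing import Dict, List

# Per-class AP values (CR, IN, PA, PS, RS, SC) copied from the table.
per_class_ap: Dict[str, List[float]] = {
    "YOLOv8n": [43.3, 78.8, 92.4, 83.6, 48.8, 80.1],
    "YOLO-MAD": [38.5, 85.0, 94.2, 83.3, 65.5, 93.0],
}
reported_map: Dict[str, float] = {"YOLOv8n": 71.2, "YOLO-MAD": 76.6}


def mean_ap(aps: List[float]) -> float:
    """Unweighted mean of per-class AP values."""
    return sum(aps) / len(aps)


for name, aps in per_class_ap.items():
    print(f"{name}: mean AP = {mean_ap(aps):.1f}, reported mAP = {reported_map[name]}")
```

For both rows the computed mean rounds to the reported value (71.2 and 76.6), which also makes the trade-off visible: YOLO-MAD gains on IN, RS, and SC while scoring lower than YOLOv8n on CR.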
Method | mAP | Pu | Wl | Cg | Ws | Os | Ss | In | Rp | Cr | Wf |
---|---|---|---|---|---|---|---|---|---|---|---|
Libra Faster R-CNN [41] | 58.8 | 99.5 | 42.9 | 94.9 | 72.8 | 72.1 | 62.8 | 18.8 | 37.4 | 17.6 | 69.3 |
RetinaNet [42] | 65.5 | 79.6 | 91.5 | 94.3 | 79.1 | 62.0 | 66.4 | 29.7 | 33.9 | 35.2 | 77.0 |
WSS-YOLO [21] | 72.0 | 98.2 | 95.2 | 93.5 | 87.3 | 57.9 | 62.8 | 38.0 | 35.4 | 58.1 | 93.4 |
YOLOv8n | 63.6 | 98.2 | 92.2 | 90.4 | 80.5 | 69.9 | 61.4 | 31.3 | 8.4 | 32.6 | 71.0 |
YOLOv8l | 66.1 | 96.9 | 94.0 | 91.8 | 86.5 | 69.5 | 58.2 | 38.8 | 17.4 | 38.0 | 70.2 |
YOLOv11n | 63.1 | 95.8 | 89.5 | 90.1 | 81.6 | 68.8 | 57.9 | 39.4 | 5.6 | 28.7 | 73.2 |
YOLOv11l | 66.9 | 96.4 | 95.1 | 93.3 | 83.9 | 71.5 | 62.5 | 39.3 | 22.1 | 34.7 | 70.7 |
YOLO-MAD | 68.4 | 98.6 | 93.4 | 96.0 | 80.7 | 68.7 | 56.5 | 38.6 | 25.2 | 33.5 | 77.1 |
Model | P | R | mAP50 | mAP50-95 |
---|---|---|---|---|
YOLOv8n | 0.646 ± 0.002 | 0.678 ± 0.003 | 0.712 ± 0.002 | 0.373 ± 0.001 |
+AKConv | 0.688 ± 0.003 | 0.689 ± 0.003 | 0.725 ± 0.002 | 0.385 ± 0.004 |
+BiFPN | 0.665 ± 0.002 | 0.712 ± 0.002 | 0.738 ± 0.005 | 0.398 ± 0.001 |
+Detect_DyHead | 0.672 ± 0.003 | 0.708 ± 0.003 | 0.745 ± 0.001 | 0.410 ± 0.002 |
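
The per-module contributions implied by the ablation table can be read off by differencing consecutive rows. The sketch below assumes the rows are cumulative (each "+" row adds one module to the configuration above it), which the table layout suggests but does not state; it uses the reported means only and ignores the ± spreads. The function name `stepwise_gains` is hypothetical.

```python
from typing import List, Tuple

# (configuration, mAP50, mAP50-95) means copied from the ablation table.
rows: List[Tuple[str, float, float]] = [
    ("YOLOv8n", 0.712, 0.373),
    ("+AKConv", 0.725, 0.385),
    ("+BiFPN", 0.738, 0.398),
    ("+Detect_DyHead", 0.745, 0.410),
]


def stepwise_gains(table: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Difference each row against the previous one, rounded to 3 decimals."""
    gains = []
    for (_, prev50, prev95), (name, cur50, cur95) in zip(table, table[1:]):
        gains.append((name, round(cur50 - prev50, 3), round(cur95 - prev95, 3)))
    return gains


for name, d50, d95 in stepwise_gains(rows):
    print(f"{name}: mAP50 {d50:+.3f}, mAP50-95 {d95:+.3f}")

total50 = round(rows[-1][1] - rows[0][1], 3)
total95 = round(rows[-1][2] - rows[0][2], 3)
print(f"total: mAP50 {total50:+.3f}, mAP50-95 {total95:+.3f}")
```

Under that assumption, AKConv and BiFPN each add about 0.013 mAP50 and Detect_DyHead adds about 0.007, for a total of 0.033 over YOLOv8n; the mAP50-95 gains are roughly even across the three steps and total 0.037.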
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ding, H.; Chen, J.; Ye, H.; Chen, Y. YOLO-MAD: Multi-Scale Geometric Structure Feature Extraction and Fusion for Steel Surface Defect Detection. Appl. Sci. 2025, 15, 7887. https://doi.org/10.3390/app15147887