Article
Peer-Review Record

Detection of Welding Defects Tracked by YOLOv4 Algorithm

Appl. Sci. 2025, 15(4), 2026; https://doi.org/10.3390/app15042026
by Yunxia Chen * and Yan Wu
Reviewer 1: Anonymous
Submission received: 13 January 2025 / Revised: 6 February 2025 / Accepted: 12 February 2025 / Published: 14 February 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

At the time of reviewing this paper, the year is 2025, yet the newest references in the paper are from 2021. I have no doubt that the results are relevant given the specific domain, but this is a fast-moving field, and comparing only against "older" networks such as YOLOv3, YOLOv4, and CenterNet is out of date.

The main weakness of the paper is that weld defect detection is based on the YOLOv4 architecture (YOLOv4 was released in 2020), whereas transformer-based detectors are now the state of the art. As mentioned above, this is not a critical problem given the specific domain, but the paper still needs to show experimental results with models that are relevant today.

In detail, I have the following comments:


1. Please revise and update the "Introduction" chapter with current artificial intelligence methods (for example YOLOv8, YOLOv9, YOLOv10, CenterNetv2, CenterNet++, DINO, DETR, DETA, and so on).

2. Page 4 "YOLOv4-cs1" chapter, replaced ReLU with ReLU6 can prevent numerical explosion, did you did experiment without ReLU6 and got numerical explosion?

3. Page 4 "YOLOv4-cs1" chapter, you used ReLU6, is any reason why ReLU6? and not numerical clipping operation after oridanary ReLU?


4.4 In the chapter "Experimental results and analysis", please add a table with model sizes, parameter counts, and other relevant information.

4. chapter "4.1. Data set", please add visual representation (figure) of typical "pores","slag inclusion", "incomplete penetration"

5. Chapter 4.4.1: in my opinion ", etc." is not necessary; otherwise, please elaborate on the indicators.

6. "Table 3", the fps information is nowhere used in paper, is it necessary metric for model performance? If yes please, elaborate that information in paper.

7. "Table 3", "Table 4", "Table 5",  same as with introduction, please compare results with more recent models


8. "Figure 9", "Figure 10", "Figure 11" add visual cues (such as grid) to better readability

9. "Conclusion", "The results show that the recall rate of the two optimized models on pores and slag
inclusion is greatly improved, and the overall detection effect is better." is very generic, please revise it with taking account problem domain.



Author Response

  • Please revise and update the "Introduction" chapter with current artificial intelligence methods (for example YOLOv8, YOLOv9, YOLOv10, CenterNetv2, CenterNet++, DINO, DETR, DETA, and so on).

Response 1: The revised introduction section is as follows:

Aluminum alloy materials are not only corrosion-resistant and easy to use and maintain, but they also significantly reduce the weight of components. The development of the aluminum processing industry has primarily relied on mature technology, low-cost replication, and expansion, leading to increased production capacity and widespread application in aerospace, construction, and rail transportation. Due to the influence of various process parameters on the welding process of aluminum alloy weldments, both manual welding and robot-assisted automatic welding can result in defects such as pores, slag inclusion, and incomplete penetration. These welding defects pose potential safety hazards, including leakage and explosion, making weld defect detection a critical issue in the industry.

In industrial production, X-ray non-destructive testing is commonly used to detect internal defects in aluminum alloy weldments. However, prolonged manual inspection of X-ray images can lead to ophthalmic disorders, as well as to errors caused by human subjectivity and fatigue, which are unacceptable in engineering practice. Moreover, current real-time weld defect detection systems face limitations in terms of accuracy, efficiency, and false positive rates. Therefore, it is essential to develop an automated weld defect detection system that can detect weld defects more effectively, quickly, and accurately [1].

Object detection is a fundamental task in computer vision, aimed at identifying object categories within images and precisely determining their locations. Object detection algorithms can be classified into convolution-based and transformer-based methods [2]. Convolution-based approaches can be further subdivided into two-stage and one-stage detectors. The two-stage detection model consists of a first stage that proposes candidate bounding boxes and a second stage that refines and classifies these proposals. Notable examples in the RCNN series include SPPNet [3], Fast R-CNN [4], and Region Proposal Network (RPN) [5]. While these methods achieve high detection accuracy, they do so at the expense of slower processing speed [6]. In contrast, single-stage detection models such as YOLO [7], RetinaNet [8], and SSD [9] directly predict both the bounding boxes and class probabilities of objects within an image, thereby significantly enhancing detection speed and better meeting practical application requirements. Zhao et al. [10] introduced a novel infrared aerial target detection method called YOLO-Mamba to address challenges such as distance dependence and computational complexity in existing approaches. Huang et al. [11] proposed an enhanced RetinaNet algorithm to improve the model's detection performance, specifically targeting issues arising from multi-scale variations in targets. To tackle the problem of detecting occluded objects, Biffi et al. [12] developed a deep learning method based on Adaptive Training Sample Selection (ATSS), which labels only the center points of objects, thereby enhancing practicality.

Weld defect detection, as a specialized application of computer vision, is a promising research area with significant prospects, particularly given the critical importance of oil and natural gas as primary energy sources, which provide a strong practical foundation. Currently, experts in weld seam detection primarily focus on challenges such as dataset scarcity, multi-scale defect detection, and tiny target detection. Guo et al. [13] proposed a detection method that integrates Generative Adversarial Networks (GANs) with transfer learning to balance data distribution and augment image samples, thereby addressing data imbalance issues. Kumaresan et al. [14] explored an image-centric approach and employed real-time image data augmentation techniques to overcome the limitations of X-ray datasets. Ji et al. [15] tackled inconsistent scales and wide boundary transitions in weld seam defects using geometric transformation-based data augmentation and a Feature Pyramid Network (FPN) to improve detection accuracy. Liu et al. [16] addressed significant shape and size variations by designing multi-scale feature extraction modules, proposing the LF-YOLO method that balances performance and computational cost. Pan et al. [17] introduced a gray value curve enhancement module and the WD-YOLO model to handle large shape and size variations. Yang et al. [18] proposed an end-to-end detection model with bidirectional convolutional LSTM blocks to optimize shortcuts, addressing the lack of time-based information in existing methods. To address the weak generalization of anchor-based detectors for large-scale defects, Zuo et al. [19] proposed an efficient anchor-free detector with Dynamic Receptive Field Allocation (DRFA) and task alignment, improving both localization and classification. Additionally, Zuo et al. [20] designed a Multilevel Attention Feature Fusion Network (MAFFN) to enhance prediction accuracy for multi-scale defects.

Gu Jing et al. [21] proposed an enhanced Faster-RCNN model for detecting weld defects. The model employs a multi-layer feature extraction network and incorporates multiple sliding windows to improve detection performance. Guo Feng et al. [22] primarily investigated the impact of different activation functions, including Mish, Swish, and Leaky-ReLU, on the detection results of the YOLOv4 model. Their findings indicate that the simultaneous use of Swish and Mish activation functions in the YOLOv4 model yields the best detection performance. Inspired by these studies, this manuscript introduces improvements to the original YOLOv4 model. Compared with existing algorithms, the improved model demonstrates superior performance metrics in defect detection, providing more accurate results across various types of defects.

This paper is organized as follows: Section 1 introduces deep learning models and their applications; Section 2 details the prediction principles, activation functions, and overall framework of the YOLOv4 algorithm; Section 3 presents the improved YOLOv4-cs1 and YOLOv4-cs2 models along with their framework diagrams; Section 4 describes the experimental conditions, hardware setup, and analysis of the experimental results. Finally, a comparative chart of different models' performance in defect detection is provided.

References

  1. Z. Du; H.Y. Shen; J.Z. Fu; et al. Approaches for improvement of the X-ray image defect detection of automobile casting aluminum parts based on deep learning. NDT & E Int., 2019, 107, 102144.
  2. Arkin; N. Yadikar; X. Xu; et al. A survey: object detection methods from CNN to transformer. Multimed. Tools Appl., 2023, 82, 21353–21383.
  3. K. He; X. Zhang; S. Ren; et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell., 2015, 37, 1904–1916.
  4. R. Girshick. Fast R-CNN. arXiv preprint, 2015, arXiv:1504.08083.
  5. S. Ren; K. He; R. Girshick; et al. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv preprint, 2015, arXiv:1506.01497.
  6. H.-Y. Zhou; B.-B. Gao; J. Wu. Adaptive feeding: Achieving fast and accurate detections by adaptively combining object detectors. 2017 IEEE International Conference on Computer Vision, 2017, 3505–3513.
  7. Jiang; D. Ergu; F. Liu; et al. A review of YOLO algorithm developments. Procedia Computer Science, 2022, 199, 1066–1073.
  8. T.-Y. Lin; P. Goyal; R. Girshick; K. He; P. Dollár. Focal loss for dense object detection. 2017 IEEE International Conference on Computer Vision, 2017, 2980–2988.
  9. W. Liu; D. Anguelov; D. Erhan; et al. SSD: Single shot multibox detector. Computer Vision–ECCV 2016: 14th European Conference, 2016, Part I, 21–37.
  10. Zhao; P. He. YOLO-Mamba: object detection method for infrared aerial images. Signal Image Video Process., 2024, 18, 8793–8803.
  11. Huang; Z. Wang; X. Fu. Pedestrian detection using RetinaNet with multi-branch structure and double pooling attention mechanism. Multimed. Tools Appl., 2024, 83, 6051–6075.
  12. J. Biffi; E. Mitishita; V. Liesenberg; et al. ATSS deep learning-based approach to detect apple fruits. Remote Sensing, 2020, 13, 54.
  13. Guo; H. Liu; G. Xie; et al. Weld defect detection from imbalanced radiographic images based on contrast enhancement conditional generative adversarial network and transfer learning. IEEE Sens. J., 2021, 21, 10844–10853.
  14. Kumaresan; K. J. Aultrin; S. Kumar; et al. Deep learning-based weld defect classification using VGG16 transfer learning adaptive fine-tuning. Int. J. Interact. Des. Manuf., 2023, 17, 2999–3010.
  15. Ji; H. Wang; H. Li. Defects detection in weld joints based on visual attention and deep learning. NDT & E Int., 2023, 133, 102764.
  16. Liu; Y. Chen; J. Xie; et al. LF-YOLO: A lighter and faster YOLO for weld defect detection of X-ray image. IEEE Sens. J., 2023, 23, 7430–7439.
  17. Pan; H. Hu; P. Gu. WD-YOLO: A more accurate YOLO for defect detection in weld X-ray images. Sensors, 2023, 23, 8677.
  18. Yang; S. Xu; J. Fan; et al. A pixel-level deep segmentation network for automatic defect detection. Expert Syst. Appl., 2023, 215, 119388.
  19. Zuo; J. Liu; M. Fu; et al. An efficient anchor-free defect detector with dynamic receptive field and task alignment. IEEE Trans. Ind. Inform., 2024, 20, 8536–8547.
  20. Zuo; J. Liu; M. Fu; et al. STMA-Net: A spatial transformation-based multi-scale attention network for complex defect detection with X-ray images. IEEE Trans. Instrum. Meas., 2024, 73, 5014511.
  21. Gu; Z.Q. Xie; X.Y. Zhang. Weld defect detection based on improved deep learning. Journal of Astronautic Metrology and Measurement, 2020, 40, 75–79.
  22. Guo; Y. Qian; Y.F. Shi. Real-time railroad track components inspection based on the improved YOLOv4 framework. Automat. Constr., 2021, 125, 103596.
  • Page 4 "YOLOv4-cs1" chapter, replaced ReLU with ReLU6 can prevent numerical explosion, did you did experiment without ReLU6 and got numerical explosion?

Response 2: Yes, we did.

  • Page 4 "YOLOv4-cs1" chapter, you used ReLU6, is any reason why ReLU6? and not numerical clipping operation after oridanary ReLU?

Response 3: The reason is that the pores and incomplete penetration defects in the training dataset are small targets; down-sampling in the original YOLOv4 model can lead to the loss of edge information for these defects, thereby reducing the model's effectiveness. Therefore, the down-sampling layers and corresponding feature fusion operations in the PANet network are removed to preserve more effective edge information as the network depth increases.
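
For context on Comment 3: ReLU6 is functionally identical to clamping an ordinary ReLU at an upper bound of 6, so the practical difference is mainly that ReLU6 packages the bound into a single, widely supported operation. A minimal sketch illustrating the equivalence is shown below; the framework used in the manuscript is not stated in this record, so PyTorch is assumed here purely for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-2.0, 10.0, steps=7)

# ReLU6 as provided by the framework: min(max(x, 0), 6)
y_relu6 = F.relu6(x)

# The same result obtained by clipping an ordinary ReLU at 6
y_clipped = torch.clamp(F.relu(x), max=6.0)

# Both paths produce identical activations.
assert torch.equal(y_relu6, y_clipped)
print(y_relu6)
```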

  • In the chapter "Experimental results and analysis" (Section 4.4), please add a table with model sizes, parameter counts, and other relevant information.

Response 4.4: This is not a public dataset. The dataset includes three object classes: pores, slag inclusion, and incomplete penetration, totaling 1005 images. Given the limited number of images (1005), which is insufficient for effective training of deep learning models, data augmentation is necessary to enhance the dataset. By rotating, cropping, and adjusting the contrast of the images, the dataset is expanded to include more feature information. For example, Figure 7a, b, and c illustrate the augmentation process for the incomplete penetration defect. Using the popular annotation tool LabelMe, the image database now comprises 5165 labeled images, as shown in the schematic diagram in Figure 8. The dataset is randomly divided into three subsets: 3,616 images for training, 516 images for validation, and 1,033 images for testing.
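
As a rough illustration of the augmentation and splitting procedure described above (rotation, cropping, contrast adjustment, and a random 3,616 / 516 / 1,033 split of the 5,165 labeled images), a minimal sketch follows. OpenCV is assumed here only for illustration, the file names are hypothetical, and in practice the bounding-box annotations would have to be transformed together with the images.

```python
import random
import cv2  # assumed here for the rotation / contrast operations described above

def augment(image):
    """Illustrative augmentations of the kind described: rotate, crop, adjust contrast."""
    h, w = image.shape[:2]
    # Rotate by a small random angle around the image centre
    angle = random.uniform(-15, 15)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    # Random crop back to 90% of the original size
    ch, cw = int(0.9 * h), int(0.9 * w)
    y0, x0 = random.randint(0, h - ch), random.randint(0, w - cw)
    cropped = rotated[y0:y0 + ch, x0:x0 + cw]
    # Simple linear contrast adjustment
    alpha = random.uniform(0.8, 1.2)
    return cv2.convertScaleAbs(cropped, alpha=alpha, beta=0)

# Random 3,616 / 516 / 1,033 split of the 5,165 labelled images
paths = [f"images/{i:05d}.png" for i in range(5165)]  # hypothetical file names
random.shuffle(paths)
train, val, test = paths[:3616], paths[3616:4132], paths[4132:]
assert len(val) == 516 and len(test) == 1033
```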

  • chapter "4.1. Data set", please add visual representation (figure) of typical "pores","slag inclusion", "incomplete penetration"

Response 4: Figure 9 has been newly added.

Figure 9 illustrates the data distribution for different object classes across the training dataset, validation dataset, and test dataset. It is important to note that the data used for subsequent comparison experiments were obtained from the test set. Utilizing this augmented dataset improves the robustness of the model during training.

 

Figure 9. Dataset partitioning.

  • Chapter 4.4.1: in my opinion ", etc." is not necessary; otherwise, please elaborate on the indicators.

Response 5: Thank you, we deleted it.

  • "Table 3", the fps information is nowhere used in paper, is it necessary metric for model performance? If yes please, elaborate that information in paper.

Response 6: Thank you, we deleted it.

  • "Table 3", "Table 4", "Table 5", same as with introduction, please compare results with more recent models.

Response 7: Thank you. We will continue to focus on developing an advanced intelligent detection system for aluminum alloy weld defects. Upon completion, we will conduct a comparative analysis with the original results.

  • "Figure 9", "Figure 10", "Figure 11" add visual cues (such as grid) to better readability

Response 8: Grid lines have been added to the figures in question, as shown below.

 

Figure 10. F1 values of different models.

 

Figure 11. Missed detection rate of different models.

 

Figure 12. Comparison of AP and mAP for varied models.

  • "Conclusion", "The results show that the recall rate of the two optimized models on pores and slag inclusion is greatly improved, and the overall detection effect is better." is very generic, please revise it with taking account problem domain.

Response 9: The results indicate that the recall rates for pores and slag inclusion have significantly improved in both optimized models, attributed to their enhanced ability to learn edge information. In the future, we will continue focusing on designing an advanced intelligent detection system for aluminum alloy weld defects, aimed at improving the safety and automation of equipment.
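
For clarity on the metric referred to in this response: recall is computed per defect class as TP / (TP + FN), i.e. the fraction of true defects of that class that the detector finds. A minimal sketch with hypothetical counts (not the paper's actual numbers):

```python
# Per-class recall: of all true defects of a class, the fraction the detector finds.
# recall = TP / (TP + FN). The counts below are hypothetical, for illustration only.
counts = {
    "pores":                  {"tp": 180, "fn": 20},
    "slag inclusion":         {"tp": 150, "fn": 50},
    "incomplete penetration": {"tp": 120, "fn": 15},
}

for defect, c in counts.items():
    recall = c["tp"] / (c["tp"] + c["fn"])
    print(f"{defect}: recall = {recall:.3f}")
```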

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a welding defect detection method using YOLOv4. There are several comments to improve the quality of this paper:

1. The paper proposes an improved YOLOv4-based model for defect detection, but in fact it makes only minor modifications, such as replacing activation functions and adding SPP modules. These are incremental changes rather than substantial innovations in the field of deep learning for defect detection.

2. The improvements from YOLOv4-cs1 or YOLOv4-cs2 using  k-means++ clustering and SPP modules do not significantly advance the current state of the art.

3. The dataset size is limited, with originally only 1,005 images (expanded to 2,801 through augmentation). This small dataset raises concerns about the model's generalizability to real-world scenarios.

4. The comparison models (e.g., YOLOv3, YOLOv4, SSD) are outdated. Including more recent state-of-the-art models would provide a more meaningful benchmark.

5. The reported improvements in metrics are marginal and could result from overfitting rather than improvement of the detection algorithm.

6. The manuscript contains numerous grammatical errors and unclear passages, which reduce the overall readability and professional quality of the paper.

Author Response

  • The paper proposes an improved YOLOv4-based model for defect detection, but in fact it makes only minor modifications, such as replacing activation functions and adding SPP modules. These are incremental changes rather than substantial innovations in the field of deep learning for defect detection.

Response 1: Agree.

  • The improvements from YOLOv4-cs1 or YOLOv4-cs2 using k-means++ clustering and SPP modules do not significantly advance the current state of the art.

Response 2: Compared to k-means, k-means++ optimizes the initial clustering center selection method to ensure that the distance between the initial cluster centers is maximized, thereby achieving a more effective clustering outcome.
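
To illustrate the seeding behaviour mentioned in this response, the sketch below clusters hypothetical (width, height) pairs of ground-truth boxes with k-means++ initialisation via scikit-learn. Note that YOLO-style anchor clustering typically uses a 1 - IoU distance rather than the Euclidean distance used here, so this is only a sketch of the k-means++ seeding under those assumptions, not the paper's actual anchor-generation code.

```python
import numpy as np
from sklearn.cluster import KMeans  # init="k-means++" is the seeding discussed above

# Hypothetical (width, height) pairs, in pixels, of ground-truth defect boxes
box_wh = np.array([
    [12, 10], [15, 14], [18, 12],
    [30, 22], [34, 28], [40, 30],
    [110, 80], [125, 95], [200, 150],
], dtype=float)

# k-means++ spreads the initial cluster centres far apart before the usual
# Lloyd iterations, which is the behaviour described in the response.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
kmeans.fit(box_wh)

# Sort the resulting anchor sizes by area, smallest first
anchors = kmeans.cluster_centers_
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
print(np.round(anchors, 1))
```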

  • The dataset size is limited, with originally only 1,005 images (expanded to 2,801 through augmentation). This small dataset raises concerns about the model's generalizability to real-world scenarios.

Response 3: This is not a public dataset. The dataset includes three object classes: pores, slag inclusion, and incomplete penetration, totaling 1005 images. Given the limited number of images (1005), which is insufficient for effective training of deep learning models, data augmentation is necessary to enhance the dataset. By rotating, cropping, and adjusting the contrast of the images, the dataset is expanded to include more feature information. For example, Figure 7a, b, and c illustrate the augmentation process for the incomplete penetration defect. Using the popular annotation tool LabelMe, the image database now comprises 5165 labeled images, as shown in the schematic diagram in Figure 8. The dataset is randomly divided into three subsets: 3,616 images for training, 516 images for validation, and 1,033 images for testing. Figure 9 illustrates the data distribution for different object classes across the training dataset, validation dataset, and test dataset. It is important to note that the data used for subsequent comparison experiments were obtained from the test set. Utilizing this augmented dataset improves the robustness of the model during training.

 

Figure 9. Dataset partitioning.

  • The comparison models (e.g., YOLOv3, YOLOv4, SSD) are outdated. Including more recent state-of-the-art models would provide a more meaningful benchmark.

Response 4: Agree. We will continue to focus on developing an advanced intelligent detection system for aluminum alloy weld defects and, upon its completion, will conduct a comparative analysis with the present results. The revised introduction section and updated reference list are the same as those given in Response 1 to Reviewer 1 above.

  • The reported improvements in metrics are marginal and could result from overfitting rather than improvement of the detection algorithm.

Response 5: The results indicate that the recall rates for pores and slag inclusion have significantly improved in both optimized models, attributed to their enhanced ability to learn edge information. In the future, we will continue focusing on designing an advanced intelligent detection system for aluminum alloy weld defects, aimed at improving the safety and automation of equipment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

My advice is to add the YOLOv7 model to the results.

Author Response

Comment 1: My advice is to add the YOLOv7 model to the results.

Response 1: Welding porosity in aluminum alloys is considered the most detrimental welding defect. Its adverse effects are primarily manifested in several critical areas: a reduction in mechanical properties, the induction of welding cracks, diminished fatigue performance, and compromised sealing integrity and corrosion resistance. Consequently, it is expected that the accuracy rates for detecting pores should not be lower than 90%. However, experimental results demonstrate that the accuracy rates for detecting pores using the YOLOv7, YOLOv6, and RT-DETR models decreased to 81.7%, 87.5%, and 87.4%, respectively, compared to the original YOLOv4 model's accuracy of 97.28%. Table 1 summarizes the results in terms of accuracy of different models.

We would prefer not to include the suboptimal data from the YOLOv7, YOLOv6, and RT-DETR models in Table 1. We respectfully request that the reviewer consider our rationale and understand our decision. We are committed to continuing the optimization of the model.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Some of the reviewer's points, such as point 1, have still not been addressed. The author agrees that this paper has few contributions.

Author Response

Comment 1: Some of the reviewer's points, such as point 1, have still not been addressed. The author agrees that this paper has few contributions.

Response 1: Welding porosity in aluminum alloys is considered the most detrimental welding defect. Its adverse effects are primarily manifested in several critical areas: a reduction in mechanical properties, the induction of welding cracks, diminished fatigue performance, and compromised sealing integrity and corrosion resistance. Consequently, it is expected that the accuracy rates for detecting pores should not be lower than 90%. However, experimental results demonstrate that the accuracy rates for detecting pores using the YOLOv7, YOLOv6, and RT-DETR models decreased to 81.7%, 87.5%, and 87.4%, respectively, compared to the original YOLOv4 model's accuracy of 97.28%. Additionally, experimental results demonstrate that the recall rates for pore and slag inclusion detection using the YOLOv4-cs1 and YOLOv4-cs2 models increased by 28.9% and 16.6%, and 45% and 25.2%, respectively, compared to the original YOLOv4 model. Table 1 summarizes the results in terms of accuracy of different models.

Our work aims to develop an automated weld defect detection system that can detect weld defects more effectively, rapidly, and accurately. We would prefer not to include the suboptimal data from the YOLOv7, YOLOv6, and RT-DETR models in Table 1. We respectfully request that the reviewer consider our rationale and understand our decision. We are committed to continuing the optimization of the model.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for your response. The authors' response is satisfactory.
