Article

Steel Surface Defect Detection Algorithm Based on Improved YOLOv8n

School of Mechanical Engineering, Shenyang Jianzhu University, Shenyang 110168, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(12), 5325; https://doi.org/10.3390/app14125325
Submission received: 13 May 2024 / Revised: 10 June 2024 / Accepted: 15 June 2024 / Published: 20 June 2024
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)

Abstract: Traditional methods of steel surface defect detection suffer from weak feature extraction ability, sluggish detection speed, and subpar detection performance. In this paper, a YOLOv8-based DDI-YOLO model is proposed for effective steel surface defect detection. First, in the backbone network, the dilation-wise residual (DWR) module is fused with the C2f module to obtain C2f_DWR: a two-step approach efficiently extracts multiscale contextual information and then fuses the feature maps formed from the multiscale receptive fields, enhancing the capacity for feature extraction. Building on this, a dilated reparam block (DRB) is added to the C2f_DWR structure to compensate for C2f's inability to capture small-scale pattern defects during training, improving the training fluency of the model. Finally, the Inner-IoU loss function is employed to enhance the regression accuracy and training speed of the model. The experimental results show that, compared with the original YOLOv8n on the NEU-DET dataset, DDI-YOLO improves the mAP by 2.4%, the accuracy by 3.3%, and the FPS by 59 frames/s. The proposed model therefore offers superior mAP, accuracy, and FPS in identifying surface defects in steel.

1. Introduction

Steel is a critical industrial material, and in steel production the detection of surface defects is a key step that bears directly on the quality of industrial products and on production safety. In recent years, steel has been extensively employed in rapidly developing fields such as aerospace and aviation, the oil industry, the automobile industry, shipbuilding, large-scale equipment manufacturing, and other important industrial areas. Surface defects in steel used in these areas have an immediate impact on the quality of the finished industrial goods, and poor finished-product quality translates into economic losses. It is therefore urgent to improve the accuracy of steel surface defect detection.
Current surface defect detection methods fall into three categories: traditional-technology-based, machine-vision-based, and deep-learning-based methods. Traditional techniques include manual visual inspection, eddy-current inspection, magnetic particle inspection, and visible-light inspection. Manual visual inspection checks the appearance or performance defects of a product by sight or touch; its shortcomings are obvious: it is subjective, inefficient, and easily influenced by external factors. Eddy-current inspection is a non-destructive surface defect detection method: common defects on the steel surface alter the distribution of eddy currents and the characteristics of the induced magnetic field, so defects can be identified and localized by detecting changes in that field. However, eddy-current inspection has relatively low sensitivity to more complex defects and requires experienced technicians to interpret and analyze the data, so its labor cost is comparatively high. Magnetic particle inspection is widely used in welding, manufacturing, foundry, aerospace, and other fields relying on steel production. It can detect cracks, inclusions, porosity, fatigue cracks, and other defects on the steel surface, and it is fast, economical, easy to operate, and applicable to items of various sizes and shapes. Its disadvantages are that it only works on materials with magnetic properties, cannot detect defects in non-magnetic materials, and may be difficult to apply to parts with complex surface shapes.
Visible-light inspection is a commonly used method for detecting surface defects on steel, such as dents, blemishes, and scratches. It typically uses a light source and optical equipment to observe and record an image of the surface of the material being inspected. Its advantage is relative speed, especially for single-object detection; its disadvantage is that it depends mainly on the subjective judgment and experience of the observer, is strongly affected by human factors, and has low detection accuracy for tiny or hidden defects. Machine vision defect detection, by contrast, combines algorithms and vision techniques: the input image is processed and analyzed by a computer to achieve fast and accurate recognition and localization of surface defects.
Compared with traditional manual inspection, defect detection by machine vision offers high speed and good versatility. It lessens the impact of human factors on the detection results and enables large-scale automated inspection on continuous production lines, so it is often used in actual production. Automated inspection improves production efficiency and can handle large amounts of image data. However, the method also has notable drawbacks: classification of defect types is relatively poor, the characteristics of the various defects must be extracted manually, and detection precision is limited. Deep-learning-based industrial defect detection technology therefore has broad prospects in the intelligent manufacturing sector and calls for continuous exploration and innovation.
The advantage of deep learning is that, given only input data, the computer can automatically extract features, avoiding the disadvantage of manual feature engineering. Deep-learning-driven target detection techniques [1] are currently advancing rapidly and fall primarily into two types. One-stage methods offer good performance, faster speed, and ease of practical application; the main representatives are YOLO [2] and SSD [3]. Two-stage methods first generate candidate regions and then classify these regions and regress their bounding boxes to arrive at the final recognition result; typical representatives include R-CNN [5], Fast R-CNN [6], Faster R-CNN [8], and Mask R-CNN [9], alongside related architectures such as U-Net [4] and RetinaNet [7]. In addition, transformer-based [10,11,12,13] and unsupervised learning [14,15] approaches have been brought to the computer vision field to solve related problems in target detection.
Many scholars have applied deep learning target detection algorithms to steel surface defects. Ren [16] proposed an ECA-SimSPPFSIoU-Yolov5 model incorporating efficient channel attention to enhance the significant weights. Douzhi et al. [17] proposed a small-sample steel plate defect detection algorithm based on a lightweight YOLOv8, addressing the difficulty of applying deep learning to small-sample defect detection: the LMSRNet network was designed to replace the YOLOv8 backbone, and the CBFPN and ECSA modules were developed to keep the model lightweight. Guo et al. [18] proposed the improved MSFT-YOLO model, integrating the TRANS module into the network together with data augmentation to increase the accuracy of steel defect detection. Cui Kebin et al. [19] proposed MCB-FAH-YOLOv8 to address false detections and high miss rates, adding an improved convolutional block attention module (CBAM) and a replaceable four-head ASFF predictor so that the network can better detect tiny and dense targets, improving accuracy at the expense of speed. All of the above methods detect steel surface defects on the basis of YOLO.
Zhou Yan et al. [20] proposed a multiscale lightweight attention method for steel detection, using a channel attention module on multilevel feature maps to reconstruct their channel-related information and improve the detection effect. Zhu et al. [21] used an efficient Swin transformer to detect and classify steel surface defects, strengthening the connections between feature map channels, reducing the resolution, and addressing the image information retention problem. He [22] proposed a multilevel feature fusion network (MFN) that effectively improves the acquisition of steel surface defect information by integrating features from various levels: a baseline convolutional neural network (CNN) produces feature maps, the MFN fuses the multilevel features into a single feature, a region proposal network (RPN) generates regions of interest (ROIs), and the baseline network ResNet-34 achieves 74.8% mAP. These methods were designed and improved around transformer and convolutional neural network architectures, respectively. Although all of the above methods improve target detection performance to some degree, they still fall short in accuracy and other aspects. We therefore propose the steel surface defect detection algorithm DDI-YOLO; the contributions of this paper are as follows:
(1) To resolve the insufficient ability of the original C2f module of YOLOv8 to extract steel surface defect features and multiscale contextual information, the dilation-wise residual (DWR) module is added to the original C2f, further enhancing the network's capacity to extract multiscale contextual information.
(2) Since C2f in YOLOv8 cannot capture small-scale patterns during training, the dilated reparam block (DRB) module is used to enable C2f to detect defects in small-scale patterns, thus enhancing the training ability of the model.
(3) The C2f_DWR module and the C2f_DRB module are combined to form the C2f_DWR_DRB module, which blends the benefits of each module to improve the model's comprehensiveness.
(4) The Inner-IoU loss function replaces YOLOv8's original CIoU loss function, yielding faster and more accurate convergence.

2. Related Works

2.1. Model Architecture of YOLOv8

YOLOv8 is a model built by Ultralytics on the success of previous generations of YOLO, with upgrades and new features that further enhance performance and flexibility. It introduces innovations such as a new backbone network, an anchor-free detection head, and a new loss function, and supports operation on platforms from CPUs to GPUs. According to model depth and feature map size, YOLOv8 comes in five versions: YOLOv8-n, YOLOv8-s, YOLOv8-m, YOLOv8-l, and YOLOv8-x. The network structure of YOLOv8 can be separated into four major parts: input, backbone, neck, and head. The input stage scales the supplied image to the required training size in a certain proportion and applies operations such as scaling and color-tone adjustment. The backbone extracts the main features and is made up of convolution modules, the C2f module (CSPLayer_2Conv), which replaces the C3 module used in YOLOv5, and the SPPF spatial pyramid pooling module. The neck improves the merging of features of different dimensions through an FPN feature pyramid and a path aggregation structure (a dual-stream FPN), which offers high efficiency and speed. The head is similar to those of YOLOv6 [23] and YOLOX [24]: a decoupled head is used, whereas the earlier YOLOv3 [25], YOLOv4 [26], and YOLOv5 [27] used a coupled head. YOLOv8 also uses three output branches, each subdivided into two parts for classification and bounding box regression, respectively. Considering the model's detection capability, this experiment uses YOLOv8n as the baseline model and improves on it.

2.2. Based on the Improved YOLOv8 Algorithm (DDI-YOLO)

2.2.1. DDI-YOLO

As shown in Figure 1, the improved YOLOv8n model replaces the C2f modules in layer 6 and layer 7 of the backbone with the combined C2f_DWR_DRB module, which incorporates the respective advantages of the C2f_DWR and C2f_DRB modules and improves the model's overall synthesis ability. In the neck, the C2f modules in the twelfth, fifteenth, and twenty-first layers are replaced with the C2f_DRB module, compensating for the inability of C2f to detect defects in small-scale patterns and enhancing the training ability of the model. Finally, the Inner-IoU loss function replaces YOLOv8n's CIoU loss function to make model training faster and more accurate. Figure 2 shows the whole process of detecting defects on the steel surface with the DDI-YOLO model.

2.2.2. C2f-DWR Module

The network structure of YOLOv8 contains many C2f ("Cross Stage Partial Feature Fusion") modules, which improve the performance and efficiency of the model by partially fusing features at different stages of the network, fully utilizing its multilayer feature representation. This feature fusion helps increase detection accuracy and cut the number of parameters while also enhancing computational efficiency. However, steel surface defects vary widely in shape, location, and size; in particular, crazing-type defects, rolled-in scale defects, and pitted surface defects have a large distribution range and non-uniform size and shape, and the original C2f module of YOLOv8 has insufficient ability to extract such defective features and cannot extract multiscale contextual information. Therefore, to enhance the network's capacity for extracting multiscale contextual information, this paper designs a new module, C2f-DWR, which adds the DWR module to the original C2f and strengthens the extraction of features from the extensible receptive fields at the higher levels of the network.
The DWR (dilation-wise residual [28]) module is designed in a residual manner. A two-step approach efficiently extracts multiscale contextual information and then fuses the feature maps derived from multiscale receptive fields: the earlier single-step method of acquiring multiscale context information is decomposed into two steps to reduce the acquisition difficulty.
The network structure of the DWR module, which comprises two branches, is shown in Figure 3. The two branches are implemented as follows:
The first branch generates the associated residual features from the input features of the steel surface defects; these are referred to as region residuals. This branch generates a series of concise region-form feature maps of different sizes as material for the subsequent morphological filtering, using a common 3 × 3 convolution paired with a BN layer and a ReLU. The 3 × 3 convolution performs initial feature extraction, and the ReLU activation function, used instead of the more common PReLU layer, has a significant impact on the activation and conciseness of the region features.
The second branch performs morphological filtering of the regional features of steel surface defects using multirate dilated depthwise convolution, which is referred to as semantic residualization. For every channel feature, just one intended receptive field is applied, avoiding redundant receptive fields as far as possible. In practical steel surface defect detection, the desired concise region feature map can then be judiciously discovered in the initial phase based on the extent of the second step's receptive field, for fast matching with the receptive field.
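As an illustration of the two-branch structure described above, the following is a minimal PyTorch sketch. The channel counts, the dilation rates, and the 1 × 1 fusion convolution are illustrative assumptions, not the exact DWR implementation from [28]: a 3 × 3 convolution with BN and ReLU produces the region features, then parallel depthwise convolutions with different dilation rates filter them before fusion and the residual addition.

```python
import torch
import torch.nn as nn

class DWRSketch(nn.Module):
    """Rough sketch of the dilation-wise residual (DWR) idea:
    step 1 makes concise region features (3x3 conv + BN + ReLU);
    step 2 filters them with parallel depthwise convs at different
    dilation rates, then fuses and adds the residual."""
    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.region = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # one depthwise branch per dilation rate, so each channel sees
        # a single intended receptive field
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                      groups=channels, bias=False)
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1,
                              bias=False)

    def forward(self, x):
        r = self.region(x)                                  # step 1
        multi = torch.cat([b(r) for b in self.branches], 1)  # step 2
        return x + self.fuse(multi)                          # residual
```

With kernel size 3, `padding=d` together with `dilation=d` keeps the spatial size unchanged, so the residual addition is shape-safe for any input resolution.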

2.2.3. C2f-DRB Module

The DRB (dilated reparam block [29]) module uses a non-dilated small kernel and multiple dilated small-kernel layers with varying dilation rates to enhance a large-kernel convolutional layer. Its key hyperparameters are the size K of the large kernel, the sizes k of the parallel convolutional layers, and the dilation rates r. Figure 4 depicts the scenario with four parallel layers: K = 9, r = (1, 2, 3, 4), and k = (5, 5, 3, 3). For greater K, more dilated layers with bigger kernel sizes or dilation rates can be employed. The kernel sizes and dilation rates of the parallel branches are flexible, the sole restriction being (k − 1)r + 1 ≤ K.
To convert a dilated reparam block into a single large-kernel convolution layer for inference, we first merge each BN into its preceding convolution layer, convert each layer with dilation r > 1 into an equivalent non-dilated layer, and sum all resulting kernels with appropriate zero padding. In the dilated reparam block, the non-dilated large-kernel layer is thus enhanced by the dilated small-kernel convolution layers.
From a parametric point of view, such a dilated layer is equivalent to a non-dilated convolutional layer with a larger, sparse kernel, allowing the entire block to be converted into a single large-kernel convolution. Using a parallel small-kernel convolution in conjunction with the large-kernel convolution is recommended, as the former facilitates training by capturing small-scale patterns. Their outputs are summed after two corresponding batch normalization (BN) layers. After training, the BN layers are merged into the convolution layers using structural reparameterization, so that the large-kernel and small-kernel convolutions can be equivalently combined for inference. Beyond small-scale patterns, augmenting the large kernel's capacity to detect sparse patterns (i.e., pixels on the feature map that may correlate more strongly with certain distant pixels than with their neighbors) can produce higher-quality features, and the mechanism of dilated convolution addresses exactly this need. From the viewpoint of a sliding window, a dilated convolution layer with dilation rate r scans the input channel for spatial patterns in which each pixel of interest is r − 1 pixels away from its neighbors. We therefore employ dilated convolutional layers in parallel with the large kernel and sum their outputs. Since C2f in YOLOv8 lacks the ability to capture small-scale patterns during training, the dilated reparam block module compensates for this shortcoming, enhancing the model's training capability.
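The reparameterization step above rests on a simple identity: a k × k kernel applied with dilation r is equivalent to a non-dilated kernel of size (k − 1)r + 1 obtained by inserting r − 1 zeros between adjacent taps. A minimal pure-Python sketch of this conversion (the function name is ours, not from [29]):

```python
def insert_zeros(kernel, r):
    """Expand a k x k dilated-conv kernel (dilation r) into the
    equivalent non-dilated kernel of size (k - 1) * r + 1 by
    inserting r - 1 zeros between adjacent taps."""
    k = len(kernel)
    size = (k - 1) * r + 1
    out = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            out[i * r][j * r] = kernel[i][j]
    return out

# A 3x3 kernel with dilation 2 becomes an equivalent dense 5x5 kernel,
# which can then be zero-padded to the large kernel size K and summed
# with the kernels of the other parallel branches.
dense = insert_zeros([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)
```

After this conversion and the BN merge, summing the padded kernels yields the single large-kernel convolution used at inference time.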

2.2.4. Inner-IoU Loss Functions

The regression loss function used in YOLOv8 is CIoU [30]. CIoU considers the entire intersection between target boxes and adds a correction factor to quantify their similarity more precisely. Its advantages are robustness to different target box shapes, easier capture of the exact target shape, and consideration of several factors such as position, shape, and direction, all of which help the model's performance in complex situations. However, in practical steel surface defect detection, CIoU cannot be adapted to different detection tasks and its capacity for generalization is weak, so it suits this application poorly. To address these shortcomings, this study introduces the Inner-IoU regression loss function, proposed by Zhang et al. [31] in 2023. Inner-IoU calculates the IoU loss using auxiliary bounding boxes and introduces a scale factor, ratio, that regulates the scale of the auxiliary boxes used to determine the loss; this compensates for the shortcomings of CIoU, accelerates bounding box regression, and overcomes the weak generalization ability of existing methods. The Inner-IoU is calculated as follows:
$b_l^{gt} = x_c^{gt} - \frac{w^{gt}\cdot ratio}{2}, \qquad b_r^{gt} = x_c^{gt} + \frac{w^{gt}\cdot ratio}{2}$
$b_t^{gt} = y_c^{gt} - \frac{h^{gt}\cdot ratio}{2}, \qquad b_b^{gt} = y_c^{gt} + \frac{h^{gt}\cdot ratio}{2}$
where the ground truth (GT) box and the anchor are denoted by $B^{gt}$ and $B$, respectively; $(x_c^{gt}, y_c^{gt})$ denotes the center of the GT box and of the inner GT box, while $(x_c, y_c)$ denotes the center of the anchor and of the inner anchor.
$b_l = x_c - \frac{w\cdot ratio}{2}, \qquad b_r = x_c + \frac{w\cdot ratio}{2}$
$b_t = y_c - \frac{h\cdot ratio}{2}, \qquad b_b = y_c + \frac{h\cdot ratio}{2}$
$inter = (\min(b_r^{gt}, b_r) - \max(b_l^{gt}, b_l)) \cdot (\min(b_b^{gt}, b_b) - \max(b_t^{gt}, b_t))$
$union = w^{gt} h^{gt} \cdot (ratio)^2 + w h \cdot (ratio)^2 - inter$
where $w^{gt}$ and $h^{gt}$ denote the width and height of the GT box, and $w$ and $h$ the width and height of the anchor. The variable $ratio$ is the scale factor, usually taking values in the range [0.5, 1.5].
$IoU^{inner} = \frac{inter}{union}$
$L_{Inner\text{-}IoU} = 1 - IoU^{inner}$
The $inter$ and $union$ terms are calculated using the formulas above.
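For concreteness, the formulas above can be sketched in plain Python as follows. The function names are ours, and we clamp negative overlaps to zero, which the formula leaves implicit:

```python
def inner_iou(gt, anchor, ratio=1.0):
    """Inner-IoU for axis-aligned boxes given as (xc, yc, w, h);
    ratio scales the auxiliary (inner) boxes."""
    (xg, yg, wg, hg), (x, y, w, h) = gt, anchor
    # edges of the inner GT box
    bl_gt, br_gt = xg - wg * ratio / 2, xg + wg * ratio / 2
    bt_gt, bb_gt = yg - hg * ratio / 2, yg + hg * ratio / 2
    # edges of the inner anchor
    bl, br = x - w * ratio / 2, x + w * ratio / 2
    bt, bb = y - h * ratio / 2, y + h * ratio / 2
    # clamp to zero when the inner boxes do not overlap
    inter_w = max(0.0, min(br_gt, br) - max(bl_gt, bl))
    inter_h = max(0.0, min(bb_gt, bb) - max(bt_gt, bt))
    inter = inter_w * inter_h
    union = wg * hg * ratio**2 + w * h * ratio**2 - inter
    return inter / union

def inner_iou_loss(gt, anchor, ratio=1.0):
    return 1.0 - inner_iou(gt, anchor, ratio)
```

Note that identical boxes give an Inner-IoU of 1 (loss 0) for any ratio, while shrinking the auxiliary boxes (ratio < 1) makes the loss more sensitive to center misalignment, which is the mechanism behind the faster regression.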

3. Results

3.1. Image Dataset

This study used the NEU-DET [32] dataset, a steel surface defect dataset from Northeastern University, for training and validation. The dataset contains 1800 images in total, covering six defect categories with 300 images per category: rolled-in scale (RS), patches (Pa), crazing (Cr), scratches (Sc), pitted surfaces (Ps), and inclusions (In), as shown in Figure 5. The images are 200 × 200 pixels, and the 1800 images were randomly divided in an 8:1:1 ratio into NEU-DET training, test, and validation sets, i.e., 1440 training samples, 180 test samples, and 180 validation samples.
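A split of this kind can be reproduced with a few lines of Python; the seed and function name below are our assumptions, since the paper does not specify its splitting code:

```python
import random

def split_indices(n, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Randomly split n sample indices into train/test/val subsets
    (8:1:1, as used for NEU-DET in this paper)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed for reproducibility
    n_train = int(n * fractions[0])
    n_test = int(n * fractions[1])
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]
    return train, test, val

train, test, val = split_indices(1800)
```

For 1800 images this yields exactly 1440/180/180 disjoint samples, matching the counts reported above.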

3.2. Experimental Environment

The operating system used for the experiments in this paper was Windows 11; the CPU was a 13th Gen Intel(R) Core(TM) i5-13500HX, the GPU an NVIDIA GeForce RTX 4060, and the RAM 16 GB. The deep learning framework was PyTorch 1.13.1. The specific experimental parameters were as follows: a learning rate of 0.01, an image size of 640 × 640, 200 iteration rounds (epochs), and the SGD optimizer.

3.3. Experimental Metrics

The evaluation metrics used in this study were the precision–recall (P–R) curve, the number of parameters, the average precision (AP) of each defect category, the mean average precision (mAP), and the frames per second (FPS), with AP determined by the precision (P) and the recall (R). The P–R curve plots precision against recall; the area enclosed by the curve gives the AP, and mAP averages AP over the categories. The related formulas are:
$Precision = \frac{TP}{TP + FP}$
where $TP$ is the number of predicted positive samples that are actually positive, i.e., correctly identified positive samples, and $FP$ is the number of predicted positive samples that are actually negative, i.e., misreported negative samples.
$Recall = \frac{TP}{TP + FN}$
where $FN$ is the number of predicted negative samples that are actually positive, i.e., missed positive samples.
$AP = \int_0^1 p(r)\,dr$
where $AP$ is the average precision of each detection category.
$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i$
where $mAP$ denotes the mean of the $AP$ values over all categories and $n$ is the number of categories.
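The metric formulas above can be sketched directly in Python; the function names are ours, and the AP integral is approximated by rectangles over sampled (recall, precision) points:

```python
def precision(tp, fp):
    # fraction of predicted positives that are truly positive
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of actual positives that were found
    return tp / (tp + fn)

def average_precision(recalls, precisions):
    """Area under a P-R curve sampled at increasing recall points
    (rectangular approximation of AP = integral of p(r) dr)."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_ap(aps):
    # mAP: mean of the per-category AP values
    return sum(aps) / len(aps)
```

In practice detectors compute AP from ranked predictions at a chosen IoU threshold; the sketch only illustrates how the four formulas fit together.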

3.4. Analysis of Experimental Results

3.4.1. Ablation Experiments

To demonstrate the enhancement effect of each of our improvements to YOLOv8, we conducted five sets of ablation experiments. First, the DWR module was added to the C2f structures in the sixth and eighth layers of the backbone network; then, the DRB module was added to the C2f structures in the second, fourth, sixth, and eighth layers, as well as to the neck and head layers. Next, a combination of the DRB and DWR modules was added to the C2f structures in the sixth and eighth layers of the backbone and, lastly, the Inner-IoU loss function took the place of the original loss function. Each module was added to the YOLOv8 model accordingly, and we report the experimental results of the various combinations of the improved modules. Table 1 shows the outcomes of our ablation studies.
As can be seen from Table 1, after replacing C2f in the backbone network with C2f_DWR, except for a slight decrease in recall and FPS, the remaining metrics, such as mAP and precision, improved, with mAP gaining a notable 2.5 percentage points. This is attributed to DWR's excellent ability to extract multiscale contextual information and then fuse the feature maps formed from the multiscale receptive fields, together with a decrease in the number of parameters and computational effort. After substituting the C2f structure in the backbone with C2f_DRB, the DRB reparameterization module used a non-dilated small kernel and multiple dilated small-kernel layers to augment a non-dilated large-kernel convolutional layer, which greatly reduced the number of parameters and significantly increased mAP and FPS, raising the FPS to 104. This is because DRB compensates for the inability of C2f to capture small-scale patterns during training, improving the model's training and detection capabilities. Then, after substituting YOLOv8's loss function with the Inner-IoU loss function alone, without affecting the model's parameter count, there was a notable increase in accuracy, mAP, and FPS: 5.4%, 1.5%, and 9%, respectively. Our improvements therefore not only optimized the speed of the model but also enhanced defect detection, the average precision, and the speed of defect detection, indicating that they are effective. In the fourth set of experiments, the DWR and DRB modules were combined into the new C2f_DWR_DRB structure; Table 1 shows that, apart from a decline in recall, all other indexes improved greatly, with precision, mAP, and FPS improving by 2.1%, 4.6%, and 27%, respectively.
The most obvious improvements were in the mAP and FPS indexes, while the computational load and the number of parameters decreased, showing that the C2f_DWR_DRB structure combines the advantages of the two modules. Finally, our DDI-YOLO experienced a decrease in recall, but all other metrics remained optimal. In summary, the improvement method proposed in this research works well.

3.4.2. Comparative Experiments

To confirm that our proposed DDI-YOLO algorithm is effective, we used the original YOLOv8n as the baseline and tested YOLOv8n and DDI-YOLO on the NEU-DET dataset. Table 2 summarizes the findings; the indicator "↑1.8", for example, means that for the defect type In, the mAP of DDI-YOLO was 1.8% greater than that of the baseline model YOLOv8n. As shown in Table 2, accuracy improved for the remaining four defect types, while the mAP values for Cr and Ps defects decreased slightly. The AP values of In and Rs defects under the original YOLOv8n were only 84.7% and 74.6%, respectively, and our model improved them by 1.8% and 10.6%. Replacing the C2f module with the C2f_DWR_DRB module enhanced the extraction of features from the scalable receptive fields at the higher levels of the network. In addition, our method improved the accuracy of Cr defects from 35.1% to 49.7% and of Rs defects from 52.1% to 67.4%, increases of 14.6% and 15.3%, respectively. These notable gains in detection accuracy and precision demonstrate the efficacy of our approach.
To further examine the detection capability of our proposed DDI-YOLO and the baseline YOLOv8n, the P–R curves of the two methods are shown in Figure 6a,b. The region enclosed by our proposed DDI-YOLO is greater than that enclosed by YOLOv8n. The overall mAP on the NEU-DET dataset was 78.3%, a significant improvement of 2.4 percentage points over YOLOv8n.
To confirm the efficacy of the network improvements proposed in this work, the improved YOLOv8n model was compared experimentally with other detection models: the classical SSD, YOLOv3-tiny, YOLOv5n, YOLOv6n, YOLOv7-tiny [33], YOLOv8n (baseline), and the newer RT-DETR [34]. All experiments were carried out in an identical environment, on the same dataset and equipment; the outcomes of the comparison are reported below.
Table 3 shows that, compared with the other algorithms, this paper's algorithm achieved the best mAP and FPS (78.3% and 158 frames/s). Its recall was second only to YOLOv8n and YOLOv5n, but it outperformed YOLOv8n on all other metrics, demonstrating that the improved model is superior to the other models in both detection accuracy and detection speed. Its parameter count (2.66 M) was only slightly higher than that of YOLOv5n (2.50 M) and much smaller than those of most other models. Figure 7 displays the detection results of the different algorithms: the other models produced false and missed detections, while the proposed model achieved the best detection effect. In summary, the improved algorithm presented in this paper not only performs better in terms of precision, accuracy, and detection speed, but also reduces the computational cost, which in turn improves detection efficiency. The proposed algorithm therefore has high practical and generalization value.
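The FPS values in Table 3 correspond to single-image inference throughput. A minimal, hypothetical timing sketch shows one common way to measure it; the `infer` callable here is a stand-in for a real model's forward pass:

```python
import time

def measure_fps(infer, images, warmup=5, repeats=50):
    """Average single-image inference throughput in frames/s.

    `infer` is a placeholder for a detector's forward pass (hypothetical);
    a few warmup calls are discarded so caches and lazy init do not skew
    the timing.
    """
    for _ in range(warmup):
        infer(images[0])
    start = time.perf_counter()
    for i in range(repeats):
        infer(images[i % len(images)])
    elapsed = time.perf_counter() - start
    return repeats / elapsed
```

With a real detector, `infer` would typically wrap preprocessing, the forward pass, and NMS, since all three contribute to the reported frames per second.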

4. Conclusions

In response to the need to detect steel surface defects in actual production, this paper proposes the defect detection algorithm DDI-YOLO. First, the backbone of YOLOv8 was improved with the proposed C2f_DWR_DRB module structure, which strengthens the backbone's feature extraction. Second, in the neck, the C2f_DRB module was proposed to compensate for the inability of the original C2f module to capture small-scale pattern defects during training, improving the model's trainability. Finally, the Inner-IoU loss function replaced the CIoU loss of the original YOLOv8n, yielding a more accurate model and faster training. Based on the experimental results, the following conclusions can be drawn: compared with YOLOv8n, DDI-YOLO improves the mAP by 2.4%, the precision by 3.3%, and the FPS by 59 frames/s, which satisfies practical needs. In summary, the proposed model meets the requirements of industrial detection, offers advantages in practical industrial applications, and can be applied to a variety of industrial scenarios. In future work, we will assess the method's effectiveness in actual industrial defect detection and refine it based on the detection outcomes.
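For completeness, the core idea of the Inner-IoU loss [31] is to compute IoU over auxiliary "inner" boxes obtained by scaling each box by a ratio factor around its centre. The sketch below is a simplified, standalone reading of that idea; in the original formulation the inner IoU is combined with an existing IoU-based regression loss, and the ratio value used here (0.75) is purely illustrative:

```python
def inner_iou(box1, box2, ratio=0.75):
    """IoU of auxiliary 'inner' boxes scaled by `ratio` around each centre.

    Boxes are (cx, cy, w, h). This is a simplified sketch of the Inner-IoU
    idea, not the authors' exact implementation.
    """
    def inner(box):
        cx, cy, w, h = box
        return (cx - w * ratio / 2, cy - h * ratio / 2,
                cx + w * ratio / 2, cy + h * ratio / 2)

    x1a, y1a, x2a, y2a = inner(box1)
    x1b, y1b, x2b, y2b = inner(box2)
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))   # intersection width
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))   # intersection height
    inter = iw * ih
    union = ((x2a - x1a) * (y2a - y1a)
             + (x2b - x1b) * (y2b - y1b) - inter)
    return inter / union if union > 0 else 0.0

def inner_iou_loss(box1, box2, ratio=0.75):
    """Loss decreases as the inner boxes overlap more."""
    return 1.0 - inner_iou(box1, box2, ratio)
```

A ratio below 1 shrinks the auxiliary boxes, which sharpens the gradient for high-IoU samples; a ratio above 1 enlarges them, which helps low-IoU samples.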

Author Contributions

Writing—review & editing, T.Z., P.P. and J.Z.; supervision, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [grant numbers 52075348, 52175107, 52275119], the National Key Research and Development Program of China [grant number 2023YFB3408502], the Natural Science Foundation of Liaoning Province [grant number 2022-MS-280], the Shenyang Outstanding Young and Middle-aged Science and Technology Talents Project [grant number RC230739], and the Basic Scientific Research Project of the Liaoning Provincial Department of Education [grant number JYTMS20231568].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: http://faculty.neu.edu.cn/songkechen/zh_CN/zdylm/263270/list/index.htm (accessed on 25 May 2024).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yu, Q.; Wu, Q.; Liu, H. Research on X-Ray Contraband Detection and Overlapping Target Detection Based on Convolutional Network. In Proceedings of the 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022; pp. 736–741. [Google Scholar]
  2. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  3. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37, ISBN 978-3-319-46447-3. [Google Scholar]
  4. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241, ISBN 978-3-319-24573-7. [Google Scholar]
  5. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  6. Girshick, R. Fast R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  7. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  8. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  9. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  11. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the Computer Vision—ECCV 2020, Glasgow, UK, 23–28 August 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Lecture Notes in Computer Science. Springer International Publishing: Cham, Switzerland, 2020; Volume 12346, pp. 213–229, ISBN 978-3-030-58451-1. [Google Scholar]
  12. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
  13. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  14. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2016, arXiv:1511.06434. [Google Scholar]
  15. A Feature-Oriented Reconstruction Method for Surface-Defect Detection on Aluminum Profiles. Appl. Sci. 2024, 14, 386. Available online: https://www.mdpi.com/2076-3417/14/1/386 (accessed on 8 June 2024).
  16. Ren, F.; Fei, J.; Li, H.; Doma, B.T. Steel Surface Defect Detection Using Improved Deep Learning Algorithm: ECA-SimSPPF-SIoU-Yolov5. IEEE Access 2024, 12, 32545–32553. [Google Scholar] [CrossRef]
  17. Dou, Z.; Gao, H.; Liu, B.; Chang, F. Small sample steel plate defect detection algorithm of lightweight YOLOv8. Comput. Eng. Appl. 2024, 60, 90–100. [Google Scholar] [CrossRef]
  18. Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. Msft-Yolo: Improved Yolov5 Based on Transformer for Detecting Defects of Steel Surface. Sensors 2022, 22, 3467. [Google Scholar] [CrossRef]
  19. Cui, K.; Jiao, J. Steel surface defect detection algorithm based on MCB-FAH-YOLOv8. J. Graph. 2024, 45, 112–125. [Google Scholar]
  20. Zhou, Y.; Meng, J.; Wang, D.; Tang, Y. Steel defect detection based on multi-scale lightweight attention. Control Decis. 2024, 39, 901–909. [Google Scholar] [CrossRef]
  21. Zhu, W.; Zhang, H.; Zhang, C.; Zhu, X.; Guan, Z.; Jia, J. Surface Defect Detection and Classification of Steel Using an Efficient Swin Transformer. Adv. Eng. Inform. 2023, 57, 102061. [Google Scholar] [CrossRef]
  22. He, Y.; Song, K.; Meng, Q.; Yan, Y. An End-to-End Steel Surface Defect Detection Approach via Fusing Multiple Hierarchical Features. IEEE Trans. Instrum. Meas. 2019, 69, 1493–1504. [Google Scholar] [CrossRef]
  23. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  24. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  25. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  26. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  27. Jocher, G.; Chaurasia, A.; Stoken, A.; Borovec, J.; Kwon, Y.; Michael, K.; Fang, J.; Yifu, Z.; Wong, C.; Montes, D. Ultralytics/YOLOv5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo 2022. [Google Scholar] [CrossRef]
  28. Wei, H.; Liu, X.; Xu, S.; Dai, Z.; Dai, Y.; Xu, X. DWRSeg: Rethinking Efficient Acquisition of Multi-Scale Contextual Information for Real-Time Semantic Segmentation. arXiv 2023, arXiv:2212.01173. [Google Scholar]
  29. Ding, X.; Zhang, Y.; Ge, Y.; Zhao, S.; Song, L.; Yue, X.; Shan, Y. UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition. arXiv 2024, arXiv:2311.15599. [Google Scholar]
  30. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
  31. Zhang, H.; Xu, C.; Zhang, S. Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box. arXiv 2023, arXiv:2311.02877. [Google Scholar]
  32. Song, K.; Yan, Y. A Noise Robust Method Based on Completed Local Binary Patterns for Hot-Rolled Steel Strip Surface Defects. Appl. Surf. Sci. 2013, 285, 858–864. [Google Scholar] [CrossRef]
  33. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  34. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs Beat YOLOs on Real-Time Object Detection. arXiv 2024, arXiv:2304.08069. [Google Scholar]
Figure 1. DDI-YOLO network diagram.
Figure 2. The overall process of DDI-YOLO model steel defect detection.
Figure 3. Dilation-wise residual module.
Figure 4. Dilated reparam block module.
Figure 5. Schematic diagram of six defect samples.
Figure 6. Comparison of different loss functions.
Figure 7. Comparison of the experimental results.
Table 1. Ablation experiment results.

Model                 | P%   | R%   | mAP% | Params/M | GFLOPs | FPS (frames/s)
YOLOv8n               | 69.2 | 77.4 | 75.9 | 3.01     | 8.1    | 99
YOLOv8n + DWR         | 71.7 | 76.8 | 78.4 | 2.95     | 8.0    | 95
YOLOv8n + DRB         | 68.1 | 75.6 | 76.6 | 2.62     | 7.4    | 104
YOLOv8n + Inner-IoU   | 74.6 | 72.5 | 77.4 | 3.01     | 8.1    | 108
YOLOv8n + DWR + DRB   | 73.8 | 71.6 | 78.0 | 2.66     | 7.5    | 126
DDI-YOLO (ours)       | 72.5 | 71.4 | 78.3 | 2.66     | 7.5    | 158
Table 2. The detection performance of DDI-YOLO on the NEU-DET dataset.

Dataset | Method   | Defect Type | P%   | R%   | mAP%
NEU-DET | YOLOv8n  | Cr          | 35.1 | 54.7 | 38.8
        |          | In          | 78.9 | 86.2 | 84.7
        |          | Pa          | 83.7 | 85.7 | 88.9
        |          | Ps          | 82.9 | 81.2 | 83.9
        |          | Rs          | 52.1 | 64.0 | 64.0
        |          | Sc          | 82.5 | 92.7 | 95.1
        | DDI-YOLO | Cr          | 49.7 | 31.2 | 38.7 (↓0.1)
        |          | In          | 78.6 | 78.4 | 86.5 (↑1.8)
        |          | Pa          | 81.1 | 86.7 | 90.5 (↑1.6)
        |          | Ps          | 76.9 | 75.0 | 83.7 (↓0.2)
        |          | Rs          | 67.4 | 64.7 | 74.6 (↑10.6)
        |          | Sc          | 81.4 | 92.4 | 95.6 (↑0.5)
Table 3. Comparative experimental results.

Model           | P%    | R%    | mAP%  | Params/M | GFLOPs | FPS (frames/s)
SSD             | 75.50 | 61.12 | 73.03 | 21.5     | 31.5   | 42
YOLOv3-tiny     | 61.8  | 70.7  | 69.0  | 12.13    | 18.9   | 95
YOLOv5n         | 72.3  | 73.9  | 77.7  | 2.50     | 7.1    | 111
YOLOv6n         | 71.8  | 70.7  | 76.4  | 4.23     | 11.8   | 99
YOLOv7-tiny     | 70.5  | 61.2  | 66.8  | 6.03     | 13.2   | 89
YOLOv8n         | 69.2  | 77.4  | 75.9  | 3.01     | 8.1    | 99
RT-DETR         | 61.4  | 63.2  | 66.5  | 28.5     | 100.6  | 44
DDI-YOLO (ours) | 72.5  | 71.4  | 78.3  | 2.66     | 7.5    | 158
Share and Cite

Zhang, T.; Pan, P.; Zhang, J.; Zhang, X. Steel Surface Defect Detection Algorithm Based on Improved YOLOv8n. Appl. Sci. 2024, 14, 5325. https://doi.org/10.3390/app14125325