4.3.1. Stage 1: Defect Detection
This study constructed a binary classification model for defect detection in stage 1, utilizing transfer learning to classify steel images as defective or non-defective. ResNet-50, ResNet-101, and ResNet-152, all pre-trained on the ImageNet database, served as base models for the comparative experiments, with the data splits constructed to preserve the defect distribution of the original dataset.
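As an illustrative sketch only (not the authors’ released code), this transfer-learning setup can be expressed in PyTorch by loading an ImageNet-pre-trained backbone and replacing its classification head with a single sigmoid output; the exact head configuration is our assumption:

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet and replace its final
# fully connected layer with a single-logit head for the binary
# defect / no-defect task; the same pattern applies to ResNet-101
# and ResNet-152.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 1),  # one output unit
    nn.Sigmoid(),                        # probability, compatible with BCELoss
)
```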
Table 4 displays the hyperparameters for each model used in the experiments.
Figure 8 presents the training and validation loss curves over 30 epochs for model1, one of the three binary classification models referenced in Table 4. BCELoss was employed as the loss function for the binary classification task, and the model weights were updated using the Adam optimizer with a learning rate of 0.0001. A StepLR scheduler reduced the learning rate by a factor of 0.2 every 10 epochs. The batch size varied between 16 and 32 across experiments, although this change did not significantly affect the results. The training loss curve of model1 declined quickly in the initial 10 epochs, then stabilized and converged to nearly zero. The validation loss curve also decreased consistently, indicating effective performance on both the training and validation datasets.
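A minimal PyTorch sketch of this reported configuration, assuming the `model` defined above and a `train_loader` yielding image batches with 0/1 labels, might look as follows:

```python
import torch

# Reported hyperparameters: BCELoss, Adam with lr = 1e-4, and a StepLR
# scheduler that multiplies the learning rate by 0.2 every 10 epochs.
criterion = torch.nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.2)

for epoch in range(30):
    for images, labels in train_loader:        # batch size 16 or 32
        optimizer.zero_grad()
        outputs = model(images).squeeze(1)     # (B,) probabilities
        loss = criterion(outputs, labels.float())
        loss.backward()
        optimizer.step()
    scheduler.step()                           # step once per epoch
```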
Figure 9 illustrates the training and validation loss curves for model2 over 30 epochs. Both the training and validation losses decrease sharply during the first 5 epochs, approaching zero, but then spike suddenly, indicating potential optimization issues caused by overfitting or unstable learning on specific data patterns.
Figure 10 displays the training and validation loss curves for model3. Although model3 converged more slowly than the other models in the initial epochs, it stabilized and converged rapidly after a certain point. Examination of all three models’ loss curves reveals that each model’s training loss approaches near-zero values after a certain number of epochs. However, except for ResNet-50, the models exhibited unstable loss fluctuations during training, reflecting differences in their generalization abilities. Confusion matrices were used to provide a more intuitive visualization of each model’s performance on defect and non-defect data.
Figure 11 presents the confusion matrices for the three binary classification models described in Table 4. These matrices provide a clear visualization of each model’s performance on the test dataset of 720 samples. Each matrix displays results for two classes, “No Defect” and “Defect”; the rows indicate the true labels, while the columns show the predicted labels. All three models effectively differentiated between defect and non-defect samples. However, ResNet-101 and ResNet-152 each misclassified one or two samples, whereas ResNet-50 predicted every sample correctly, demonstrating its superior performance and potential for defect detection in binary classification tasks.
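Such matrices can be computed directly from thresholded model outputs; the sketch below uses scikit-learn, with placeholder arrays standing in for the actual 720-sample test-set predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder values; in practice y_true and probs come from evaluating
# the trained model on the 720-sample test set.
y_true = np.array([0, 0, 1, 1])          # 0 = "No Defect", 1 = "Defect"
probs  = np.array([0.1, 0.4, 0.8, 0.9])  # sigmoid outputs
y_pred = (probs >= 0.5).astype(int)      # threshold at 0.5

# Rows are true labels, columns are predicted labels,
# matching the layout of Figure 11.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)
```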
4.3.2. Stage 2: Surface Pixel Defect Mapping
Figure 12 illustrates the results of the Faster R-CNN object detection model for the six defect types. The NMS threshold was set to 0.5, and only bounding boxes with a confidence score above 0.7 were displayed. In each pair of images, the left column shows the ground-truth bounding-box labels and the right column the model’s predictions. Labels (a) to (f) denote the defect types: crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches. For (a), the difference in pixel appearance between defective and normal regions is minimal, making it the most challenging type to predict; the model also produced the least stable bounding-box coordinates for this type, confirming the difficulty of distinguishing this defect. In the case of (b), the model identified defects that were not originally labeled, suggesting that the original data may have been inaccurately labeled while also showing the model’s ability to detect even subtle defects. For the remaining defect types, the model displayed stable detection performance, especially for types (c) and (f), where it accurately identified multiple objects, demonstrating strong detection capabilities across various defect scenarios.
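The inference-time filtering described above (NMS at 0.5, confidence above 0.7) can be sketched with torchvision’s Faster R-CNN implementation; the COCO weights and random input below are placeholders, not the model trained in this study:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Detector with the NMS IoU threshold set to 0.5, as in the experiments.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT", box_nms_thresh=0.5)
model.eval()

image = torch.rand(3, 200, 200)  # placeholder for a 200x200 steel image
with torch.no_grad():
    output = model([image])[0]

# Keep only boxes whose confidence score exceeds 0.7 for display.
keep = output["scores"] > 0.7
boxes, labels = output["boxes"][keep], output["labels"][keep]
```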
In the surface pixel defect mapping process, the performance of Faster R-CNN combined with various backbone networks was assessed using precision, recall, AP, and mAP.
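For reference, these metrics follow their standard definitions, where TP, FP, and FN denote true positives, false positives, and false negatives, p(r) is the precision-recall curve, and N is the number of defect classes:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{AP} = \int_0^1 p(r)\,dr, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{c=1}^{N} \mathrm{AP}_c
```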
Table 5 summarizes the defect detection performance of Faster R-CNN with three different backbone networks: VGG-16, Inception-V2, and ResNet-50. The table lists the AP for each defect type, and the last column gives the mAP of each model. Inception-V2 exhibited slightly higher AP values for the patches and pitted-surface defects, but ResNet-50 achieved the highest AP across most classes and the highest mAP, excelling in particular at detecting rolled-in scale and scratch defects. With a peak mAP of 0.766, the ResNet-50-based model is the most reliable backbone network for defect detection.
Based on the results presented in Table 6, Faster R-CNN, SSD, and YOLOv4 are compared on the NEU-DET dataset in terms of detection performance (mAP), speed (FPS), and computational complexity (GFLOPs). The analysis highlights the trade-off between accuracy and efficiency across one-stage and two-stage models. The one-stage models, SSD and YOLOv4, offer higher processing speeds and lower computational costs than the two-stage Faster R-CNN, but their detection performance is significantly inferior. This trade-off between speed and accuracy is often a key factor in choosing a model for real-time applications. Faster R-CNN is better suited to tasks in which the accuracy of the bounding box determines the segmentation quality, as it excels at accurately identifying and localizing objects. This capability is particularly important in scenarios requiring detailed analysis of object boundaries, where even slight inaccuracies in bounding-box placement can lead to significant performance degradation.
Table 7 compares precision, recall, and AP for steel surface defects using the Faster R-CNN model with the ResNet-50 backbone, which produced the best results among the three models. The AP for the crazing defect was the lowest at 0.6528, reflecting the difficulty of identifying this defect, which often closely resembles normal pixels; consequently, the model struggles to distinguish it clearly, lowering both precision and recall. Future research could improve detection by applying augmentation techniques that enhance defect patterns or by using multi-scale approaches to better discern the shape, size, and brightness of defects.
The bounding boxes detected by Faster R-CNN were refined for weakly supervised segmentation: certain regions within the boxes were excluded, and the GrabCut algorithm was applied. GrabCut is an image segmentation technique that separates foreground from background based on minimal user input. To create an initial mask, the bounding box is first shrunk to 80% of its original size about its center, bringing its boundary closer to the object. GrabCut then generates a binary mask by marking the interior of the shrunken bounding box as foreground and the exterior as background. The resulting initial mask, shown in Figure 13c, is then fed into the DeepLabv3+ model for inference. The DeepLabv3+ model used in this process was pre-trained on large-scale datasets designed for semantic segmentation, such as PASCAL VOC and Cityscapes.
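The initial-mask construction can be sketched with OpenCV’s GrabCut implementation; the 80% shrink factor follows the text, while the iteration count is our assumption:

```python
import cv2
import numpy as np

def initial_mask_from_box(image, box, shrink=0.8, iters=5):
    """Shrink the bounding box to 80% of its size about its center,
    then run GrabCut with the shrunken box as the foreground rectangle."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * shrink, (y2 - y1) * shrink
    rect = (int(cx - w / 2), int(cy - h / 2), int(w), int(h))  # (x, y, w, h)

    mask = np.zeros(image.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, rect, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)

    # Binary mask: definite or probable foreground becomes 1.
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)
```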
Subsequently, the initial mask is input into the DeepLabv3+ model, which uses ResNet-101 as its backbone network. The segmentation model was trained recursively over 10 rounds, with 5 epochs per round, on the original images together with the masks derived from the shrunken bounding boxes. From the second round onward, the predicted mask is compared with the ground truth from the previous round; if the predicted mask covers a larger area, the previous ground truth is retained. In the first round, the mask from the shrunken bounding box serves as the initial ground truth. This scheme ensures that the model progressively refines its predictions and prevents over-prediction by keeping incorrect predictions from accumulating.
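Read as code, this update rule might look like the sketch below; the branch that adopts the new prediction when it does not enlarge the mask is our reading of the text, and `train_one_round` and `predict_masks` are hypothetical helpers:

```python
def update_ground_truth(prev_gt, pred_mask):
    # If the predicted mask covers a larger area than the previous
    # round's ground truth, retain the previous ground truth to guard
    # against over-prediction; otherwise adopt the prediction (assumed).
    return prev_gt if pred_mask.sum() > prev_gt.sum() else pred_mask

gt_masks = initial_grabcut_masks              # round 1 ground truth
for round_idx in range(10):                   # 10 rounds in total
    train_one_round(model, images, gt_masks, epochs=5)   # hypothetical
    preds = predict_masks(model, images)                 # hypothetical
    gt_masks = [update_ground_truth(g, p) for g, p in zip(gt_masks, preds)]
```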
Figure 13 illustrates the step-by-step outcomes from bounding box to segmentation: (a) shows the original image from the dataset, (b) the ground-truth bounding-box label, (c) the initial GrabCut mask obtained with the shrunken bounding box, (d) the result after 5 rounds of training, and (e) the result after 10 rounds. The visualization demonstrates that, starting from the initial bounding-box-based mask, the defect boundaries become progressively more refined through recursive learning. A comparison of images (d) and (e) in the second row indicates a reduction in noise over the rounds. Although the final round did not capture every defect pixel, the segmentation mask delineates the defect’s shape and extent more accurately than the bounding-box annotation.