Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer

Zou, Yongqiang; Zhang, Guanghui; Fan, Yugang

doi:10.3390/math12172782

Open AccessArticle

Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer

by

Yongqiang Zou

,

Guanghui Zhang

^* and

Yugang Fan

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

^*

Author to whom correspondence should be addressed.

Mathematics 2024, 12(17), 2782; https://doi.org/10.3390/math12172782

Submission received: 16 July 2024 / Revised: 16 August 2024 / Accepted: 6 September 2024 / Published: 8 September 2024

(This article belongs to the Section Engineering Mathematics)

Download

Browse Figures

Versions Notes

Abstract

:

Pulsed eddy current thermography can detect surface or subsurface defects in steel, but in the process of combining deep learning, it is expensive and inefficient to build a complete sample of defects due to the complexity of the actual industrial environment. Consequently, this study proposes a transfer learning method based on Twin-NMF and combines it with the SimAM attention mechanism to enhance the detection accuracy of the target domain task. First, to address the domain differences between the target domain task and the source domain samples, this study introduces a Twin-NMF transfer method. This approach reconstructs the feature space of both the source and target domains using twin non-negative matrix factorization and employs cosine similarity to measure the correlation between the features of these two domains. Secondly, this study integrates a parameter-free SimAM into the neck of the YOLOv8 model to enhance its capabilities in extracting and classifying steel surface defects, as well as to alleviate the precision collapse phenomenon associated with multi-scale defect recognition. The experimental results show that the proposed Twin-NMF model with SimAM improves the detection accuracy of steel surface defects. Taking NEU-DET and GC10-DET as source domains, respectively, in the ECTI dataset, [email protected] reaches 99.3% and 99.2%, and the detection accuracy reaches 98% and 98.5%.

Keywords:

defect detection; NMF; transfer learning; eddy current thermal imaging; YOLOv8

MSC:

62H25; 65F55; 68T45; 11A51; 15A23

1. Introduction

In modern industrialized society, steel is prized for its high strength and good seismic resistance, making it one of the most widely used materials in the engineering field. However, the service environment for steel plates is often complex and variable, with external factors such as environmental corrosion, fluctuating loads, and drastic temperature differences contributing to the likelihood of fatigue cracking [1]. The process of crack initiation, propagation, and fracture gradually accelerates, and crack fatigue accounts for more than 80% of all failures in metal parts [2]. Therefore, non-destructive testing for early crack defects in steel plates enables timely detection of potential fault hazards, which is of great significance in preventing engineering accidents.

Non-destructive testing technology is based on the premise that the performance of the object being tested will not be adversely affected. This technology utilizes the physical and chemical properties of the object, such as sound, light, electricity, and magnetism, to identify and assess potential defects. Currently, the most prevalent non-destructive testing methodologies used both domestically and internationally include a variety of techniques such as penetrant testing, radiographic testing, ultrasonic testing, magnetic particle testing, eddy current testing, and others. Pulsed eddy current thermography employs pulsed electromagnetic excitation, encompassing a variety of physical and temporal characteristics associated with electricity, magnetism, and heat, along with a wealth of transient information. In this imaging technology, the application of heat to the defect results in an increase in temperature contrast between the defect and non-defect areas, thereby improving the signal-to-noise ratio and the detection sensitivity of small defects [3]. Pulse eddy current thermography provides a distinctive advantage in detecting surface cracks in steel. Compared to other non-destructive testing methodologies, it is capable of delivering detailed data about the morphology, dimensions, and position of defects. However, traditional machine vision recognition methods are constrained in their practical application due to their dependence on the operator’s expertise, complex image preprocessing, and manual feature extraction.

With the advancement and application of deep learning, the integration of non-destructive testing technology with deep learning methods has garnered widespread attention. This defect detection approach, which leverages deep learning, is adaptive and can significantly reduce costs while enhancing the accuracy and efficiency of product inspections. For example, Zhao et al. [4] introduced a Multi-Scale Adaptive Fusion (MSAF) YOLOv8n defect detection algorithm, which takes into account detection speed and accuracy, and achieves real-time detection in embedded devices. Chen et al. [5] improved YOLOv8 and proposed a lightweight EFS-YOLO model. The experimental results of NEU-DET and GC10-DET showed that the parameters of the model were reduced by 49.5%. Xie et al. [6] proposed an Autoencoder Thermography (AET) method to enhance the visibility of backside cracks during infrared thermal imaging, which is capable of effectively extracting the underlying features of backside cracks and constructing new, clearer images. Yang et al. [7] developed a rolling electric heating device as a heat excitation source to construct a thermal imaging defect databank, and combined it with an improved Faster R-CNN to detect steel plate crack defects, with a detection accuracy of 95.54%. Hu et al. [8] used principal component analysis and Faster R-CNN to perform deep mining of time and space pattern crack information, achieving accurate crack location. However, the aforementioned approach focused solely on the accuracy of upstream tasks, overlooking the fact that deep learning requires a vast number of training samples. Acquiring a thermal imaging training dataset that encompasses complete defect types and an adequate sample size is a costly endeavor [9].

The advent of deep learning methods has introduced novel concepts to the field of non-destructive testing technology. However, challenges persist in constructing a comprehensive set of feature samples under actual working conditions. To address this, scholars have begun to adopt the concept of transfer learning, based on deep learning, to complete the eddy current thermal imaging training sample set. Transfer learning allows the model to apply knowledge learned from source domain tasks to different but related target domain tasks, solving the problem of incomplete training sample sets. To illustrate, the model transfer method loads the pre-trained weights, which are trained on large datasets, and then fine-tunes the target domain dataset to achieve transfer learning. Nevertheless, the general pre-trained weights are unable to adjust the feature differences between the source and target domains, which may result in negative transfer in specific tasks. In order to measure the differences between the domain samples and overcome the negative transfer problem, a metric relationship needs to be established between the domains. For instance, Chen et al. [10] proposed a DAYOLOv5 model that employs MMD to minimize the distance between the source domain and the target domain. It reduces the high demand for target domain data for magnetic tile surface defect detection. Yan et al. [11] proposed the Joint Dynamic MMD (JD-MMD) model, which extracts multiple features to achieve facial expression recognition by minimizing the difference in feature distribution between subdomains. Ge et al. [12] proposed a domain knowledge transfer method called Gaussian process-based transfer kernel learning (GPTKL). This method introduces a cross-domain Gaussian process and a shared classification model, which enables the transfer of domain knowledge across different domains. Li et al. [13] introduced manifold information into NMF (non-negative matrix factorization) and proposed a dual graph regularization constrained NMF, which achieved good results in clustering experiments on face datasets. Hao et al. [14] proposed a method that combines transfer learning with NMF, utilizing single NMF to facilitate the transfer of domain knowledge. Jia et al. [15] proposed a face matching method based on the Siamese NMF, which imposes intra-class and inter-class errors on the coefficient matrix. This method is robust in the event of various light intensity changes. These studies demonstrate that NMF can effectively address the feature difference problem in transfer learning. Nevertheless, the current research lacks a concise NMF method that can directly process information from both the source and target domains simultaneously, thereby reducing the model’s complexity and enhancing its practical utility.

Building on the existing NMF transfer method, this study further proposes a steel surface defect detection method based on Twin-NMF transfer learning. The effectiveness of the proposed transfer and attention improvement methods is verified through experimental analysis. The main contributions of this study are outlined as follows:

(1): This study introduces the idea of Siamese networks into non-negative matrix factorization (NMF) and designs a Twin-NMF model. Twin-NMF simultaneously processes samples from the source and target domains, and uses the shared base matrix W to help the model capture and utilize common features across domains, thereby achieving feature space reconstruction between samples across domains.
(2): After the source and target domain samples are reconstructed in the feature space by Twin-NMF, a source domain sample screening algorithm based on the cosine similarity is designed to further obtain the source domain samples that are most relevant to the target domain task. The knowledge transfer from the source domains NEU-DET and GC10-DET to the target domain ECTI is achieved by fine-tuning with pre-trained weights.
(3): This study introduces SimAM, a parameter-free attention mechanism, into the Neck-end improvement of YOLOv8. The enhanced multi-scale fusion ability alleviates the recognition accuracy collapse phenomenon of the basic model in the ECTI dataset.
(4): The method proposed in this study provides a reference for solving the domain adaptation problem of samples in the source domain and the target domain and for achieving the organic combination of eddy current thermal imaging detection and deep learning.

The paper is organized as follows: Section 2 introduces the proposed Twin-NMF transfer method and SimAM-YOLOv8 defect model. Section 3 verifies the effectiveness of the proposed method through experiments. Finally, Section 4 concludes the paper.

2. Proposed SimAM Attention and Twin-NMF Transfer Methods

Eddy current thermography is a non-destructive testing technique that uses electromagnetism to induce eddy currents on the surface of conductive materials. These currents are captured by an infrared thermal imager, which provides a detailed image of the temperature field associated with the currents. This method offers an intuitive way to detect the shape, size, and location of surface and subsurface defects in conductive materials, benefiting from non-contact operation and high sensitivity. However, the complexity of industrial environments makes it costly and inefficient to gather comprehensive defect data for deep learning training on steel surfaces. Moreover, the complexity of the actual industrial environment makes it costly and inefficient to obtain comprehensive defect data for deep learning training on steel surfaces. When presented with defect samples that encompass multiple scales of information, existing deep learning models demonstrate inconsistency in their ability to accurately identify defect types across different scales. The models in question are unable to process features of intermediate scales effectively, resulting in a collapse of accuracy to the two adjacent scales.

To overcome these challenges in detecting defects in steel surfaces using deep learning, this study introduces a novel approach combining Twin-NMF transfer and the SimAM attention mechanism to enhance feature fusion and accuracy across multiple scales.

2.1. Overall Network Structure and TNMF Transfer Methodology

The essence of transfer learning revolves around addressing the domain adaptation challenge between source and target domain samples, which often share intersecting or subset relationships. Eliminating redundant source domain data can reduce computational costs and increase the efficiency of transfer learning. Initially, NMF is concurrently applied to both source and target domain samples, using a shared basis matrix to impose twin constraints on the coefficient matrices, thus reconstructing the feature space. This setup facilitates the comparison of samples across domains using cosine similarity, allowing for the precise selection of relevant source domain data that align with target domain characteristics. Subsequently, the selected source domain samples are pre-trained using YOLOv8, and a fine-tuning transfer approach is adopted to transfer knowledge by loading the pre-trained weights. The enhanced SimAM-YOLOv8 model is then integrated to address the decrease in recognition accuracy when handling multi-scale features, as depicted in Figure 1, which illustrates the workflow combining Twin-NMF and SimAM-YOLOv8.

2.1.1. The Design of Twin-NMF

In the context of image analysis, NMF [16] (non-negative matrix factorization) has the ability to reduce the dimensionality and extract local image features. NMF is a technique that approximately decomposes a large non-negative matrix into two non-negative small matrices.

X_{n \times m} \approx W_{n \times k} H_{k \times m}, k \leq n,

(1)

where

X_{n \times m}

represents a non-negative matrix formed by arranging the image data into m column vectors within

R^{n}

space. Here,

W_{n \times k}

serves as a non-negative basis matrix, and

H_{k \times m}

functions as a non-negative coefficient matrix, projecting

X_{n \times m}

onto a reduced

R^{k}

space.

Establishment of the Twin-NMF model: Twin-NMF, an innovation in this study, combines traditional NMF principles with the Siamese Network concept. Siamese Network models have been extensively employed in semantic analysis, face recognition, target tracking, and other domains [17]. This study employs the publicly available NEU-DET dataset as the source domain samples to complete transfer learning for the target domain detection tasks of the ECTI dataset. This paper applies NMF to both the source and target domains, using a shared basis matrix and imposing a twinning constraint on the coefficient matrix to make the feature spaces of the two domains converge. The designed Twin-NMF model is shown in Figure 2.

The initial phase involved selecting 50 representative infrared defect images from the eddy current thermal imaging (ECTI) dataset, which were then compiled into a specialized defect sample library known as ECTI-50. It contains 17 large, 17 medium, and 16 small steel plate crack defects under three types of eddy current thermal imaging. This library was utilized as the target domain data within the Twin-NMF framework. The source domain data were the 1800-image public dataset NEU-DET. The dual input structure enables the model to process two datasets in parallel, and the dimension k of the Twin-NMF is set to 50. The objective of this configuration is to reduce the dimensionality of the source domain samples in order to obtain a feature space M, and subsequently map the target domain samples to the same feature space M, without any form of compression. The constraint coefficient matrix is used to adjust the observation angle of the two domains’ data, and achieve the reorganization of the source and target domain features in the

R^{50}

space.

Design of the loss function: The objective function of this TNMF is shown below:

{a r g m i n}_{W, H \geq 0} \frac{1}{2} \{{‖X_{1} - W H_{1}‖}_{F}^{2} + {‖X_{2} - W H_{2}‖}_{F}^{2} + {α ‖{H_{1}}^{*} - H_{2}‖}_{F}^{2}\},

(2)

where

{X_{1}, X}_{2}

represent the feature matrices of the dataset in the source and target domains, respectively. W denotes the feature basis matrix that is shared between the two domains, which is used to achieve the Twin-NMF. And

{H_{1}, H}_{2}

represent the decomposition coefficient matrices of the dataset in the source and target domains, respectively. The transformation of the source domain coefficient matrix

H_{1}

yields the

{H_{1}}^{*}

coefficient matrix. Due to the difference in the dataset sizes between the source and target domains, the coefficient matrices

H_{i}

, derived from the Twin-NMF decomposition, exhibit variable dimensions. Directly using mathematical dissimilarity to determine convergence is not feasible. Therefore, to standardize

{H_{1}}^{*}

and

H_{2}

to the same dimension, an averaging aggregation method is employed, resulting in

{H_{1}}^{*} = H_{1} A

, where A is an average aggregation operation performed on the columns.

The loss function designed in this study consists of reconstruction loss and twin constraint loss, the reconstruction loss of the matrix contains two NMF

{‖X_{1} - W H_{1}‖}_{F}^{2}

and

{‖X_{2} - W H_{2}‖}_{F}^{2}

,

{‖\cdot‖}_{F}

refers to the Frobenius Norm based on the Euclidean distance, and the twin constraints are mainly expressed by the convergence of the two coefficient matrices

{H_{1}}^{*}

and

H_{2}

. The parameter α serves as a regularization term to modulate the intensity of the twin constraint.

Iterative approach for the associated basis and coefficient matrices: In this paper, the gradient descent method is used to update the basis matrix W and the coefficient matrix H. The complete loss function is shown below:

D (W, H_{1}, H_{2}) = \frac{1}{2} \{{‖X_{1} - W H_{1}‖}_{F}^{2} + {‖X_{2} - W H_{2}‖}_{F}^{2} + α {‖H_{1}^{*} - H_{2}‖}_{F}^{2}\},

(3)

where

{‖\cdot‖}_{F}

is the Frobenius norm with

{‖A‖}_{F}^{2} = t r (A^{T} A)

, and the expansion for D has

D (W, H_{1}, H_{2}) = \frac{1}{2} \{\begin{matrix} t r [{X_{1}}^{T} X_{1} - 2 {X_{1}}^{T} W H_{1} + {H_{1}}^{T} W^{T} W H_{1}] \\ + t r [{X_{2}}^{T} X_{2} - 2 {X_{2}}^{T} W H_{2} + {H_{2}}^{T} W^{T} W H_{2}] \\ + α \cdot t r [{{(H}_{1}^{*})}^{T} {H_{1}}^{*} - 2 {{(H}_{1}^{*})}^{T} H_{2} + {H_{2}}^{T} H_{2}] \end{matrix}\},

(4)

The original data

X_{1}, X_{2}

can be reconstructed by minimizing the loss function

D (W, H_{1}, H_{2})

to find the optimal basis matrix W and coefficient matrices

H_{1}, H_{2}

. The gradients of the shared basis matrix W and the two coefficient matrices

H_{1}, H_{2}

can be obtained by calculating the partial derivatives of

D (W, H_{1}, H_{2})

with respect to

W, H_{1}, H_{2}

:

\frac{\partial D (W, H_{1}, H_{2})}{\partial W} = W H_{1} {H_{1}}^{T} + W H_{2} {H_{2}}^{T} - (X_{1} {H_{1}}^{T} + X_{2} {H_{2}}^{T}),

(5)

\frac{\partial D (W, H_{1}, H_{2})}{\partial H_{1}} = W^{T} W H_{1} + α H_{1} A A^{T} - (W^{T} X_{1} + α H_{2} A^{T}),

(6)

\frac{\partial D (W, H_{1}, H_{2})}{\partial H_{2}} = W^{T} W H_{2} + α H_{2} - (W^{T} X_{2} + α H_{1} A),

(7)

The iterative update equation for the parameters of

W, H_{1}, H_{2}

using gradient descent at time step

t + 1

is given by:

W^{t + 1} = W^{t} - a \frac{\partial D (W, H_{1}, H_{2})}{\partial W},

(8)

{H_{1}}^{t + 1} = {H_{1}}^{t} - b \frac{\partial D (W, H_{1}, H_{2})}{\partial H_{1}},

(9)

{H_{2}}^{t + 1} = {H_{2}}^{t} - c \frac{\partial D (W, H_{1}, H_{2})}{\partial H_{2}},

(10)

In order to ensure that the basis and coefficient matrices obtained after the decomposition of the Twin-NMF can continue to retain the property of non-negativity, let

a = W^{t} \frac{1}{W H_{1} {H_{1}}^{T} + W H_{2} {H_{2}}^{T}},

(11)

b = {H_{1}}^{t} \frac{1}{W^{T} W H_{1} + α H_{1} A A^{T}},

(12)

c = {H_{2}}^{t} \frac{1}{W^{T} W H_{2} + α H_{2}},

(13)

Substituting Equations (11)–(13) into Equations (8)–(10), we obtain the iterative formula for solving the optimal basis matrix and coefficient matrix:

W_{i k}^{t + 1} = W_{i k}^{t} \frac{{(X_{1} {H_{1}}^{T} + X_{2} {H_{2}}^{T})}_{i k}}{{(W H_{1} {H_{1}}^{T} + W H_{2} {H_{2}}^{T})}_{i k}},

(14)

{(H_{1})}_{i k}^{t + 1} = {(H_{1})}_{i k}^{t} \frac{{(W^{T} X_{1} + α H_{2} A^{T})}_{i k}}{{(W^{T} W H_{1} + α H_{1} A A^{T})}_{i k}},

(15)

{(H_{2})}_{i k}^{t + 1} = {(H_{2})}_{i k}^{t} \frac{{(W^{T} X_{2} + α H_{1} A)}_{i k}}{{(W^{T} W H_{2} + α H_{2})}_{i k}},

(16)

where

W_{i k}

denotes the i-th row and k-th column element of the W matrix, and similarly

{(H_{1})}_{i k}

and

{(H_{2})}_{i k}

denote the i-th row and k-th column elements of the

H_{1}

and

H_{2}

matrices, respectively.

Iteration over the associated basis and coefficient matrices is implemented by Algorithm 1:

Algorithm 1: Twin non-negative matrix factorization (TNMF) for W,

H_{1}

,

H_{2}

matrix updates.

Inputs: Source domain data matrix

X_{1}

, Destination domain data matrix

X_{2}

, Dimension

K

, Maximum round M, Twinning strength α, Decomposition threshold ξ.

Initialize W,

H_{1}

,

H_{2}

to positive Gaussian random values.

for

t = 1

to M do

H_{1}^{*}

←

H_{1} A

Update

W

:

W^{(t + 1)}

←

W^{(t)} (\frac{X_{1} H_{1}^{T} + X_{2} H_{2}^{T}}{W H_{1} H_{1}^{T} + W H_{2} H_{2}^{T}})

;

A

←

H_{1}^{*} H_{1}^{- 1}

Update

H_{1}

,

H_{2}

:

H_{1}^{(t + 1)}

←

H_{1}^{(t)} (\frac{X_{1} W^{T} + α H_{2} A^{T}}{W H_{1} W^{T} + α H_{1}^{*} A^{T}})

;

H_{2}^{(t + 1)}

←

H_{2}^{(t)} (\frac{X_{2} W^{T} + α H_{1}^{*}}{W H_{2} W^{T} + α H_{2}})

If max

\{{‖X_{1}^{(t)} - W^{(t + 1)} H_{1}^{(t + 1)}‖}_{F}^{2}, {‖X_{2}^{(t)} - W^{(t + 1)} H_{2}^{(t + 1)}‖}_{F}^{2}, α {‖H_{1}^{* (t + 1)} - H_{2}^{(t + 1)}‖}_{F}^{2}\}

<

ξ

then

continue

else

break

end if

end for

Output: W,

H_{1}

,

H_{2}

.

2.1.2. Cosine Similarity Filtering on Source Domains

In this paper, the Twin-NMF model was applied to the source domain dataset NEU-DET and the target domain defect library ECTI-50, successfully reconstructing a shared feature space for both domains. Subsequently, the cosine similarity metric was employed to selectively filter the source domain samples.

Cosine similarity measures the similarity between vectors by evaluating the cosine of the angle between them, where a smaller angle indicates higher similarity [18].

H_{1}

and

H_{2}

are the feature representations of the source domain data

X_{1}

and the target domain data

X_{2}

, respectively, under the shared basis matrix W, which exists as a matrix in the model. According to the principles of NMF, each column vector in the coefficient matrix corresponds to an image within the dataset. The cosine similarity can quantify the directional similarity between two vectors independent of their magnitudes. In the TNMF model, if the cosine similarity of

{H_{1}}^{*}

and

H_{2}

is high, it indicates that the feature representations of the source and target domains under the shared basis matrix W are more consistent. This consistency reflects the similarity or proximity between the source domain data and the target domain data in the feature space. The source domain coefficient matrix

H_{1} = [a_{1}, a_{2}, a_{3}, \dots, a_{n}]

obtained from the Twin-NMF model, and the target domain coefficient matrix

H_{2} = [b_{1}, b_{2}, b_{3}, \dots, b_{m}]

do the cosine similarity calculation between the two vectors, which can be filtered by the cosine similarity to ensure that the features learned by the model in the source domain are more applicable to the target domain and to enhance the migration ability of the model, thereby improving its performance in the target domain. Where this study in TNMF^(NEU) involved

n = 1800, m = 50

. The cosine similarity is calculated as follows:

η_{c s} = \frac{〈a_{n}, b_{m}〉}{{‖a_{n}‖}_{2} \cdot {‖b_{m}‖}_{2}}

(17)

In this study, we traversed the typical defect samples in matrix

H_{2}

through the source domain defect samples in matrix

H_{1}

, calculating the cosine similarity for each typical defect sample in ECTI-50 relative to the top 30 NEU-DET source domain defect samples. A similarity threshold,

η_{c s} = 0.875

[15], was established. Samples from NEU-DET exhibiting cosine similarities above this threshold were identified as having high correlation with the target domain samples and were subsequently selected. The processes of cosine similarity calculation and source domain screening are detailed in Algorithm 2.

Algorithm 2: Selecting and moving source images based cosine similarity.

Inputs: Dataset images

X_{1}

,

X_{2}

, similarity threshold

η_{c s} = 0.875

.

Output: Images moved to the target folder based on similarity.

Use the TNMF to process datasets

X_{1} a n d X_{2}

, obtaining

W

,

H_{1}

,

H_{2}

.

Create a target folder for storing selected images.

Reconstruct

X_{1} a n d X_{2}

using W,

H_{1}

,

H_{2}

:

X_{1} = W H_{1}

,

X_{2} = W H_{2}

for each column vector in

X_{2}

do

Calculate the cosine similarity using

η_{c s} = \frac{〈a_{n}, b_{m}〉}{{‖a_{n}‖}_{2} \cdot {‖b_{m}‖}_{2}}

Select images exceeding the threshold

η_{c s} = 0.875

.

Sort by similarity and select the images with top 30 similarity.

for each selected image do

Move the image file to the target folder.

Update the list of images in dataset

X_{1}

to prevent duplicates.

end for

Print: The process of selecting and moving images has been completed.

2.2. SimAM-YOLOv8 Model

YOLOv8 [19] is one of the most prevalent object detection models in the industry today. This model has been enhanced from YOLOv5 [20], with a focus on both detection speed and accuracy. In comparison to YOLOv5, YOLOv8 reduces the size of the convolutional kernel in the initial layer of the model from 6 × 6 to 3 × 3. Furthermore, the two convolutional linking layers situated at the Neck have been removed, and all C3 modules have been substituted with C2f. These enhancements ensure that the defect information can be extracted with greater efficiency while maintaining a reduced computational burden. In the Head, the original coupled head structure is decoupled, with the two tasks of classification and localization separated. This eliminates parameter sharing and introduces the anchorless frame concept, which discards the previous anchor-based offset with the anchor-free and directly predicts the defect center.

2.2.1. SimAM Attention Module

SimAM [21] is a parameter-free attention mechanism that does not add to the parameter count of the network model structure. The underlying principle of this mechanism is rooted in the concept of spatial suppression in neuroscience, where an active neuron suppresses the activities of its neighboring neurons [22]. In seeking linear separability between neurons, an energy function is defined to facilitate this process. The energy function about the neurons is defined as follows:

e_{t} (w_{t}, b_{t}, y, x_{i}) = {(y_{t} - \hat{t})}^{2} + \frac{1}{M - 1} \sum_{i = 1}^{M - 1} {(y_{o} - \hat{x_{i}})}^{2},

(18)

where

\hat{t} = w_{t} t + b_{t}

is the linear representation of the target neuron

\hat{t}

,

\hat{x_{i}} = w_{t} x_{i} + b_{t}

is the linear representation of the other neurons

\hat{x_{i}}

in the same channel, and

M = H \times W

denotes the number of neurons in the channel. The energy function obtained by using binary labeling is such that

y_{t} = 1

and

y_{o} = - 1

and adding a regular term is as follows:

e_{t} (w_{t}, b_{t}, y, x_{i}) = {(1 - (w_{t} t + b_{t}))}^{2} + λ w_{t}^{2} + \frac{1}{M - 1} \sum_{i = 1}^{M - 1} {(- 1 - (w_{t} x_{i} + b_{t}))}^{2},

(19)

where the mean

μ_{t}

and variance

σ_{t}^{2}

of the channel after removing the target neuron t can be expressed as follows:

μ_{t} = \frac{1}{M - 1} \sum_{i = 1}^{M - 1} x_{i},

(20)

σ_{t}^{2} = \frac{1}{M - 1} \sum_{i = 1}^{M - 1} {(x_{i} - μ_{t})}^{2},

(21)

The partial derivatives of the weights

w_{t}

and bias

b_{t}

are as follows:

\frac{\partial e_{t}}{\partial w_{t}} = \sum_{i = 1}^{M - 1} x_{i}^{2} w_{t} + x_{i} b_{t} + x_{i} + t^{2} w_{t} + t b_{t} - t + λ (M - 1) w_{t},

(22)

\frac{\partial e_{t}}{\partial b_{t}} = \frac{2}{M - 1} (\sum_{i = 1}^{M - 1} (w_{t} x_{i} + b_{t} + 1) + (w_{t} t + b_{t} - 1)),

(23)

By setting the partial derivatives of Equations (22) and (23) to zero and combining Equations (20)–(23), we obtain the following expression for the weight

w_{t}

and bias

b_{t}

of the neuron t:

w_{t} = - \frac{2 (t - μ_{t})}{{(t - μ_{t})}^{2} + 2 σ_{t}^{2} + 2 λ},

(24)

b_{t} = - \frac{1}{2} (t + μ_{t}) w_{t},

(25)

Substitution of the weight

w_{t}

and bias

b_{t}

into the Equation (19) yields the following formula for the analytical solution with respect to the minimum energy

e_{t}^{*}

:

e_{t}^{*} = \frac{4 (σ_{t}^{2} + λ)}{{(t - \hat{μ})}^{2} + 2 {\hat{σ}}_{t}^{2} + 2 λ},

(26)

From the aforementioned formula, it is evident that a lower value of the minimum energy,

e_{t}^{*}

, indicates a greater disparity between a neuron and its neighboring neurons, thereby highlighting the neuron’s increased importance. Consequently, the attention weight formula for SimAM can be derived as shown below:

\hat{X} = s i g m o i d (\frac{1}{E}) ⊙ X,

(27)

The weights generated by traditional 1D or 2D attention modules are confined to a single channel or spatial region, treating other spatial or channels uniformly, without differentiation. In contrast, SimAM overcomes these spatial and channel-based constraints. The three-dimensional weights it calculates can be applied to each neuron individually, thereby enhancing the model’s resistance to interference and improving its overall accuracy.

2.2.2. Improved SimAM-YOLOv8

As a lightweight attention module that does not introduce additional parameters, SimAM is ideally suited for integration into the industrial inspection model discussed in this study. It effectively addresses the issue of multi-scale feature accuracy degradation observed in the model. This study enhances the sensitivity of YOLOv8 to multi-scale feature maps by incorporating the SimAM attention module at the neck of the model. The architecture of the enhanced SimAM-YOLOv8 model is illustrated in Figure 3.

In this study, the image to be detected is reshaped to a size of 640 × 640 × 3. Following feature extraction by the backbone, five feature maps of varying sizes are obtained: 320 × 320 × 64, 160 × 160 × 128, 80 × 80 × 256, 40 × 40 × 512, and 20 × 20 × 1024, of which the deeper P3, P4, and P5 layers are routed to the neck end for feature fusion and subsequent stitching processing. In this study, a parameter-free SimAM attention module was added to the three output paths at the neck end, to address the misdetection of multi-scale defective features after fusion and up to the step of detection. This occurs when the intermediate sizes are incorrectly recognized on both sides. The aim was to facilitate the recognition of the fused feature maps by the detector head with the help of the three-dimensional weighting values.

3. Experiments and Experimental Results

This section will introduce the relevant datasets used in this study and some experimental parameter settings and provide some ablation experiments that support improvement points, as well as comparative experiments with the current mainstream detection algorithms such as Faster-RCNN, SSD, and YOLO.

3.1. Introduction of the Dataset

In this study, two datasets were employed: the publicly available NEU-DET [23] dataset, the GC10-DET [24] dataset, and the laboratory-created ECTI dataset. The NEU-DET dataset consists of 1800 grayscale images, utilized as the source domain sample. The ECTI dataset contains 1280 eddy current infrared thermography images, serving as the target domain sample. From this dataset, a curated selection of 50 images has been specifically chosen to form the typical defective sample library, designated as ECTI-50. This study employs the Twin-NMF model, which has been designed for transfer learning, to perform twin NMF on the source domain samples NEU-DET and the target domain typical defect sample library ECTI-50. Subsequently, the source domain samples NEU-DET are selected based on the threshold of cosine similarity. Experimental results with GC10-DET as source domain data were supplemented in the same context.

The NEU-DET dataset contains six types of surface defects of hot-rolled steel strip: including Crazing (Cr), Inclusion (In), Patches (Pa), Pitted Surface (Ps), Rolled-in Scale (RS), and Scratches (Sc), with a total of 1800 grey-scale images. The GC10-DET dataset contains 10 types of defects, with a total of 3570 images. The ECTI dataset contains three types of surface defects of steel plates captured by eddy current thermography: Small, Medium, and Large, which represent different sizes of crack defects, with a total of 1280 infrared images. Partial images from the dataset are illustrated in Figure 4:

3.2. Experimental Settings

The experiments for this study were conducted on the AutoDL server rental platform. The server was equipped with an AMD EPYC 9654 CPU and an NVIDIA GeForce RTX4090 GPU with 24 GB of memory. The deep learning environment consisted of CUDA version 11.8, PyTorch 2.0.0, Python 3.8, and PyCharm 2023. The data were split into a training set and a validation set with a ratio of 8:2. The training process involved 200 epochs and a batch size of 32.

3.3. Ablation Experiments

3.3.1. Analysis of Twin-NMF Transfer Effectiveness

This section outlines six experimental groups designed to assess the efficacy of the Twin-NMF transfer method. Experiment 1, termed ‘No-transfer’, involves training directly on the ECTI dataset without loading any pre-training weights, with 200 epochs of training starting from initialization. Experiment 2 involves using YOLOv8n, which is loaded with universal detection weights (v8n) obtained by pre-training on the large-scale COCO dataset. This experiment includes 200 epochs of fine-tuning on the ECTI dataset using the v8n pre-training weights. Experiment 3, conducted without employing the Twin-NMF method, referred to as No-TNMF^(NEU), involves using all 1800 NEU-DET images for pre-training, which includes 50 epochs to develop the pre-training weights. Subsequently, these weights are fine-tuned over 200 epochs using the ECTI dataset. Experiment 4 utilizes the Twin-NMF method to perform NMF on both the NEU-DET and ECTI-50 datasets. Subsequently, the NEU-DET samples are screened using a cosine similarity. Following this, the selected NEU-DET source domain samples undergo 50 epochs of training to generate pre-training weights. Finally, these pre-training weights are loaded for 200 epochs of fine-tuning on the ECTI dataset, referred to as TNMF^(NEU). Experiments 5 and 6 were designed using GC10-DET as the source domain, referred to as No-TNMF^(GC10) and TNMF^(GC10), respectively.

The initial premise of the six experimental settings is that there is a discrepancy in the distribution of features present in the source domain samples and the target domain samples ECTI. Not all images from the source domain contain relevant prior knowledge for the target task associated with ECTI. This discrepancy, characterized as redundant deviation, has the potential to negatively affect the model’s performance, especially in cases where the model is not effectively adapted to the target task. The TNMF method is designed to identify the most pertinent aspect of the source domain samples between the source domain and the target domain, facilitating a more direct and precise transfer of domain knowledge between the source and target domains. Table 1 presents the data from ablation experiments that evaluate the effectiveness of this transfer method on the ECTI dataset.

As illustrated in Table 1, Experiment 2, which utilized general detection weights (v8n), was compared to Experiment 1 (which did not use pre-trained weights and trained the model from scratch), exhibited a 0.1% decrease in precision and a 1.1% decrease in recall. This suggests that the COCO dataset, from which v8n weights are derived, is not closely aligned with the ECTI dataset. In contrast, Experiment 4, which employed the Twin-NMF method to screen NEU-DET, demonstrated a 0.8% increase in precision and a 0.7% increase in [email protected] compared to Experiment 3, where Twin-NMF was not used. Notably, the recall in Experiment 3 was 1.6% lower than in Experiment 1, indicating a significant rate of false detection. This issue may be attributed to the redundant bias in NEU-DET, affecting the model’s performance in parameter transfer. Using GC10-DET as the source domain dataset, Experiment 6, which applied TNMF, shows a more consistent trend compared to Experiment 5, where TNMF was not used, with precision metrics being 1.2% higher and [email protected] also 1.2% higher. As indicated by Table 1, the application of the Twin-NMF method has led to a notable improvement in model performance on the target domain task, particularly in detection accuracy, where TNMF^(NEU) reached 97.9% and TNMF^(GC10) reached 97.5%.

Figure 5 displays the loss and precision curves for the six experimental groups. As shown in Figure 5, Experiment 2, which utilizes pre-trained v8n weights from the COCO dataset, exhibits significantly lower loss compared to the other five experiments. This advantage of low loss is a consequence of deep learning utilizing generic large datasets that can be trained. Compared to loading the pre-training weights, the convergence of the directly trained Experiment 1 is significantly more lagging in the precision curve, and the loss curve also starts falling from the highest loss value. The distinction between Experiment 3 and 4 lies in the utilization of the TNMF method. Although the discrepancy in the loss curve is marginal, the divergences in the precision curves are more pronounced. Experiment 4, which employs the TNMF approach, demonstrates superior performance compared to Experiment 3, particularly noticeable in the early oscillations and the smoother convergence observed in the later stages. In using GC10-DET as the source domain dataset, TNMF^(GC10) with the application of TNMF likewise outperforms No-TNMF^(GC10) in terms of accuracy profile. The performance of both source domain datasets corroborates the effectiveness of the TNMF method in migration.

In screening the source domain NEU-DET samples using cosine similarity, the distribution of the screened samples within the NEU-DET dataset is plotted by extracting relevant cosine similarity values. This distribution is illustrated in Figure 6 below.

After completing the joint Twin-NMF process with the NEU-DET and ECTI-50 datasets, cosine similarity is utilized in the screening stage. This involves calculating the cosine similarity for each image in ECTI-50 against all images in NEU-DET, ordering the results by the magnitude of the cosine similarity values. For a single image from ECTI-50, the cosine similarities with all NEU-DET images are calculated, and the top 30 images exceeding the threshold

η_{c s} = 0.875

are selected as samples strongly associated with the ECTI target domain. From the NEU-DET dataset, a total of 1181 images were included in the top 30 cosine similarity rankings, with 552 images identified as having strong correlations above the threshold value.

The distribution of visualized data of NEU-DET before and after applying TNMF is obtained by t-SNE [25] (T-distributed Stochastic Neighbor Embedding) algorithm as shown in Figure 7 below.

In three-dimensional space, it can be observed that the green ECTI dataset forms a relatively centralized cluster and overlaps with the other two datasets (NEU-DET and TNMF-NEU, which apply the TNMF screening process) in certain areas. The TNMF-NEU dataset, under the TNMF method, retains some of the feature distributions of the original NEU-DET dataset. At the same time, when examining the degree of overlap between different clusters and the tightness within each cluster, it can be seen that TNMF-NEU is converging toward the ECTI dataset. This suggests that TNMF introduces new features while preserving the original data features, making the data distribution closer to that of the target domain. However, feature tuning alone may not be sufficient to fully compensate for the distributional differences between the source and target domains. Thus, while TNMF can facilitate domain adaptation from the source domain to target domain tasks to some extent, it does not completely eliminate the variability between the two.

3.3.2. Analysis of SimAM Effectiveness

This section evaluates the addition of the SimAM module to the six previously described experimental groups and analyzes its effectiveness on the model. The results are detailed in Table 2. It is notable that the incorporation of SimAM into the neck of the model yielded significant performance improvements across all models, with the most substantial enhancement observed in the final model, which combines Twin-NMF transfer and SimAM. The TNMF(NEU) + SimAM combination demonstrated an improvement of 1% in precision, 0.5% in [email protected], 1% in [email protected], and achieved an F1 score of 0.98. Similarly, TNMF(GC10) + SimAM showed a 1.5% improvement in precision and a 0.4% improvement in [email protected] compared to the baseline.

The introduction of SimAM aims to mitigate the phenomenon of accuracy collapse observed in the model when detecting intermediate-scale defects. More detailed experimental results across various defect types are presented in Table 3. Table 3 shows that medium defects exhibit reduced accuracy compared to larger and smaller categories, a phenomenon referred to in this paper as “accuracy collapse”. This issue may arise from the model’s inability to accurately differentiate medium defects, often misclassifying them as small or large, which in turn impacts the overall model accuracy. Compared to the original model, the integration of SimAM has notably improved detection rates for medium defects, thereby strengthening the model’s most vulnerable aspect. We compared the results before and after adding SimAM to the medium types. The red marks indicate improvements, while the blue marks indicate decreases.

Meanwhile, a comparative analysis between SimAM and other attention methods was performed, with the findings detailed in Table 4. We bold the data where our method performs best. Table 4 compares the performance metrics and parameter increases of SimAM with several other common attention mechanisms (such as SE, CBAM, and ECA). In terms of performance, the precision and recall of SimAM are slightly higher than those of the other mechanisms. Additionally, unlike the other attention mechanisms that require additional parameters, SimAM does not introduce any extra parameters. As a parameter-free attention mechanism, SimAM offers significant advantages over CBAM, SE, and similar methods, particularly in terms of reduced parameterization. Additionally, SimAM’s approach to calculating three-dimensional weight values for neurons significantly enhances the overall performance of the model. SimAM strikes an ideal balance between performance and efficiency, making it more attractive than mechanisms such as SE, CBAM, and ECA.

3.4. Comparison Experiment

This section provides a comprehensive comparison between the proposed method and several mainstream target detection algorithms, including SSD [29], Faster R-CNN [30], YOLO, and others, with the experimental results detailed in Table 5. We bold the data where our method performs best. Where TNMF^(NEU) is the pre-training weights obtained from 50 rounds of training after migrating and screening NEU-DET as the source domain in this paper, TNMF^(GC10) is the pre-training weights obtained by using GC10-DET as the source domain, and the rest of the pre-training weights come from some generalized large datasets. The proposed method excels across all major performance evaluation metrics, demonstrating significant advantages. TNMF^(NEU) achieves optimal values in precision of 98%, recall of 97.1%, [email protected] of 99.3%, and [email protected] of 78.1%. TNMF^(GC10) achieves a precision of 98.5% and an [email protected] of 99.2%. Moreover, the use of TNMF-based pre-trained weights allows the proposed approach to leverage the benefits of transfer learning, thus accelerating the model’s training process on specific tasks while maintaining robust generalization capabilities.

4. Conclusions

This paper puts forth a steel surface defect detection model based on Twin-NMF transfer learning. Initially, the devised Twin-NMF algorithm is employed to reconstruct the feature space of the source domain, NEU-DET, and the target domain, ECTI. Subsequently, the source domain samples with high correlation with the target domain are selected based on cosine similarity. Finally, the parameter-free SimAM introduced at the neck end of YOLOv8 serves to mitigate the accuracy collapse phenomenon present in the recognition process while simultaneously completing the transfer improvement of the detection model for the target domain task. The results show that:

(1): The Twin-NMF algorithm enhances the similarity between the two domains through twin constraints, effectively overcoming the limitations of traditional NMF in processing multi-domain data. This enhancement facilitates more accurate knowledge transfer from the source domain to the target domain.
(2): The strategy of filtering source domain samples based on cosine similarity enhances the model’s adaptability to the target domain. Utilizing source domain data that exhibits a strong correlation with the target domain helps mitigate the impact of negatively correlated information, thereby boosting the model’s ability to adapt to the target domain task.
(3): The introduction of the parameter-free SimAM module at the neck of YOLOv8 does not substantially increase the cost of model training, yet it effectively alleviates the phenomenon of accuracy collapse during recognition tasks. This enhancement significantly improves the accuracy and robustness of the target domain task.

Overall, the steel surface defect detection model proposed in this study has improved accuracy, recall, and [email protected] over the baseline unmigrated approach. This model not only enhances the efficiency of transfer and the accuracy of recognition but also demonstrates the potential of integrating transfer learning and attention mechanisms in complex industrial applications. Not limited to steel surface defect detection, the method can be extended to other industrial visual inspection tasks, offering new ideas and methods for advancing and optimizing deep learning in practical applications. However, the TNMF migration method presented in this paper is not sufficient to fully compensate for the data distribution differences between the source and target domains. It primarily eliminates redundant data for the target domain. Further research may require more sophisticated transfer learning algorithms to more accurately address the distribution differences between the source and target domains.

Author Contributions

Conceptualization, Y.Z. and Y.F.; methodology, Y.Z. and Y.F.; software, Y.Z.; validation, Y.Z. and Y.F.; formal analysis, Y.Z. and Y.F.; investigation, G.Z. and Y.F.; resources, G.Z. and Y.F.; data curation, Y.Z. and G.Z.; writing—original draft preparation, Y.Z. and Y.F.; writing—review and editing, G.Z. and Y.F.; visualization, Y.Z. and Y.F.; supervision, G.Z. and Y.F.; project administration, G.Z. and Y.F.; funding acquisition, G.Z. and Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Yunnan Provincial Major Science and Technology Project (Research on the blockchain-based agricultural product traceability system and demonstration of platform construction. Project No. 202102AD080002).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, H. Research on Fatigue Crack Growth Prediction Method of Metal Components Based on Eddy Current Nondestructive Testing. Ph.D. Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2021. [Google Scholar]
Wu, F.; Yao, W. A New Model of the Fatigue Life Curve of Materials. CJME 2008, 19, 1634–1637. [Google Scholar]
Tian, G.; Gao, B.; Gao, Y.; Wang, P.; Wang, H.; Shi, Y. Review of Railway Rail Defect Non-destructive Testing and Monitoring. Chin. J. Sci. Instrum. 2016, 37, 1763–1780. [Google Scholar]
Zhao, B.T.; Chen, Y.R.; Jia, X.F.; Ma, T.B. Steel Surface Defect Detection Algorithm in Complex Background Scenarios. Measurement 2024, 237, 115189. [Google Scholar] [CrossRef]
Chen, B.; Wei, M.; Liu, J.; Li, H.; Dai, C.; Liu, J.; Ji, Z. EFS-YOLO: A Lightweight Network Based on Steel Strip Surface Defect Detection. Meas. Sci. Technol. 2024, 35, 116003. [Google Scholar] [CrossRef]
Xie, J.; Xu, C.; Chen, G.; Huang, W. Improving Visibility of Rear Surface Cracks during Inductive Thermography of Metal Plates Using Autoencoder. Infrared Phys. Technol. 2018, 91, 233–242. [Google Scholar] [CrossRef]
Yang, J.; Wang, W.; Lin, G.; Li, Q.; Sun, Y.; Sun, Y. Infrared Thermal Imaging-Based Crack Detection Using Deep Learning. IEEE Access 2019, 7, 182060–182077. [Google Scholar] [CrossRef]
Hu, J.; Xu, W.; Gao, B.; Tian, G.Y.; Wang, Y.; Wu, Y.; Yin, Y.; Chen, J. Pattern Deep Region Learning for Crack Detection in Thermography Diagnosis System. Metals 2018, 8, 612. [Google Scholar] [CrossRef]
He, Y.; Deng, B.; Wang, H.; Cheng, L.; Zhou, K.; Cai, S.; Ciampa, F. Infrared Machine Vision and Infrared Thermography with Deep Learning: A Review. Infrared Phys. Technol. 2021, 116, 103754. [Google Scholar] [CrossRef]
Li, C.; Yan, H.; Qian, X.; Zhu, S.; Zhu, P.; Liao, C.; Tian, H.; Li, X.; Wang, X.; Li, X. A Domain Adaptation YOLOv5 Model for Industrial Defect Inspection. Measurement 2023, 213, 112725. [Google Scholar] [CrossRef]
Yan, J.; Yue, Y.; Yu, K.; Zhou, X.; Liu, Y.; Wei, J.; Yang, Y. Multi-Representation Joint Dynamic Domain Adaptation Network for Cross-Database Facial Expression Recognition. Electronics 2024, 13, 1470. [Google Scholar] [CrossRef]
Ge, P.; Sun, Y. Gaussian Process-Based Transfer Kernel Learning for Unsupervised Domain Adaptation. Mathematics 2023, 11, 4695. [Google Scholar] [CrossRef]
Li, J.; Li, Y.; Li, C. Dual-Graph-Regularization Constrained Nonnegative Matrix Factorization with Label Discrimination for Data Clustering. Mathematics 2023, 12, 96. [Google Scholar] [CrossRef]
Hao, B.; Fan, Y.; Song, Z. Deep Transfer Learning-Based Pulsed Eddy Current Thermography for Crack Defect Detection. Acta Opt. Sin. 2023, 43, 0415002. [Google Scholar]
Jia, X.; Sun, F. Vehicle Face Re-identification Algorithm Based on Siamese Nonnegative Matrix Factorization. Chin. J. Sci. Instrum. 2020, 41, 132–139. [Google Scholar]
Lee, D.D.; Seung, H.S. Learning the Parts of Objects by Non-negative Matrix Factorization. Nature 1999, 401, 788–791. [Google Scholar] [CrossRef]
Kaya, M.; Bilge, H.Ş. Deep Metric Learning: A Survey. Symmetry 2019, 11, 1066. [Google Scholar] [CrossRef]
Kirişci, M. New Cosine Similarity and Distance Measures for Fermatean Fuzzy Sets and TOPSIS Approach. Knowl. Inf. Syst. 2023, 65, 855–868. [Google Scholar] [CrossRef]
Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar]
Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of YOLO Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. Simam: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning PMLR, Virtual, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
Webb, B.S.; Dhruv, N.T.; Solomon, S.G.; Tailby, C.; Lennie, P. Early and Late Mechanisms of Surround Suppression in Striate Cortex of Macaque. J. Neurosci. 2005, 25, 11666–11675. [Google Scholar] [CrossRef]
He, Y.; Song, K.; Meng, Q.; Yan, Y. An End-to-End Steel Surface Defect Detection Approach Via Fusing Multiple Hierarchical features. IEEE Trans. Instrum. Meas. 2019, 69, 1493–1504. [Google Scholar] [CrossRef]
Lv, X.; Duan, F.; Jiang, J.-J.; Fu, X.; Gan, L. Deep Metallic Surface Defect Detection: The New Benchmark and Detection Network. Sensors 2020, 20, 1562. [Google Scholar] [CrossRef] [PubMed]
Van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]

Figure 1. Flowchart of Twin-NMF transfer combined with SimAM-YOLOv8.

Figure 2. The network structure of the Twin-NMF model.

Figure 3. SimAM-YOLOv8 model network structure.

Figure 4. Partial NEU-DET, GC10-DET, and ECTI defect images.

Figure 5. Loss and precision curves of the six groups of experiments.

Figure 6. Distribution of TNMF selecting samples in source domains.

Figure 7. 3D distribution of source and target domain data under the TNMF.

Table 1. Ablation experiments considering TNMF on ECTI.

	Model	P%	R%	[email protected]%	[email protected]%	F1
Experiment 1	No-transfer	97.1	96.9	98.1	77.5	0.97
Experiment 2	YOLOv8n	97	95.8	97.6	77.6	0.96
Experiment 3	No-TNMF^(NEU)	97.1	95.3	98.3	77.6	0.96
Experiment 4	TNMF^(NEU)	97.9	96.9	99	77.6	0.97
Experiment 5	No-TNMF^(GC10)	96.3	98.4	99	76.8	0.97
Experiment 6	TNMF^(GC10)	97.5	96.9	98.9	78	0.97

Table 2. The ablation experiments considering SimAM on ECTI.

Model	P%	R%	[email protected]%	[email protected]%	F1
No-transfer + SimAM	97	97.2	98.8	77.1	0.97
YOLOv8n + SimAM	97.3	96.8	98.8	77.6	0.97
No-TNMF^(NEU) + SimAM	97.6	97.4	98.9	77.6	0.98
TNMF^(NEU) + SimAM	98	97.1	99.3	78.1	0.98
No-TNMF^(GC10) + SimAM	98	96.7	98.6	77.3	0.97
TNMF^(GC10) + SimAM	98.5	96.9	99.2	76.6	0.97

Table 3. Experimental results of various defects after the introduction of SimAM.

Method	Defect Types	P%		R%		[email protected]%		[email protected]%
Method	Defect Types	Original	Simam	Original	Simam	Original	Simam	Original	Simam
No-transfer	Small	99.1	99	1	1	99.5	99.5	74.1	73.4
	Medium	92.1	93.7	98.3	98.3	97.1	98.2	77.8	76.5
	Large	1	98.4	92.3	93.4	97.7	98.7	80.5	81.7
YOLOv8n	Small	99.5	99.2	1	1	99.5	99.5	75.6	73.4
	Medium	91.4	92.9	95	95	95.2	98.2	75.9	76.5
	Large	1	99.8	92.3	95.4	98.1	98.7	81.5	81.3
No-TNMF^(NEU)	Small	99.7	99.5	1	1	99.5	99.5	75.3	73.3
	Medium	91.7	93.4	96.7	98.3	98	98.1	77.2	77.6
	Large	99.8	99.9	89.2	93.8	97.5	99	80.3	81.9
TNMF^(NEU)	Small	99.5	99.5	1	1	99.5	99.5	74	76
	Medium	94.1	94.6	98.3	98.3	98.5	98.8	78.1	78.5
	Large	99.9	1	92.3	92.8	99.2	99.4	80.7	79.9
No-TNMF^(GC10)	Small	96.5	99.5	1	1	99.5	99.5	71.4	75.6
	Medium	92.3	94.5	99.7	98.3	98.5	97.9	77.7	76.1
	Large	1	1	95.6	91.7	98.9	98.3	81.4	80.1
TNMF^(GC10)	Small	99.8	99.2	1	1	99.5	99.5	74.3	70
	Medium	92.7	96.4	98.3	96.7	98.5	98.7	77.6	78.1
	Large	1	1	92.5	94	98.8	99.4	82	81.8

Table 4. Comparison between SimAM and other attention mechanisms.

Model	P%	R%	[email protected]%	[email protected]%	Increased Parameters	Parameters Formula
TNMF^(NEU) + CBAM [26]	97.2	96.6	98.2	77.1	1250 + 4706 + 18530	2C²/r + 2k²
TNMF^(GC10) + CBAM	97.9	96.9	98.7	77.3	1250 + 4706 + 18530	2C²/r + 2k²
TNMF^(NEU) + SE [27]	96.9	96.4	98	76.7	512 + 2048 + 8192	2C²/r
TNMF^(GC10) + SE	97.6	96.9	98.6	78	512 + 2048 + 8192	2C²/r
TNMF^(NEU) + ECA [28]	97.7	97.4	99.1	77.9	3 + 3 + 3	k
TNMF^(GC10) + ECA	97.3	96.8	98.6	77.4	3 + 3 + 3	k
TNMF^(NEU) + SimAM [21]	98	97.1	99.3	78.1	0	0
TNMF^(GC10) + SimAM	98.5	96.9	99.2	76.6	0	0

Table 5. Comparison experiments with other object detection models on ECTI.

Model	Pre-Training Weights	P%	R%	[email protected]%	[email protected]%
SSD [29]	Efficient Net	72.1	57.3	72.1	44.8
Faster-RCNN [30]	Resnet50	80.4	64.6	80.4	52.6
YOLOv5	v5n	97.1	96.4	98	77.7
YOLOv8 [19]	v8n	97	95.8	97.6	77.6
YOLOv9 [31]	v9t	97.2	96.7	98.8	76.7
YOLOv10 [32]	v10n	96.3	95.1	98.5	76.3
Our Method	TNMF^(NEU)	98	97.1	99.3	78.1
Our Method	TNMF^(GC10)	98.5	96.9	99.2	76.6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zou, Y.; Zhang, G.; Fan, Y. Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer. Mathematics 2024, 12, 2782. https://doi.org/10.3390/math12172782

AMA Style

Zou Y, Zhang G, Fan Y. Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer. Mathematics. 2024; 12(17):2782. https://doi.org/10.3390/math12172782

Chicago/Turabian Style

Zou, Yongqiang, Guanghui Zhang, and Yugang Fan. 2024. "Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer" Mathematics 12, no. 17: 2782. https://doi.org/10.3390/math12172782

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Research on the Detection of Steel Plate Defects Based on SimAM and Twin-NMF Transfer

Abstract

1. Introduction

2. Proposed SimAM Attention and Twin-NMF Transfer Methods

2.1. Overall Network Structure and TNMF Transfer Methodology

2.1.1. The Design of Twin-NMF

2.1.2. Cosine Similarity Filtering on Source Domains

2.2. SimAM-YOLOv8 Model

2.2.1. SimAM Attention Module

2.2.2. Improved SimAM-YOLOv8

3. Experiments and Experimental Results

3.1. Introduction of the Dataset

3.2. Experimental Settings

3.3. Ablation Experiments

3.3.1. Analysis of Twin-NMF Transfer Effectiveness

3.3.2. Analysis of SimAM Effectiveness

3.4. Comparison Experiment

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI