A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter

Xu, Haitao; Liu, Chengming; Duan, Shuya; Ren, Liangpin; Cheng, Guozhen; Hao, Bing

doi:10.3390/app132011386

Open AccessArticle

A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter

¹

School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China

²

Institute of Information Technology, Information Engineering University, Zhengzhou 450007, China

³

Songshan Laboratory, Zhengzhou 450002, China

^*

Author to whom correspondence should be addressed.

^†

Current address: No. 97 of Wenhua Road, Jinshui District, Zhengzhou 450002, China.

^‡

These authors contributed equally to this work.

Appl. Sci. 2023, 13(20), 11386; https://doi.org/10.3390/app132011386

Submission received: 12 September 2023 / Revised: 22 September 2023 / Accepted: 24 September 2023 / Published: 17 October 2023

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Volume)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Fabric inspection is critical in fabric manufacturing. Automatic detection of fabric defects in the textile industry has always been an important research field. Previously, manual visual inspection was commonly used; however, there were drawbacks such as high labor costs, slow detection speed, and high error rates. Recently, many defect detection methods based on deep learning have been proposed. However, problems need to be solved in the existing methods, such as detection accuracy and interference of complex background textures. In this paper, we propose an efficient segmentation algorithm that combines traditional operators with deep learning networks to alleviate the existing problems. Specifically, we introduce a Gabor filter into the model, which provides the unique advantage of extracting low-level texture features to solve the problem of texture interference and enable the algorithm to converge quickly in the early stages of training. Furthermore, we design a U-shaped architecture that is not completely symmetrical, making model training easier. Meanwhile, multi-stage result fusion is proposed for precise location of defects. The design of this framework significantly improves the detection accuracy and effectively breaks through the limitations of transformer-based models. Experimental results show that on a dataset with one class, a small amount of data, and complex sample background texture, our method achieved 90.03% and 33.70% in ACC and IoU, respectively, which is almost 10% higher than other previous state of the art models. Experimental results based on three different fabric datasets consistently show that the proposed model has excellent performance and great application potential in the industrial field.

Keywords:

defect detection; fabric defect segmentation; Swin UNet; neural network

1. Introduction

The textile manufacturing industry, as a representative light industry, produces millions of tons of fabric annually. It operates on a large scale and involves a complex production process, including spinning, weaving, dyeing, printing, finishing, and clothing manufacturing. Fabric defects in this industry are mainly caused by machine failures, yarn issues, poor finishing, and excessive stretching, leading to profit losses of 45–65%. Early detection of fabric defects is crucial for minimizing losses [1]. Previously, manual visual inspection was used; however, it had drawbacks such as high labor costs, slow detection speed, and high error rates. With the advent of computer vision, automated fabric defect detection has significantly improved detection efficiency, reduced costs, and enhanced productivity in the textile industry.

Since the mid-1990s, the textile industry’s automation of inspection for fabric defects has been a vital research area [2]. This process aims to identify and locate fabric surface defects to cut costs, reduce time, and boost productivity. Numerous researchers and engineers have worked on developing robust and efficient algorithms over the past few decades to meet industry needs effectively.

Typically, in classic machine vision methods, features must be handcrafted to adapt to specific domains. Hand engineering of features is essential to classical approaches; however, such features are not suited for different tasks, which leads to the need to manually set a large number of parameters for different textures and defects. Therefore, these traditional methods are difficult to generalize to other types of fabrics for general use. In recent years, deep learning methods have become the most common method in the field of computer vision. Compared with classical machine vision methods, deep learning methods can automatically learn features directly from low-level data, and have stronger feature representation capabilities. This completely replaces the engineering of manually designed features with automated learning processes [3], achieving the effect of adapting to different types of fabric defects. Jing et al. [4] introduced depthwise separable convolution on Mobile-UNet, which can extract multi-scale features while reducing complexity cost and model size and achieving state of the art performance. One recently proposed work [5] used a GAN-based image generation mechanism to repair defective sample images in order to obtain the difference between defective samples and repaired samples before segmenting the defective area. Li et al. [6] proposed a fabric defect detection model based on Cascade R-CNN and integrated block recognition and detection box merging algorithms to achieve defect detection in high-resolution images.

Currently, Swin-Unet has shown strong performance in medical image segmentation; however, when applied to fabric defect detection, it has been found that it converges more slowly than traditional convolutional neural networks (CNNs) in the early stages of training, and can easily fit a locally optimal solution. This is because the texture of the fabric is often composed of periodic patterns which can be quickly learned by convolution but are more difficult for Transformers. To address this challenge, we use Swin-Unet as the benchmark method to design a more efficient network, as it has achieved good results in both accuracy and robustness. In summary, our proposed method of fabric defect detection makes the following contributions:

1.: A Transformer-based fabric defect segmentation network is proposed to solve the local optimal fitting problem of the previous transformer model.
2.: We embed traditional spectral methods into the model and use a fixed Gabor filter to eliminate the influence of complex textured backgrounds on defect segmentation.
3.: We design a U-shaped architecture that is not completely symmetrical, making model training easier. Meanwhile, multi-stage result fusion is proposed for precise locating of defects.
4.: A new loss function is proposed to solve the imbalance problem between defect samples and non-defect samples. The loss function proposed in this paper uses a combination of the weighted focal loss and Dice loss to improve the convergence speed and detection accuracy of the model.

2. Related Work

Generally speaking, fabric defect detection algorithms can be divided into two main categories: traditional image processing and deep learning based on convolutional neural networks (CNNs).

2.1. Traditional Approaches

Traditional image processing methods can be further subdivided into statistical methods [7,8,9], spectral approaches [10,11,12], and model-based methods [13,14,15,16].

The statistical methods typically include mathematical morphology, co-occurrence matrices, autocorrelation, local contrast enhancement, histogram feature analysis, and other algorithms [17,18]. Anitha and Radha [7] used independent component analysis and vector quantized principal component analysis to extract the texture and structure features from the template image, then compared these features with the input image to identify defect areas. Raheja et al. [8] proposed an automated fabric defect detection system using the Gray-Level Co-occurrence Matrix (GLCM); however, its universality is limited to constant background texture. On the other hand, Kang et al. [9] introduced a universal defect detection method that combines the Integral Image concept with the Elo Rating algorithm (IIER), enabling quick detection of defects in various fabrics. However, this method may not be effective for segmenting large periodic texture fabrics.

The spectral method is based on spatial frequency domain features, which are less sensitive to noise and intensity changes compared to features extracted from the spatial domain. It includes well-known techniques such as Fourier transform [19], wavelet transform [20], and Gabor transform [21], which have been extensively studied for fabric defect detection to eliminate image texture. In [10], the authors developed the Wavelet Preprocessing Gold Image Subtraction (WGIS) method for defect detection of patterned fabrics or repetitive pattern textures. In [11], the Optical Fourier Transform (OFT) method was applied to detect defects in plain cotton fabric. Jing [12] presented a GIS algorithm based on the Gabor filter and used a genetic algorithm to optimize its parameters. The Gabor filter is a commonly used technique in fabric defect detection, offering more accurate separation of fabric features such as texture and background compared to wavelet transform, and appears in several learning-based methods. However, spectral methods require a high degree of periodicity, and are not suitable for fabrics with random textures.

Model-based methods overcome the difficulty of fabric defect detection with random surface variations. This method constructs an image model and generates fabric textures matching the observed ones. In [13], the authors used Gaussian Markov Random Fields (GMRA) to model the texture image of a non-defective fabric. In [16], L0 gradient minimization was applied to eliminate background texture interference, then the processed image was clustered using fuzzy c-means. Finally, the clustering centers of fabric defects and non-defects were calculated and updated to segment the defect area. However, manually verifying and adjusting numerous parameters is often necessary due to the need to capture specific details in different textures.

2.2. Learning-Based Approaches

In summary, traditional machine vision methods have certain limitations regarding detection performance and model complexity. In recent years, automatic learnable detection has gained popularity for fabric defect detection, with learning-Based methods, particularly those utilizing CNN models, showing natural advantages in handling complex fabric images.

Jing et al. [22] proposed a fabric defect classification method based on improved AlexNet. They extracted defect features gradually, fusing edge details through multiple convolution operations to obtain essential image features. An improved YOLO fabric defects detection algorithm [23] was proposed. This method adds a new prediction layer to yolo_head to improve the effect of small defects detection and uses the CEIOU loss function to achieve accurate classification and location of defects. Tabernik et al. [3] proposed a two-stage segmentation plus decision network suitable for few-sample scenarios. Huang et al. [24] followed this two-stage network and improved the pixel accuracy under the premise of few-sample scene detection. Several researchers have combined traditional methods with deep learning models to overcome complex surface texture interference in defect detection. In [25], the authors embedded a Gabor filter into a Faster R-CNN network to eliminate the texture background and used a genetic algorithm to determine the optimal Gabor parameters. In the field of fabric defect detection, another common solution is to model the defect detection problem as a semantic segmentation model. Fully Convolutional Networks (FCN) [26] and UNet [27] are the most classical image segmentation networks, both utilizing full convolutional architecture for advanced feature extraction and fusion. FCN classifies images at the pixel level, achieving semantic-level image segmentation. UNet first introduced an encoder–decoder structure with skip connections to enhance feature fusion at different levels. Oktay et al. [28] proposed a novel self-attention gating module embedded into standard UNet to better pay attention to foreground pixels in segmentation tasks. Zhou et al. [29] introduced nested and dense skip connections into UNet to reduce the semantic gap between the feature maps of the encoder and decoder subnetworks.

On the other hand, the first transformer-based model revolutionized Natural Language Processing(NLP) text translation tasks, achieving state of the art result. Recent works have attempted to apply the transformer-based model to vision tasks. Vision Transformer (ViT) [30] solves the computational complexity problem by dividing the image into blocks for self-attention calculations rather than between all pixels of the entire image. ViT’s development has challenged the dominance of CNNs, as transformers focus on global relationships to capture more contextual information, unlike CNNs’ limited local information capture ability. However, transformers are prone to overfitting in the early stages of training, especially on small datasets. In [31], the authors provided a powerful segmentation model named SEgmentation TRansformer (SETR) based on a pure transformer design. However, its self-attention computational complexity is quadratic with the image size, impacting efficiency. Swin Transformer has emerged as an answer to this problem, as it computes attention within local windows, resulting in linear computational complexity relative to input size. In addition, it can replace classic CNN architectures, serving as a universal backbone in computer vision. Li et al. [32] proposed UniFormer, which aims to organically unify convolution and self-attention in the style of transformer, giving full play to the advantages of both and simultaneously solving the two major problems of local redundancy and global dependency to achieve efficient feature learning. Wang et al. [33] proposed CrossFormer, in which a Cross-scale Embedding Layer (CEL) and a Long Short Distance Attention (LSDA) module realize the fusion of features of different scales to provide cross-scale features for self-attention. LSDA divides self-attention into two parts, short-distance and long-distance, which reduces computational burden while retaining large-scale and small-scale characteristics.

Compared with these related methods, in the present paper we propose an efficient transformer-based fabric defect segmentation neural network with a Gabor filter preprocessing operation. Swin-Unet [34], based on the Swin Transformer block, is the first pure transformer-based U-shaped architecture, and excels in medical image segmentation. Its structure is applicable to the fabric defect segmentation problem as well; thus, we leverage Swin-Unet as our benchmark method, designing a more efficient network for fabric defect segmentation and achieving impressive results in terms of both accuracy and robustness.

3. Proposed Method

3.1. Gabor Filter

The texture of the fabric is usually composed of periodic patterns; by identifying and analyzing these periodic textures, defects in the fabric can be easily detected. A Gabor filter is a specialized wavelet filter that decomposes images into features corresponding to different scales and directions. By selecting specific frequencies and directions, the local features of textures can be extracted. Luan et al. [35] proposed that many convolutional kernels similar to the Gabor filter appear in the low-level filters of AlexNet. Additionally, Hubel et al. [36] found that the receptive profiles of simple cells in the mammalian visual system resemble those of Gabor filters. Therefore, Gabor filters are widely used to simulate simple cells in the visual cortex, providing unique advantages in extracting low-level texture features. In traditional algorithms for fabric defect detection, a series of wavelet filters, particularly Gabor filters, have demonstrated significant effectiveness in fabric texture extraction.The 2D Gabor filter definition can be expressed as follows:

\begin{matrix} G_{λ, θ, φ, σ, γ} (x^{'}, y^{'}) & = & exp (- \frac{{x^{'}}^{2} + γ^{2} {y^{'}}^{2}}{2 σ^{2}}) exp (i (2 π \frac{x^{'}}{λ} + φ)) \\ x^{'} & = & x cos θ + y sin θ \\ y^{'} & = & - x sin θ + y cos θ \end{matrix}

(1)

where x and y represent the two directions (horizontal and vertical) of a pixel,

λ

and

θ

control the wavelength and orientation of the Gabor filters, respectively,

σ

is the Gaussian standard deviation of the filter,

φ

represents the phase, and

γ

is the aspect ratio of the space.

3.2. Swin-Unet

In recent years, transformer blocks have shown superior performance in visual tasks as compared to equivalent CNN modules. Microsoft utilized CNN design principles to design the Swin Transformer [37] block, which divides images into smaller patches based on a shifting window. Computing only the self-attention within each window is much less complex than calculating the self-attention between all pixels. Moreover, using the design principle of shifting windows, attention information can be exchanged between adjacent windows, effectively utilizing the relative position information. Unlike the conventional multi-head self attention (MSA) module used in ViT, the Swin Transformer block can be thought as a series of two modules: a regular window-based MSA (W-MSA) module, and a shifted window-based MSA (SW-MSA) module.In model design, W-MSA and SW-MSA are usually applied consecutively to ensure that attention between windows is not limited to a single area. Figure 1 illustrates a common configuration of two consecutive Swin Transformer blocks.

UNet [27] is a classic segmentation network that has profoundly impacted subsequent network design due to its symmetric U-shaped structure with skip-connections. It utilizes multi-layer convolution and upsampling in the decoder, forming a U-shaped structure symmetrical to the encoder. The skip connections help to fuse high-resolution features at different scales from the encoder, reducing spatial information loss caused by downsampling and significantly improving segmentation performance. This influential U-shaped design has inspired many researchers to propose improved models based on it. The Swin-Unet [34] architecture used in this paper is obtained by turning the U-Net backbone into a Swin Transformer block. Swin-Unet adopts a transformer-based U-shaped Encoder–Decoder architecture with skip connections.

Regarding image downsampling, Swin Transformer uses patch merging to compress the width and height of image features instead of average pooling or maximum pooling, thereby retaining feature information to a greater extent. Based on this method, Swin-Unet uses a patch-expanding upsampling method, enabling the U-shaped network to utilize more information (including uncompressed image features at the same level) during upward image reconstruction. However, Swin-Unet cannot avoid the problem, common in transformer-based models, of the transformer block being unable to quickly capture the texture features of the fabric image in the early stages of training, which causes the model output to quickly fit to a local optimal solution.

3.3. Model Design

To solve the local optimal fitting problem of transformer-based models in fabric defect segmentation, we designed a network based on Swin-UNet. The overall architecture of the model is shown in Figure 2. In this section, we introduce the three stages of the model separately: (1) the fixed Gabor filter for preprocessing fabric texture features; (2) the U-shaped Swin Transformer Nntwork for defect segmentation; and (3) multi-stage result fusion for precisely locating defects.

Stage 1. First, the input of the network is an RGB image. Considering that the Gabor filter is designed as a 5 × 5 kernel size convolution layer, the d attribute is the number of directions. When d = 4, the value of

θ

of the Gabor filter is [0,45,90,135]; the schematic diagram of the filter kernel with different parameters

θ

is shown in Figure 3. In addition, the Gabor filter processes RGB channels separately; thus, setting the convolution group as the number of input channels can ensure that RGB do not affect each other through Gabor filtering. In addition, the unfiltered original image is merged together as part of the output. Overall, the number of output channels can be calculated as

c_{o u t} = c_{i n} * d + c_{i n}

. Although the Gabor filter is part of the network, its parameters are fixed, and do not participate in model training.

Stage 2. After preprocessing, the U-shaped network is the main part of the model. After four times downsampling and upsampling, the features of the fabric are compressed and reconstructed. When Swin-Unet is used for fabric defect segmentation, two problems should be noted:

1. The repetitive texture feature learning is much harder for Swin Transformer than CNN;

2. Due to imbalanced positive and negative samples, Swin-Unet is more likely to stay at the local optimal solution, identifying all results as background, when using the BCE–Dice loss as the loss function.

Through our experiments, we found that in order to maintain a basically consistent structure of the Swin Transformer in the downsampling stage, using a lighter structure in the upsampling stage can make model training easier. Therefore, we improved Swin-Unet by making it not completely symmetrical. The architecture hyperparameters of these model variants are:

Swin-Unet-T: C = 96, N = 6
Swin-Unet-S: C = 96, N = 18
Swin-Unet-B: C = 128, N = 18
Swin-Unet-L: C = 192, N = 18

where C is the channel number of the hidden layers in the first block and N is the layer number of the third block. With this setting, the pretraining parameters of the Swin Transformer can be loaded for transfer learning. The patch expanding layer is the same as in Hu’s work on Swin-Unet [34]. It is very similar to the reverse patch merging operation. The difference is that we concatenate the features from the bottom and left in the skip connections in order to reduce the loss of spatial information caused by downsampling. In the U-shaped network, the output of each layer is used as the basis for the final decision.

Stage 3. From a simple point of view, each Fusion Head is an upsampling module with different upsampling ratios. This multi-scale output prediction is common in target detection algorithms such as Faster RCNN. However, the difference is that we use the segmentation results of each layer as a reference instead of a simple superposition to select the final segmentation results.

3.4. Loss Function

Due to the extremely uneven distribution of positive and negative samples during the training process, we use the Dice loss [38] to increase the loss weight of certain misclassifications, increasing the loss when the network predicts positive samples as negative samples. The Dice Loss (

L_{d i c e}

) can be written as

\begin{matrix} L_{d i c e} = 1 - \frac{2 \sum_{i = 1}^{N} p_{i} g_{i}}{\sum_{i = 1}^{N} p_{i} + \sum_{i = 1}^{N} g_{i} + ε} \end{matrix}

(2)

where N represents the total number of pixels in the image,

g_{i}

belongs to

{0, 1}

, specifying the ground truth label of pixel i,

p_{i} \in

[0, 1] denotes the predicted output probability, and

ϵ

is a term set to prevent the denominator from being 0.

The Dice loss value decreases when there is more overlap between the actual distribution and the network’s predicted distribution; thus, it is often used to push the model to learn in a direction where the predicted distribution is more similar to the real distribution. Nonetheless, the pure Dice loss is not a convex function, and may stay at the local optimal solution instead of the global optimal solution.

When the data distribution is unbalanced, using the focal loss [39] can cause the transformer model to quickly converge, for which it is much faster than traditional Binary Cross-Entropy (BCE) loss. The focal loss is defined as

\begin{matrix} L_{f o c a l} (p_{t}) = - {(1 - p_{t})}^{γ} log (p_{t}) \end{matrix}

(3)

where

p_{t}

is defined as

\begin{matrix} p_{t} = \{\begin{matrix} p & if y = 1 \\ 1 - p & otherwise \end{matrix} \end{matrix}

(4)

where

y \in {\pm 1}

specifies the ground truth class and

p \in [0, 1]

is the model’s estimated probability for the class with label

y = 1

.

Compared with the cross-entropy loss, the focal loss has an additional modulating factor

{(1 - p_{t})}^{γ}

, with a tunable focusing parameter

γ \geq 0

. For a sample with accurate classification (

p_{t} \to 1

), the modulating factor approaches 0. For samples with inaccurate classification (

1 - p_{t} \to 1

), the modifying factor comes to 1. That is, compared with the cross-entropy loss, the focal loss does not change for samples with inaccurate classification, while decreasing for samples with accurate classification. Overall, it is equivalent to increasing the weight of inaccurate classification samples in the loss function.

Therefore, we introduce a combination of the focal loss and Dice loss to fully utilize the advantage of focal loss in focusing on difficult samples while increasing the loss of difficult samples. The loss function used in this article is formulated as follows:

\begin{matrix} F D L o s s = w_{f} * L_{f o c a l} + (1 - w_{f}) * L_{d i c e} \end{matrix}

(5)

where

w_{f}

is used to adjust the proportion between the focal loss and Dice loss. We set

w_{f}

to 0.01 to avoid the problem of the focal loss value being too large relative to Dice loss.

4. Experiments

In this section, we introduce in detail the datasets, the evaluation index used in the experiment, the performance of the proposed method on different datasets, and the effectiveness of the method, which are verified by designing a series of comparison and ablation experiments. Our experiments were conducted on a GTX 3090 (20G) and Pytorch (1.8.1) platform with Python 3.7.

4.1. Datasets

We used three different datasets to validate the effectiveness of our proposed method.

(1): Cropped AITEX: the AITEX [40] textile fabric dataset consists of 245 images of seven different fabrics. We only used the defective images, and cropped the original images with a resolution of 4096 × 256 to sixteen images with a resolution of 256 × 256 to increase the number of samples. The dataset contains twelve different types of defects, such as broken end, broken yarn, broken pick, and weft curling, which are common types of fabric defects in the textile industry. Several representative samples are shown in Figure 4.
(2): Colored Fabric: this dataset was created based on Roboflow Universe [41]. We manually selected fabric images with precise defect segmentation; the chosen dataset had 677 images for training and 158 images for validation. This dataset contains fabrics with different background colors and texture roughness levels, and there is interference from labeling traces in the image, which further increases the difficulty of detection. A representative sample is shown in Figure 5.
(3): One Class Fabric: this dataset was sourced from Hao Zhou et al. [42], and contains 150 fabric images with more complex background textures than the previous two datasets. The dataset covers different fabric backgrounds, defects, and colors. Figure 6 shows typical defect images.

4.2. Evaluation Index

In order to quantitatively evaluate the performance of the method proposed in this paper, we introduce four different evaluation indexes: accuracy (ACC), sensitivity (SE), precision (PC), and intersection over union (IoU). ACC represents the correct classification rate of all pixels, SE represents the correct classification rate of defective pixels, PC represents the proportion of ground truth in the segmented pixels, with a focus on determining whether the predicted mask accurately covers the ground truth, and IoU calculates the overlap rate between the segmentation results and the ground truth, which is the ratio of their intersection and union. Their formulae are as follows:

\begin{matrix} ACC = \frac{TP + FN}{TP + FN + FP + TN} \end{matrix}

(6)

\begin{matrix} SE = \frac{TP}{TP + FN} \end{matrix}

(7)

\begin{matrix} IoU = \frac{TP}{TP + FN + FP} \end{matrix}

(8)

\begin{matrix} PC = \frac{TP}{TP + FP} \end{matrix}

(9)

where TP (True Positive) is the number of samples that the detection model finds to be positive and are indeed positive, FP (False Positive) refers to the number of samples that the detection model considers positive but are actually negative, and FN (False Negative) indicates the number of samples that the detection model finds negative but are actually positive. The IoU used in segmentation is illustrated in Figure 7.

4.3. Quantitative Comparison

4.3.1. Overall Performance Comparison

Table 1 shows experimental results for our method and various state of the art methods on the three datasets mentioned above. All experiments were based on learning rate =

1 \times 10^{- 4}

, epochs = 1000, and lr_decay_epoch = 900 parameters.

Comparing Table 1 and following experimental results, it can be seen that the performance of Swin-Unet on the Cropped AITEX dataset is even worse than that of UNet, which is why we improved it. Overall, our model is biased towards higher SEs and IoUs to a certain extent, with slightly lower PC values, which is consistent with the fact that our FDL loss function pays more attention to defective parts in training, making it possible for our model to judge the background as a defect. For defect detection, we pay more attention to whether there are missed detections; thus, a slightly lower PC value is acceptable under the condition of ensuring high SE. In addition, it can be seen that our model performs better than the other models on the Colored Fabric dataset with a large amount of data as well as on the One Class Fabric dataset with less data and more complex textures. Figure 8 shows a comparison of the segmentation results between the proposed method and other state-of-the-art methods. Referring to Figure 8, it can be seen that our model’s segmentation results have a larger area compared to the ground truth, which confirms that our IoU value is higher; the reason for the lower PC value is that our model predicts edge areas where defects intersect with the background as defect areas, which to an extent leads to a lower PC value. The last two types of fabric images with complex patterns are from the training set, showing that the other models cannot them fit well due to their complex background textures, and may even learn the background texture as a defect. However, our model can learn the features of defects well even in cases with complex background textures. Overall, the method proposed in this article has excellent detection results, especially with more complex texture backgrounds.

4.3.2. Comparison of Models of Different Sizes

We designed three model structures with different sizes, from small to large, in order: tiny, small, and big. Their respective performance results are shown in Table 2. The results reveals that the performance of the larger model falls short of the smaller one. It can be seen that merely augmenting the model parameters does not necessarily enhance the results on the verification set. This insight steered our approach towards retaining the original downsampling structure of Swin-Unet while crafting an exceptionally lightweight design for the upsampling segment. In this way, the performance of our model was improved even further.

4.4. Ablation Experiments

Our method with only Gabor filter, only FDL, or both components. We compared four different combinations to explore the roles of the two components. In Figure 9, Our(0) refers to our complete model. By comparing the blue and green curves, it can be seen that the Gabor filter with fixed parameters alleviates the model training issue to a certain extent at the beginning, as the model takes more time to learn and train the fabric texture; during this period, although the loss is constantly declining the relevant evaluation criteria have not changed even in a level state. Because of the innovation and strength of our model, there is no training pause during the early stages of training.

From the red curve, it is apparent that adding FDL can make the model pay more attention to the defective part, rather than the texture of the background, during the early stage of training, which makes the training of the model in the early stage smoother. From the yellow Our(2) curve, it can be seen that when the Gabor filter and FDL are combined they do not interfere with each other. The model is close to the state of fitting when the training is close to 200 epochs, and this does not become overfitted to the verification set.

To verify the effectiveness of the proposed innovative modules and loss functions, we conducted experiments on different components of the proposed method on the Cropped AITEX dataset. The results are shown in Table 3. The data saved in the table are the model with the best ACC in the verification set using BCE–Dice and FDL, two different loss functions for the original Swin-Unet. It can be seen that to a certain extent the PC value of the BCE–Dice loss is lower, which is due to the high weight of DICE. Compared with BCE, FDL tends to have higher SE and lower PC performance, while compared to BCE–Dice loss, FDL tends to have higher PC and lower SE performance. In addition, Our(1) without using the Gabor filter performs better than the original Swin-Unet on the Cropped AITEX dataset in all aspects. When using fixed parameters for the Gabor filter, the performance of the model on SE is more prominent, which reflects the ability of the Gabor filter to accelerate the convergence of the model while enabling the model to learn more defect features. In this way, the model does not merely fit to the training set, thereby improving the performance of the model on the verification set.

5. Conclusions

In this paper, an efficient transformer-based fabric defect segmentation neural network with a Gabor filter preprocessing operation is proposed. Specifically, the proposed model solves the local optimal fitting problem of transformer models in fabric defect segmentation. The proposed method can be roughly divided into three stages. First, a Gabor filter is embedded in the front of the model, with the aim of eliminating the influence of complex surface textures in the fabric image. The U-shaped structure in the second stage of the network is designed to not be quite completely symmetrical; the lighter upsampling operation compared to downsampling leads to better performance of the model. Finally, multi-stage result fusion is used to obtain the location of defects with high accuracy. Additionally, we propose a new loss function which uses a combination of the weighted focal loss and Dice loss to improve the convergence speed and detection accuracy of the model. Extensive experiments were conducted to evaluate the effectiveness of our model on three datasets. The proposed method not only significantly solves the limitations of transformer-based models, it has excellent performance and potential applications in the industrial field.

In future work, we plan to seek a more efficient method to speed up our model in order to use fewer resources. Due to the difficulty of collecting defect samples in industrial scenarios and the high cost of manual annotation, we intend to focus on detection problems in scenarios where there are only a small number of labeled defect samples and to further enhance detection capabilities in these scenarios.

Author Contributions

Conceptualization and methodology, H.X. and C.L.; software, H.X. and S.D.; validation, C.L., S.D. and L.R.; writing—original draft preparation, H.X. and S.D.; writing—review and editing, C.L., L.R. and G.C.; project administration, L.R.; funding acquisition, C.L., G.C. and B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (No. 2020YFB1712401), the Natural Science Foundation of China (62006210, 62206252), the Key Project of Public Benefit in Henan Province of China (201300210500), the Key Scientific and Technology Project in Henan Province of China (221100210100), and the Chinese Scholarship Council (No. 202007045007).

Data Availability Statement

Data sharing is not applicable to this paper, as only public datasets were used during the current study and no new datasets were generated.

Conflicts of Interest

The authors declare no conflict of interest.

References

Singh, K.; Kaleka, J.; Kaleka, J. Identification and classification of fabric defects. Int. J. Adv. Res. 2016, 4, 1137–1141. [Google Scholar] [CrossRef]
Srinivasan, K.; Dastoor, P.; Radhakrishnaiah, P.; Jayaraman, S. FDAS: A knowledge-based framework for analysis of defects in woven textile structures. J. Text. Inst. 1992, 83, 431–448. [Google Scholar] [CrossRef]
Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776. [Google Scholar] [CrossRef]
Jing, J.; Wang, Z.; Rätsch, M.; Zhang, H. Mobile-Unet: An efficient convolutional neural network for fabric defect detection. Text. Res. J. 2022, 92, 30–42. [Google Scholar] [CrossRef]
Li, B.; Zou, Y.; Zhu, R.; Yao, W.; Wang, J.; Wan, S. Fabric defect segmentation system based on a lightweight GAN for industrial Internet of Things. Wirel. Commun. Mob. Comput. 2022, 2022, 9680519. [Google Scholar] [CrossRef]
Li, L.; Li, Q.; Liu, Z.; Xue, L. Effective Fabric Defect Detection Model for High-Resolution Images. Appl. Sci. 2023, 13, 10500. [Google Scholar] [CrossRef]
Anitha, S.; Radha, V. Evaluation of defect detection in textile images using Gabor wavelet based independent component analysis and vector quantized principal component analysis. In Proceedings of the Fourth International Conference on Signal and Image Processing 2012, (ICSIP 2012); Springer: Berlin/Heidelberg, Germany, 2013; Volume 2, pp. 433–442. [Google Scholar]
Raheja, J.L.; Kumar, S.; Chaudhary, A. Fabric defect detection based on GLCM and Gabor filter: A comparison. Optik 2013, 124, 6469–6474. [Google Scholar] [CrossRef]
Kang, X.; Zhang, E. A universal defect detection approach for various types of fabrics based on the Elo-rating algorithm of the integral image. Text. Res. J. 2019, 89, 4766–4793. [Google Scholar] [CrossRef]
Ngan, H.Y.; Pang, G.K.; Yung, S.P.; Ng, M.K. Wavelet based methods on patterned fabric defect detection. Pattern Recognit. 2005, 38, 559–576. [Google Scholar] [CrossRef]
Hoffer, L.M.; Francini, F.; Tiribilli, B.; Longobardi, G. Neural networks for the optical recognition of defects in cloth. Opt. Eng. 1996, 35, 3183–3190. [Google Scholar] [CrossRef]
Jing, J.F.; Chen, S.; Li, P.F. Fabric defect detection based on golden image subtraction. Color. Technol. 2017, 133, 26–39. [Google Scholar] [CrossRef]
Cohen, F.S.; Fan, Z.; Attali, S. Automated inspection of textile fabrics using textural models. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 803–808. [Google Scholar] [CrossRef]
Wang, W.; Deng, N.; Xin, B. Sequential detection of image defects for patterned fabrics. IEEE Access 2020, 8, 174751–174762. [Google Scholar] [CrossRef]
Allili, M.S.; Baaziz, N.; Mejri, M. Texture modeling using contourlets and finite mixtures of generalized Gaussian distributions and applications. IEEE Trans. Multimed. 2014, 16, 772–784. [Google Scholar] [CrossRef]
Zhang, H.; Ma, J.; Jing, J.; Li, P. Fabric defect detection using L0 gradient minimization and fuzzy C-means. Appl. Sci. 2019, 9, 3506. [Google Scholar] [CrossRef]
Li, M.; Wan, S.; Deng, Z.; Wang, Y. Fabric defect detection based on saliency histogram features. Comput. Intell. 2019, 35, 517–534. [Google Scholar] [CrossRef]
Li, F.; Yuan, L.; Zhang, K.; Li, W. A defect detection method for unpatterned fabric based on multidirectional binary patterns and the gray-level co-occurrence matrix. Text. Res. J. 2020, 90, 776–796. [Google Scholar] [CrossRef]
Chan, C.h.; Pang, G.K. Fabric defect detection by Fourier analysis. IEEE Trans. Ind. Appl. 2000, 36, 1267–1276. [Google Scholar] [CrossRef]
Tsai, D.M.; Chiang, C.H. Automatic band selection for wavelet reconstruction in the application of defect detection. Image Vis. Comput. 2003, 21, 413–431. [Google Scholar] [CrossRef]
Wang, Q.; Jing, J.; Zhang, L.; Wang, X.; Li, P. Denim defect detection based on optimal Gabor filter. Laser Optoelectron. Prog. 2018, 55, 357–363. [Google Scholar]
Jing, J.F.; Ma, H.; Zhang, H.H. Automatic fabric defect detection using a deep convolutional neural network. Color. Technol. 2019, 135, 213–223. [Google Scholar] [CrossRef]
Yue, X.; Wang, Q.; He, L.; Li, Y.; Tang, D. Research on Tiny Target Detection Technology of Fabric Defects Based on Improved YOLO. Appl. Sci. 2022, 12, 6823. [Google Scholar] [CrossRef]
Huang, Y.; Jing, J.; Wang, Z. Fabric defect segmentation method based on deep learning. IEEE Trans. Instrum. Meas. 2021, 70, 1–15. [Google Scholar] [CrossRef]
Chen, M.; Yu, L.; Zhi, C.; Sun, R.; Zhu, S.; Gao, Z.; Ke, Z.; Zhu, M.; Zhang, Y. Improved faster R-CNN for fabric defect detection based on Gabor filter with Genetic Algorithm optimization. Comput. Ind. 2022, 134, 103551. [Google Scholar] [CrossRef]
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Granada, Spain, 20 September 2018; Proceedings 4. Springer: Berlin/Heidelberg, Germany, 2018; pp. 3–11. [Google Scholar]
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023. [CrossRef]
Wang, W.; Chen, W.; Qiu, Q.; Chen, L.; Wu, B.; Lin, B.; He, X.; Liu, W. Crossformer++: A versatile vision transformer hinging on cross-scale attention. arXiv 2023, arXiv:2303.06908. [Google Scholar]
Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the Computer Vision–ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part III. Springer: Berlin/Heidelberg, Germany, 2023; pp. 205–218. [Google Scholar]
Luan, S.; Chen, C.; Zhang, B.; Han, J.; Liu, J. Gabor convolutional networks. IEEE Trans. Image Process. 2018, 27, 4357–4366. [Google Scholar] [CrossRef]
Hubel, D.H.; Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 1962, 160, 106. [Google Scholar] [CrossRef]
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–18 November 2021; pp. 10012–10022. [Google Scholar]
Li, X.; Sun, X.; Meng, Y.; Liang, J.; Wu, F.; Li, J. Dice loss for data-imbalanced NLP tasks. arXiv 2019, arXiv:1911.02855. [Google Scholar]
Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
Silvestre-Blanes, J.; Albero-Albero, T.; Miralles, I.; Pérez-Llorens, R.; Moreno, J. A public fabric database for defect detection methods and results. Autex Res. J. 2019, 19, 363–374. [Google Scholar] [CrossRef]
Visitesse. D Dataset. 2023. Available online: https://universe.roboflow.com/visitesse/d-h6tqc (accessed on 19 July 2023).
Zhou, H.; Chen, Y.; Troendle, D.; Jang, B. One-Class Model for Fabric Defect Detection. arXiv 2022, arXiv:2204.09648. [Google Scholar]

Figure 1. Swin Transformer block.

Figure 2. The architecture of our model.

Figure 3. Gabor kernel under the condition of

θ = [0, 45, 90, 135], k e r n e l = 256, λ = 15, γ = 0.5

,

σ = 25

.

Figure 3. Gabor kernel under the condition of

θ = [0, 45, 90, 135], k e r n e l = 256, λ = 15, γ = 0.5

,

σ = 25

.

Figure 4. Typical fabric samples in Cropped AITEX dataset.

Figure 5. Representative fabric samples in Colored Fabric dataset.

Figure 6. Representative fabric samples in One Class Fabric dataset.

Figure 7. IoU.

Figure 8. The proposed method compared with other state of the art methods. Analysis of defect detection results: first column, original defect image; second column, ground truth; third column, UNet segmentation results [27]; fourth column, AttuNet segmentation results [28]; fifth column, NestedUNet segmentation results [29]; sixth column, segmentation results of the method proposed by Tabernik et al. [3]; seventh column, segmentation results of the method proposed in this article.

Figure 9. Evaluation index curves of different combinations. Swin-Unet refers to the original Swin-Unet using the BCE–Dice loss; Our(0) refers to our proposed model using the BCE-Dice loss; Our(1) represents the proposed model using FDL without the Gabor filter; Our(2) represents the proposed model trained with fixed parameters of the Gabor filter layer and FDL.

Table 1. Results of various state of the art methods compared with the results of the proposed method on three different datasets.

	Cropped AITEX				Colored Fabric				One Class Fabric
	ACC	SE	PC	IoU	ACC	SE	PC	IoU	ACC	SE	PC	IoU
UNet	0.9908	0.6313	0.6294	0.5219	0.9919	0.7635	0.7758	0.6149	0.8879	0.2579	0.6106	0.2244
AttUNet	0.9906	0.6791	0.6584	0.5429	0.9922	0.7736	0.7811	0.6290	0.8866	0.3307	0.6675	02102
NestedUNet	0.9914	0.5314	0.6840	0.4764	0.9912	0.6790	0.7960	0.5679	0.8866	0.3345	0.5786	0.2643
Tabernik et al. [3]	0.9897	0.5469	0.6311	0.6143	0.9918	0.5899	0.7896	0.4976	0.8273	0.3930	0.5920	0.1003
STUNet(ours)	0.9900	0.7446	0.6236	0.5493	0.9923	0.7914	0.7691	0.6323	0.9003	0.3674	0.7608	0.3370

Table 2. Comparison of models of different sizes.

	Cropped AITEX				Colored Fabric				One Class Fabric
	ACC	SE	PC	IoU	ACC	SE	PC	IoU	ACC	SE	PC	IoU
Tiny	0.9900	0.7446	0.6236	0.5493	0.9923	0.7914	0.7691	0.6323	0.9003	0.3674	0.7608	0.3370
Small	0.9891	0.6808	0.5890	0.5102	0.9923	0.7680	0.7651	0.6233	0.8988	0.2526	0.6825	0.2373
Big	0.9903	0.6563	0.6278	0.5104	0.9919	0.7705	0.7597	0.6217	0.8981	0.3031	0.6244	0.2624

Table 3. Ablation experiment results for our method on the AITEX dataset.

Method	ACC	SE	PC	IoU
Swin-Unet+BCE-Dice	0.9893	0.6060	0.5986	0.4754
Swin-UNet+FDL	0.9903	0.5572	0.6433	0.4657
Our(1)	0.9912	0.6908	0.6417	0.5448
Our(2)	0.9900	0.7446	0.6236	0.5493

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xu, H.; Liu, C.; Duan, S.; Ren, L.; Cheng, G.; Hao, B. A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter. Appl. Sci. 2023, 13, 11386. https://doi.org/10.3390/app132011386

AMA Style

Xu H, Liu C, Duan S, Ren L, Cheng G, Hao B. A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter. Applied Sciences. 2023; 13(20):11386. https://doi.org/10.3390/app132011386

Chicago/Turabian Style

Xu, Haitao, Chengming Liu, Shuya Duan, Liangpin Ren, Guozhen Cheng, and Bing Hao. 2023. "A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter" Applied Sciences 13, no. 20: 11386. https://doi.org/10.3390/app132011386

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Fabric Defect Segmentation Model Based on Improved Swin-Unet with Gabor Filter

Abstract

1. Introduction

2. Related Work

2.1. Traditional Approaches

2.2. Learning-Based Approaches

3. Proposed Method

3.1. Gabor Filter

3.2. Swin-Unet

3.3. Model Design

3.4. Loss Function

4. Experiments

4.1. Datasets

4.2. Evaluation Index

4.3. Quantitative Comparison

4.3.1. Overall Performance Comparison

4.3.2. Comparison of Models of Different Sizes

4.4. Ablation Experiments

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI