Article

A Novel Involution-Based Lightweight Network for Fabric Defect Detection

1 School of Textile, Apparel & Art Design, Shaoxing University Yuanpei College, Shaoxing 312000, China
2 School of Textile Science and Engineering, Xi’an Polytechnic University, Xi’an 710048, China
* Author to whom correspondence should be addressed.
Information 2025, 16(5), 340; https://doi.org/10.3390/info16050340
Submission received: 12 March 2025 / Revised: 14 April 2025 / Accepted: 18 April 2025 / Published: 23 April 2025

Abstract

For automatic fabric defect detection with deep learning, a large training set is usually required to cover diverse textures and defect forms, and the computational cost of convolutional neural network (CNN)-based models is very high. This research proposes an involution-enabled Faster R-CNN built on the bottleneck structure of the residual network. Involution has two advantages over convolution: first, it captures a larger receptive field in the spatial dimension; second, its parameters are shared across the channel dimension, which reduces information redundancy and thus the number of parameters and the amount of computation. Detection performance is evaluated by Params, floating-point operations (FLOPs), and average precision (AP) on a collected dataset of 6308 defective fabric images. The experimental results demonstrate that the proposed involution-based network yields a lighter model, with Params reduced to 31.21 M and FLOPs to 176.19 G, compared with Faster R-CNN’s 41.14 M Params and 206.68 G FLOPs. It also slightly improves the detection of large defects, increasing the AP value from 50.5% to 51.1%. The findings of this research could offer a promising solution for efficient fabric defect detection in practical textile manufacturing.

1. Introduction

The detection of fabric surface defects is an essential step in the quality control of textile manufacturing. These defects may reduce the price of textile fabric by as much as 45% to 65% [1]. Traditional manual defect detection is often affected by subjective factors. Specifically, as working time increases, workers experience visual fatigue, particularly when facing complex and diversified fabric textures and defect types, which reduces detection accuracy [2]. Therefore, it is necessary to research and develop an automatic visual detection system for textile defect detection.
At present, fabric defect detection methods can be divided into four categories. (1) Statistical approaches, which mainly include the auto-correlation function [3], co-occurrence matrix [4], mathematical morphology [5], and fractal methods [6,7,8]. These methods study the statistical properties of the relationships between the gray levels of an image [9]. Wood [10] used a two-dimensional autocorrelation function to describe and analyze the translational and rotational symmetry of carpet fabric patterns. Kwak et al. [11] detected and classified defects based on thresholding and morphological processing combined with a three-stage sequential decision tree. Kang et al. [12] presented a fabric defect segmentation method using basic images and the Elo rating algorithm; this method detects fabrics with small periodic textures well but cannot accurately segment fabrics with large periodic textures [2]. Song et al. [13] analyzed the regional features of fabric surface defects: the saliency of defect regions is determined using the extreme point density map of the image and the features of the membership function region, and an iterative threshold method together with morphological processing ensures precise and accurate detection of fabric defects. Experimental results show that this method detects fabric defects effectively while suppressing the interference of noise and background textures. (2) Frequency-domain methods, which require the textured image to exhibit a high degree of periodicity in the spatial frequency domain [9]. These methods include the Fourier transform [14], wavelet transform [15], Gabor transform [16], and filtering approaches. Hoffer et al. [17] combined the optical Fourier transform with a neural network to detect defects in fabric. Sari-Sarraf and Goddard [18] developed a vision-based fabric inspection system based on wavelet transformation, image fusion, and correlation dimension. Kang and Zhang [19] proposed a universal and adaptive defect detection algorithm based on sparse dictionary learning to detect various fabric texture defects. (3) Model-based methods, which solve the defect detection problem by assuming that defect-free fabric texture follows a specific distribution and estimating the model parameters from that distribution [20,21]. Hajimowlana et al. [22] introduced a 1D autoregressive method for texture modeling and defect detection in web inspection systems. Cohen et al. [21] modeled defect-free fabric with a Gaussian Markov random field model and compared it with test images to detect fabric defects. (4) Learning-based methods. In recent years, owing to the rapid development of computer technology, especially improved hardware, convolutional neural networks (CNNs) have achieved great breakthroughs in various computer vision fields, including image classification [23], object detection [24,25], and semantic segmentation [26]. Nasim et al. [27] introduced deep learning into fabric defect detection, using an indigenous dataset sourced directly from Chenab Textiles that provides authentic and diverse images representative of actual manufacturing conditions. Later, Mei et al. [28] proposed an unsupervised automated approach using a multi-scale convolutional denoising auto-encoder network and a Gaussian pyramid. Almeida et al. [9] developed a system for detecting more than 50 fabric defects based on a convolutional neural network and studied two false-negative reduction methods; their results show that the system detects many different types of defects with good accuracy while being fast and computationally simple.
In summary, although traditional machine learning methods can meet most detection requirements, the high complexity of the algorithms and the limitations of their detection performance make them insufficiently robust to handle various types of fabrics [2]. Deep learning-based methods have become a hot research topic and show good development prospects because of their short training time, high detection accuracy, and good robustness.
At present, there are still two limitations in fabric defect detection: (1) because of the variety of fabric textures and defect types, especially for patterned fabrics whose backgrounds and textures are extremely diverse, defect detection models often require large datasets for training, which increases the amount of computation and the number of parameters; and (2) large-scale textures covering the fabric surface strongly interfere with defect detection.
In recent years, scholars have made many efforts in response to these two problems. Wang et al. [29] proposed a lightweight depth-based detection framework that trains models with false datasets and combines an image segmentation network to complete the segmentation and detection of fabric defects. Huang et al. [2] proposed a segmentation-based defect detection network designed from the three perspectives of sample proportion, annotation cost, and computational complexity. Although the method removes a large number of useless parameters and improves timeliness, the network is only suitable for learning from small defect samples and ignores the detection of large-size defects. Xu et al. [30] proposed a de-deformation defect detection network (D4Net) that extracts the differences between high-level semantic features from deep neural networks to emphasize areas where defects may exist; the network performs well on large fabric defects, but it cannot handle well cases with missing pattern parts or defects near areas of natural holes.
In this paper, we introduce involution [31] to address the above problems. Involution was proposed in March 2021 and has characteristics completely opposite to those of ordinary convolution. On the one hand, involution summarizes the context over a wider spatial extent, thus overcoming the difficulty of modeling long-range interactions, which helps to extract the features of the defect and the texture separately; on the other hand, involution adaptively allocates weights over different positions, so as to prioritize the most informative visual elements in the spatial domain [31]. For fabric defect detection, the involution design can, in theory, not only reduce parameters and computation but also suit the detection of fabrics with various defects and textures, especially defects and textures of large size. Therefore, to determine to what extent the involution idea can improve the detection model, and how an involution-based model can achieve optimal performance in the fabric industry, in this paper involution is applied to a fabric defect detection model for the first time by introducing the involution module into Faster R-CNN.
The rest of this paper is structured as follows: Section 2 reviews the progress of attention mechanisms and target detection algorithms; Section 3 introduces the involution design, network architecture, and loss function; Section 4 presents the experimental results and discussion; and Section 5 concludes the work.

2. Related Work

2.1. Attention Mechanism

The attention mechanism was first proposed in the field of visual images in the 1990s. It is inspired by the tendency of human biological systems to focus on the distinctive parts when processing large amounts of information [32]. It was not until 2014 that Mnih et al. [33] applied the attention mechanism to a recurrent neural network (RNN) model for image classification, which drew widespread attention to the mechanism. Later, Bahdanau et al. [34] applied attention mechanisms to machine translation, and the attention mechanism then developed rapidly in natural language processing [35,36]. Recently, how to use the attention mechanism within convolutional neural networks has also become a hot research topic. Some scholars [37,38] combine a local attention-based convolutional neural network (pACNN), which focuses on local patches, with a global attention-based convolutional neural network (gACNN), which learns global patterns from the entire facial area, for facial expression and depression recognition. Sun et al. [39] proposed an 11-layer convolutional neural network based on visual attention, placing the attention module after ten convolutional layers to automatically identify regions of interest from the convolved local features, thereby realizing facial expression recognition. Zhang et al. [40] developed a channel-wise attention module (CAM) and a spatial-wise attention module (SAM) and integrated them into a network for multispectral pedestrian detection. Inspired by the classical non-local means method [41] in computer vision, Wang et al. [42] proposed a non-local neural network that captures long-range dependencies via non-local operations. Despite its excellent performance, it lacks a mechanism to model the interactions between positions across channels, which are of vital importance in recognizing fine-grained objects and actions [43]. To address this limitation, Yue et al. [43] optimized the non-local modules [42] by using a compact representation of multiple kernel functions with Taylor expansion and proposed a generalized non-local model. Experimental results illustrate the clear-cut improvements and practical applicability of the generalized non-local module on both fine-grained object recognition and video classification.
As a special case of the attention mechanism, involution removes much of its complexity: it uses only a single pixel’s feature vector to generate the involution kernel (rather than relying on pixel-to-pixel interaction to generate an attention map) and implicitly encodes the pixel’s position information when generating the kernel (discarding explicit position encoding), resulting in a clean and efficient operation. Therefore, this paper introduces involution into the ResNet50 network to extract the features of fabric defect images.

2.2. Target Detection Framework

There are two main types of target detection frameworks: two-stage and one-stage detectors. In 2012, Krizhevsky et al. [44] proposed a deep convolutional neural network (DCNN) called AlexNet, and since then many computer vision applications have focused on deep learning methods [45]. Inspired by the convolutional neural network, Girshick et al. took the lead in developing R-CNN [46,47], which integrated AlexNet [44] with the selective search region proposal method [48] and achieved high detection accuracy [45]. However, the R-CNN model also has many shortcomings, such as slow training, difficulty in optimization, unsuitability for large-scale detection, and slow testing. Considering these shortcomings, He et al. [49] introduced the traditional spatial pyramid pooling (SPP) into the CNN architecture and proposed SPPnet. Using this network, R-CNN achieved significant acceleration without sacrificing any detection quality [45]. However, the fine-tuning in SPPnet [49] cannot update the convolutional layers before the SPP layer, which limits the accuracy of very deep networks [45]. Later, Girshick et al. [50] proposed Fast R-CNN, which achieves higher detection speed and quality than SPPnet and R-CNN. The above target detection algorithms belong to the two-stage category.
In 2015, Redmon et al. proposed YOLO, the first one-stage detector in the era of deep learning [51]. It treats object detection as a regression problem from image pixels to spatially separated bounding boxes and associated class probabilities, completely abandoning the previous “proposal plus verification” detection paradigm. Later, Redmon proposed the V2 and V3 versions of YOLO [52,53], which improved detection accuracy while maintaining a high detection speed; however, compared with two-stage detectors, their localization accuracy for small targets is significantly lower. Liu et al. [54] proposed SSD, whose main contribution is the introduction of multi-reference and multi-resolution detection, which significantly improves the accuracy of a one-stage detector, especially for small objects. Although one-stage detectors are fast and structurally simple, their accuracy lagged behind two-stage detectors for many years [55]. Lin et al. introduced a new loss function called “focal loss” by reshaping the standard cross-entropy loss and proposed RetinaNet [56]; the focal loss enables a one-stage detector to achieve the same accuracy as a two-stage detector while maintaining a very high detection speed [55]. In 2020, Bochkovskiy et al. [57] proposed YOLOv4, a major update of the YOLO series, whose average precision (AP) and frames per second (FPS) on the COCO dataset are 10% and 12% higher than those of YOLOv3, respectively.
Among two-stage detectors, Faster R-CNN has demonstrated exceptional accuracy in object detection, a crucial attribute for defect detection given the intricate textures and shapes often associated with fabric defects. In addition, its multi-scale region of interest (RoI) pooling enables effective handling of scale variations in fabric images, contributing to a detection capability that adapts robustly to defects of different sizes and shapes.
Furthermore, Faster R-CNN provides an end-to-end training framework, allowing the network to autonomously learn the relationship between image features and the object detection task. This not only streamlines the experimental process, but also helps enhance the model’s generalization capability. Additionally, Faster R-CNN exhibits good adaptability and flexibility in accommodating the specificities of self-constructed datasets such as ours.
Therefore, considering the combined strengths of Faster R-CNN in terms of detection accuracy, multi-scale adaptability, end-to-end training, and adaptability to self-constructed datasets, we choose the classic two-stage target detection framework Faster R-CNN as our detector and ResNet50 as the backbone network of the model. We then replace the middle convolution of the bottleneck module in ResNet50 with involution for comparative experiments. The experimental results show that involution allows the model to maintain high detection accuracy while greatly reducing its training parameters and computation.

3. Method

3.1. Involution Module

Kernels in convolutional neural networks have two significant properties, namely being spatial-agnostic and channel-specific. Spatial-agnostic means that the same kernel is shared across the spatial dimension. Conversely, channel-specific means that each channel owns its corresponding kernel, which makes the convolution kernel redundant in the channel dimension [31]. The convolution kernel is illustrated in Figure 1b.
As shown in Figure 1c, involution has characteristics completely opposite to those of convolution, namely channel invariance and spatial specificity. In other words, involution kernels are shared across channels while being dynamically parameterized in the spatial dimension, which allows more flexible modeling. Therefore, the involution kernel $\mathcal{H} \in \mathbb{R}^{H \times W \times K \times K \times G}$ (of size H × W × K × K × G, where G indicates that all channels share the same G kernels) is designed along a characteristic direction totally different from that of standard convolution. Specifically, the involution kernel $\mathcal{H}_{i,j,\cdot,\cdot,g} \in \mathbb{R}^{K \times K}$, with g = 1, 2, …, G, is customized for the pixel $X_{i,j}$ at coordinate (i, j) but shared across all channels, where G counts the number of groups sharing the same involution kernel. Multiplying the involution kernel with the input yields the output feature map. The involution operation is defined by Equation (1).
$$Y_{i,j,k} = \sum_{(u,v) \in \Delta_K} \mathcal{H}_{i,j,\,u+\lfloor K/2 \rfloor,\,v+\lfloor K/2 \rfloor,\,\lceil kG/C \rceil}\; X_{i+u,\,j+v,\,k} \tag{1}$$
The shape of the input feature map X determines the shape of the involution kernel, ensuring that the kernel size and the input feature size are automatically aligned in the spatial dimension. $\phi$ represents the generating function of the involution kernel, and its form at each position (i, j) is shown in Equation (2).
$$\mathcal{H}_{i,j} = \phi\left(X_{\Psi_{i,j}}\right) \tag{2}$$
where $\Psi_{i,j}$ is the index set of the neighborhood of coordinate (i, j), so $X_{\Psi_{i,j}}$ represents the patch of the feature map containing that coordinate.
For the design of the generating function $\phi$, let $\Psi_{i,j}$ be the single-point set containing only (i, j); that is, $X_{\Psi_{i,j}}$ is the single pixel at coordinate (i, j) on the feature map. This yields one instantiation of the involution kernel generation, as in Equation (3):
$$\mathcal{H}_{i,j} = \phi(X_{i,j}) = W_1\,\sigma(W_0\,X_{i,j}) \tag{3}$$
where $W_0 \in \mathbb{R}^{(C/R) \times C}$ and $W_1 \in \mathbb{R}^{(K \times K \times G) \times (C/R)}$ are linear transformation matrices, C is the number of channels, R is the channel reduction ratio, and $\sigma$ denotes the intermediate batch normalization and nonlinear activation function.
The design of the involution kernel is shown in Figure 1a. For the feature vector at a coordinate point of the input feature map X, a tensor of shape 1 × 1 × K² is first obtained through a series of operations including a fully connected layer, batch normalization, and ReLU activation. Then, through a reshape transformation, the corresponding involution kernel of shape K × K × 1 is obtained. The output feature map Y is finally computed by multiply-add operations between this kernel and the neighborhood feature vectors of the coordinate point in the input feature map.
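To make the kernel-generation pipeline above concrete, the following is a minimal PyTorch sketch of a stride-1 involution layer along the lines of Equations (1)–(3). The class name, the reduction ratio of 4, and the use of nn.Unfold to gather neighborhoods are implementation assumptions for illustration, not the exact code used in this work.

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal sketch of a stride-1 2D involution layer (Equations (1)-(3))."""
    def __init__(self, channels, kernel_size=7, groups=1, reduction=4):
        super().__init__()
        self.k, self.g = kernel_size, groups
        # Kernel-generating function phi = W1 * sigma(W0 * x), Eq. (3)
        self.reduce = nn.Conv2d(channels, channels // reduction, 1)          # W0
        self.bn = nn.BatchNorm2d(channels // reduction)
        self.act = nn.ReLU(inplace=True)
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)          # W1
        self.unfold = nn.Unfold(kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # Generate one K x K kernel per pixel and per group: (B, G, 1, K*K, H, W)
        kernel = self.span(self.act(self.bn(self.reduce(x))))
        kernel = kernel.view(b, self.g, self.k * self.k, h, w).unsqueeze(2)
        # Gather K x K neighborhoods of the input: (B, G, C/G, K*K, H, W)
        patches = self.unfold(x).view(b, self.g, c // self.g,
                                      self.k * self.k, h, w)
        # Multiply-add over the kernel window, Eq. (1); the kernel is shared
        # across the C/G channels of each group (channel-agnostic,
        # spatial-specific), the inverse of a convolution kernel.
        out = (kernel * patches).sum(dim=3)
        return out.view(b, c, h, w)
```

For example, Involution2d(64, kernel_size=7)(torch.randn(1, 64, 56, 56)) returns a tensor of the same shape as its input, so the layer can stand in for a same-resolution convolution.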

3.2. Involution-Based Faster R-CNN Structure

For fabric defect detection in this paper, the target detection framework is Faster R-CNN, with the ResNet50 network as the main part of the framework (in the experiments in Section 4, we replace the intermediate convolution block of the bottleneck module in ResNet50 with an involution of kernel size 7 × 7 and rename the network RedNet50). As shown in Figure 2, the left side of the figure is the overall framework of Faster R-CNN and the right side is the RedNet network architecture. Faster R-CNN is mainly composed of four parts: the conv layers, the region proposal network (RPN), RoI pooling, and classification. The conv layers form the backbone of the network, here RedNet, which extracts the feature map of the fabric image. RedNet consists of two basic blocks, the identity block and the involution (inv) block. The input and output of an identity block have the same dimensions, so several identity blocks can be stacked in series, whereas the inv block changes the dimensions and is not stacked consecutively. The RPN then traverses the feature maps extracted by RedNet using a 3 × 3 sliding window. During the traversal, nine anchors are generated at each window position according to three aspect ratios (1:2, 1:1, 2:1) and three scales. All generated anchors are processed by the subsequent fully connected layers and a preliminary bounding-box regression to separate foreground targets from the background.
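As a sketch of how the inv block could replace the middle convolution of a ResNet50 bottleneck (reusing the Involution2d module sketched in Section 3.1), one hypothetical layout is shown below; the channel widths, stride handling, and projection shortcut are illustrative assumptions rather than the exact RedNet50 configuration.

```python
import torch.nn as nn

class InvBottleneck(nn.Module):
    """ResNet50-style bottleneck whose middle 3x3 convolution is replaced by a
    7x7 involution (a simplified sketch of the 'inv block' of RedNet50)."""
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, mid_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        # 7x7 involution in place of the usual 3x3 convolution
        self.inv = Involution2d(mid_channels, kernel_size=7)
        self.bn2 = nn.BatchNorm2d(mid_channels)
        self.conv3 = nn.Conv2d(mid_channels, out_channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when input/output dimensions differ
        self.shortcut = (nn.Identity() if in_channels == out_channels else
                         nn.Sequential(
                             nn.Conv2d(in_channels, out_channels, 1, bias=False),
                             nn.BatchNorm2d(out_channels)))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.inv(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))
```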
Proposals are generated and output by the RPN. The RoI pooling layer collects the input feature maps and proposals, and integrates this information to extract proposal feature maps, which are sent to the subsequent fully connected layers to determine the category of fabric defects. The classification layer uses the proposal feature maps to compute the category of each proposal, while bounding-box regression is performed to obtain the final precise position of the detection box.
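The RoI pooling step can be illustrated with torchvision’s roi_align operator, a close relative of RoI pooling used in many Faster R-CNN implementations; the feature-map size and proposal coordinates below are made-up values for demonstration.

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 32, 32)             # assumed backbone output
# One proposal per row: (batch_index, x1, y1, x2, y2) in input-image coordinates
proposals = torch.tensor([[0.0, 10.0, 10.0, 200.0, 120.0]])
# Pool each proposal to a fixed 7 x 7 grid before the fully connected head;
# spatial_scale maps 512 x 512 image coordinates onto the 32 x 32 feature map
roi_features = roi_align(feature_map, proposals, output_size=(7, 7),
                         spatial_scale=32 / 512)
print(roi_features.shape)                              # torch.Size([1, 256, 7, 7])
```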

3.3. Loss Function

The loss function measures the distance between the prediction (anchor) and the ground-truth label. In the Faster R-CNN network, the core part is the RPN. To train the RPN, each anchor is assigned a binary class label indicating whether it is an object [23]. We use the intersection over union (IoU), that is, the ratio of the intersection area to the union area between the ground truth and the anchor, to indicate how well an anchor matches the ground truth (as shown in Figure 3). Two types of anchors receive positive labels: (1) the anchor with the highest IoU overlap with a ground-truth box; and (2) anchors whose IoU overlap with any ground-truth box is higher than 0.7. Note that one ground-truth box can assign positive labels to multiple anchors. In addition, a non-positive anchor receives a negative label when its IoU is less than 0.3, and it is discarded when its IoU is between 0.3 and 0.7. The loss function is calculated by Equation (4).
$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \tag{4}$$
In Equation (4), i is the index of an anchor and $p_i$ is the predicted probability that anchor i is an object. $p_i^*$ is the ground-truth label: if the anchor is positive, $p_i^*$ equals 1; if the anchor is negative, $p_i^*$ equals 0. $t_i$ represents the four parameterized coordinates of the predicted bounding box, and $t_i^*$ those of the ground-truth box associated with a positive anchor. The classification loss $L_{cls}$ is the loss between object and non-object predictions, and the regression loss $L_{reg}$ is the loss between the predicted bounding box and the ground-truth box associated with a positive anchor. The term $p_i^* L_{reg}$ indicates that the regression loss is activated only for positive anchors ($p_i^* = 1$). The regression loss $L_{reg}$ can be calculated by Equation (5).
$$L_{reg}(t_i, t_i^*) = \sum_{i \in \{x, y, w, h\}} \mathrm{Smooth}_{L_1}(t_i - t_i^*) \tag{5}$$
where x, y, w, and h denote the box’s center coordinates and its width and height. The function SmoothL1 is defined by Equation (6).
$$\mathrm{Smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \tag{6}$$
In addition, the parameter λ is used to balance $N_{cls}$ and $N_{reg}$: the larger λ is, the more important the regression term; the smaller λ is, the more important the classification term. The default value of 10 makes the weights of classification and regression roughly equal.
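The anchor-labeling rules and the loss in Equations (4)–(6) can be sketched as follows; the tensor layouts, the normalization by the number of sampled/positive anchors, and the helper names are simplifying assumptions rather than the exact MMDetection implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Assign 1 (positive), 0 (negative) or -1 (ignored) to each anchor
    following the two positive-label rules and the 0.3/0.7 thresholds above."""
    overlaps = np.array([[iou(a, g) for g in gt_boxes] for a in anchors])
    labels = -np.ones(len(anchors), dtype=np.int64)
    labels[overlaps.max(axis=1) < neg_thr] = 0     # IoU < 0.3 -> negative
    labels[overlaps.max(axis=1) > pos_thr] = 1     # IoU > 0.7 -> positive
    labels[overlaps.argmax(axis=0)] = 1            # highest-IoU anchor per GT box
    return labels

def smooth_l1(x):
    """Element-wise smooth L1 of Equation (6)."""
    return torch.where(x.abs() < 1, 0.5 * x ** 2, x.abs() - 0.5)

def rpn_loss(cls_scores, labels, box_preds, box_targets, lam=10.0):
    """Equation (4): classification loss over the sampled anchors plus the
    regression loss of Equation (5) over positive anchors only.
    `labels` is a torch LongTensor with values in {-1, 0, 1}."""
    valid, pos = labels >= 0, labels == 1
    cls_loss = F.cross_entropy(cls_scores[valid], labels[valid])
    reg_loss = smooth_l1(box_preds[pos] - box_targets[pos]).sum()
    reg_loss = reg_loss / pos.sum().clamp(min=1)
    return cls_loss + lam * reg_loss
```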

4. Results

4.1. Dataset and Implementation Platform

In this paper, a total of 6308 fabric images covering four common fabric defects (holes, stains, yarn defects, and floats) were collected and curated from various sources, including apparel factories, literature references, and the “XueLang Manufacturing AI Challenge”. To improve the recognition accuracy and robustness of the model, the dataset was expanded by applying color conversion and rotation to the originally collected images.
From the whole dataset, 5608 samples were selected as the training and validation sets (in a 7:3 ratio), and the remaining 700 samples (roughly 10%) were used as the test set. All sample images have a pixel size of 512 × 512. Figure 4 shows some of the defective fabric image samples.
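A minimal sketch of the dataset expansion and split is given below, assuming torchvision transforms for the color conversion and rotation; the parameter ranges and the random seed are illustrative rather than the exact values used to build the dataset.

```python
import random
from torchvision import transforms

# Offline augmentation analogous to the color conversion and rotation used to
# expand the collected images; the ranges here are assumptions, not the paper's.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(degrees=15),
])

# 5608 images for training/validation (split 7:3) and the remaining 700 for test
indices = list(range(6308))
random.seed(0)
random.shuffle(indices)
trainval, test = indices[:5608], indices[5608:]
cut = int(0.7 * len(trainval))
train, val = trainval[:cut], trainval[cut:]
```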
The experimental hardware consists of a Tesla V100 GPU, 62 GB of RAM, and a Xeon Gold 6139 CPU. The software environment consists of MMDetection 2.7, Python 3.7, PyTorch 1.6 (GPU), and CUDA 10.1.

4.2. Evaluation Metrics

A group of metrics was used to evaluate the performance of the detection models. Three indexes, namely Params, floating-point operations (FLOPs), and average precision (AP), are adopted to evaluate detection performance.
The AP value is computed with the help of recall and precision. IoU refers to the proportion of the overlapping area to the union area between the real defect box and the detected defect box. Recall and precision can be calculated by Equation (7).
$$\mathrm{recall} = \frac{TP}{TP + FN}, \qquad \mathrm{precision} = \frac{TP}{TP + FP} \tag{7}$$
where true positive (TP) is the number of samples that the detection model finds to be positive and that are indeed positive, false positive (FP) is the number of samples that the model considers positive but that are actually negative, true negative (TN) is the number of samples that the model finds to be negative and that are indeed negative, and false negative (FN) is the number of samples that the model finds to be negative but that are actually positive.
Therefore, recall is defined as the number of true positives divided by the total number of actual positive samples. In other words, recall indicates how many of the fabric defects the model can find, regardless of false alarms. Precision refers to the proportion of true positives among all samples that the detection model considers positive; it indicates to what extent we can trust the model’s results, that is, among all defects the model declares, how many are true defects.
There is an inverse relationship between precision and recall: it is possible to increase one at the cost of reducing the other. In most cases, and in this paper, the AP value is used to measure the comprehensive detection ability of the model. The precision and recall values trace a curve, and the area enclosed by the precision–recall curve is used as the value of AP (Equation (8)).
$$AP = \int_0^1 P(R)\, dR \tag{8}$$
where P indicates the precision and R represents the recall.
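As an illustration, the area under the precision–recall curve in Equation (8) can be approximated numerically; the monotonic smoothing step below follows common practice (e.g., VOC-style evaluation) and is an assumption about the exact evaluation protocol.

```python
import numpy as np

def average_precision(precisions, recalls):
    """Approximate AP as the integral of P(R) dR over recall in [0, 1].
    `precisions` and `recalls` are matched arrays sorted by increasing recall."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([precisions[0]], precisions, [0.0]))
    # Make the precision envelope monotonically non-increasing
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return np.trapz(p, r)
```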
Params and FLOPs were utilized to assess the computational cost, that is, the lightweight property of the detection model. Params refers to the total number of parameters in the network model, and FLOPs refers to the number of floating-point operations, indicating the complexity of the model.
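For reference, the Params metric can be obtained directly from the model, and FLOPs from a complexity counter; the mmcv utility mentioned in the comment is one option assumed to be available in the MMDetection 2.x environment used here.

```python
import torch

def count_params(model: torch.nn.Module) -> int:
    """Total number of trainable parameters (reported in M after dividing by 1e6)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# FLOPs for a 512 x 512 input could be estimated, for example, with:
#   from mmcv.cnn import get_model_complexity_info
#   flops, params = get_model_complexity_info(model, (3, 512, 512))
```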
In addition, to validate the repeatability of the results, 5-fold cross-validation was performed. Table 1 shows the results of the 5-fold cross-validation for the Faster R-CNN (with the RedNet50 backbone) model.
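A sketch of the 5-fold splitting is shown below, assuming scikit-learn’s KFold over image indices; train_one_fold and evaluate_fold are hypothetical placeholders for the actual per-fold training and AP evaluation.

```python
import numpy as np
from sklearn.model_selection import KFold

image_ids = np.arange(6308)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kfold.split(image_ids), start=1):
    # train_one_fold(train_idx); ap50, apl = evaluate_fold(val_idx)  # placeholders
    print(f"Fold {fold}: {len(train_idx)} train / {len(val_idx)} val images")
```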
Batch size is a critical parameter that determines the magnitude of each gradient update. A larger batch size, while providing more accurate gradient directions, reduced oscillations, and faster data processing, can strain memory capacity. Conversely, a smaller batch size conserves memory but increases the risk of becoming trapped in local optima. After a meticulous comparative analysis and fine-tuning process, we opted for the Adam optimizer in conjunction with a batch size of 64.
The choice of learning rate is a delicate balance. If set too small, convergence becomes sluggish, while a learning rate that is too large can lead to non-convergence. Our training strategy involved an initial 100-epoch phase, using a learning rate of 1 × 10−4. Subsequently, we implemented a transfer learning strategy, preserving the weights acquired during the first 100 epochs and continuing the training process with a learning rate of 1 × 10−5 for the remaining epochs. This approach not only enhances training efficiency, but also safeguards against detrimental weight adjustments, ensuring the stability and reliability of the model’s performance.
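A condensed sketch of this two-phase schedule (Adam, batch size 64, 100 epochs at 1 × 10⁻⁴ followed by fine-tuning at 1 × 10⁻⁵) is given below; the checkpoint path and the omitted training loops are assumptions for illustration.

```python
import torch

def two_phase_training(model: torch.nn.Module, ckpt_path: str = "phase1.pth"):
    """Two-phase schedule: 100 epochs at lr 1e-4, then continue from the saved
    weights at lr 1e-5. The training loops themselves are omitted."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # ... train for 100 epochs with batch size 64 using `optimizer` ...
    torch.save(model.state_dict(), ckpt_path)

    model.load_state_dict(torch.load(ckpt_path))   # resume from phase-1 weights
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    # ... continue training for the remaining epochs ...
    return model
```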

4.3. Experimental Results

To verify the effectiveness of the proposed method for fabric defect detection, two groups of comparative experiments were conducted. In the first experiment, the effects of involution and convolution on model performance were compared, using the original Faster R-CNN framework with the ResNet50 backbone as the baseline. The experimental results are shown in Table 2: the training parameters and computation of the model were reduced by 24% and 15%, respectively, while the APL of large-target bounding boxes increased from 50.5 to 51.1. The result indicates that involution has clear advantages over convolution for fabric defect detection. Examples of the detection results are shown in Figure 5.
In addition to the original convolution-based Faster R-CNN, a comparative experiment between the proposed approach and two other representative detection models, RetinaNet and Mask R-CNN, was also carried out. The comparison results are displayed in Table 2. The Params and FLOPs of the proposed model are 31.21 M and 176.19 G, respectively: reductions of 28.7% and 31.8% compared with Mask R-CNN (43.76 M/258.22 G), and of 13.7% and 14.3% compared with RetinaNet (36.17 M/205.69 G). In addition, AP50 and APL are higher by 4.4 and 12.9 percentage points than Mask R-CNN, and by 0.5 and 2.1 percentage points than RetinaNet, respectively. This shows that the computational cost, Params, and AP values of the proposed method are all the best among the three models, indicating good comprehensive ability.
In addition, we compared our results with those of recent relevant studies. As shown in Table 3, these studies primarily used the Faster R-CNN network as a benchmark to evaluate their proposed methods. The AP value of our proposed method outperforms those reported in these studies, demonstrating the superiority of our approach.
The optimal kernel size of involution for the best model performance was also explored. In this group of experiments, the kernel size of involution in the RedNet50 network was set to 3 × 3, 5 × 5, 7 × 7, and 9 × 9, respectively. The experimental results in Table 4 indicate that the network performs best with a kernel size of 7 × 7, although Params and FLOPs generally grow as the kernel size increases. The 7 × 7 kernel obtains the highest AP value, while the 3 × 3 kernel achieves the lowest computational cost; since the differences in Params and FLOPs among kernel sizes are much smaller than the differences in AP, 7 × 7 is taken as the optimal kernel size. The comparison of detection results for different kernel sizes is shown in Figure 6.

5. Conclusions

In this paper, to achieve satisfactory defect detection accuracy for patterned fabrics at relatively low computational cost and parameter count, involution kernels were introduced into the fabric defect detection model by replacing the convolutional kernels in Faster R-CNN. Experiments were carried out on a real-world fabric image dataset to compare the defect detection performance of the proposed model with several other detection models. The results reveal that the proposed model greatly reduces the parameters and computational cost while slightly improving the detection accuracy, especially for narrow and long defects. The 5-fold cross-validation results also indicate good repeatability on fabric defects.
It is worth mentioning that involution is widely applicable to most neural network models, not only Faster R-CNN. For other detection networks, the effect demonstrated in this paper (cutting down the parameters and computational cost of the model without reducing the detection accuracy) is also likely achievable with involution. The main contributions of this paper can be summarized as follows: (1) involution is introduced into the fabric defect detection model to enhance the receptive field while saving computation; (2) the results show that the proposed model can ensure low computational cost and high average precision at the same time; and (3) this research offers a possible route for turning “heavy” convolution-based detection networks into “lightweight” involution-based networks while maintaining accuracy.

6. Future Work

Beyond its academic contributions, our research holds substantial promise for practical applications within the textile industry. The importance of quality control in fabric manufacturing cannot be overstated, as it is pivotal in ensuring the delivery of flawless products to consumers. Future work will focus on three tasks: firstly, images fused from different sources (optical images, multispectral images, multiple laser images, etc.) will be studied to achieve holistic fabric quality inspection; secondly, considering the numerous types of fabrics and defects, future work will focus on developing a fabric defect generation model based on generative adversarial networks, which could help to collect training datasets of fabric images quickly and automatically, thus making the detection model more flexible; finally, we intend to implement and update the detection model on a real-time platform to make it more suitable for real-world production lines.

Author Contributions

Conceptualization, L.Y.; methodology, Z.K. and L.Y.; software, Z.K. and L.Y.; validation, C.Z., and T.X.; formal analysis, T.X.; investigation, Z.K. and C.Z.; resources, C.Z., T.X., and Y.Z.; data curation, Y.Z.; writing—original draft preparation, Z.K.; writing—review and editing, Y.Z.; visualization, L.Y.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, L.Y., C.Z., and T.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Young Talent Fund of Association for Science and Technology in Shaanxi, China (Program No. 20230139), Young Talent Fund of Xi’an Association for Science and Technology (Program No. 959202313055), Innovation Capability Support Program of Shaanxi (Program No. 2022KJXX-40), Key Research and Development Program of Shaanxi (Program No. 2023-YBGY-490 and 2020GY-312), Outstanding Young Talents Support Plan of Shaanxi Universities (2020), and Scientific Research Program Funded by Shaanxi Provincial Education Department (Program No. 23JP054).

Data Availability Statement

The data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Srinivasan, K.; Dastoor, P.H.; Radhakrishnaiah, P.; Jayaraman, S. FDAS: A knowledge-based framework for analysis of defects in woven textile structures. J. Text. Inst. 1990, 83, 431–448. [Google Scholar] [CrossRef]
  2. Huang, Y.; Jing, J.; Wang, Z. Fabric defect segmentation method based on deep learning. IEEE Trans. Instrum. Meas. 2021, 70, 5005715. [Google Scholar] [CrossRef]
  3. Haralick, M.R. Statistical and structural approaches to texture. Proc. IEEE 2005, 67, 786–804. [Google Scholar] [CrossRef]
  4. Tsai, I.S.; Lin, C.H.; Lin, J.J. Applying an artificial neural network to pattern recognition in fabric defects. Text. Res. J. 1995, 65, 123–130. [Google Scholar] [CrossRef]
  5. Serra, J. Image analysis and mathematical morphology. Biometrics 1982, 39, 536–537. [Google Scholar] [CrossRef]
  6. Conci, A.; Proença, C.B. A fractal image analysis system for fabric inspection based on a box-counting method. Comput. Netw. ISDN Syst. 1998, 30, 1887–1895. [Google Scholar] [CrossRef]
  7. Bu, H.G.; Wang, J.; Huang, X.B. Fabric defect detection based on multiple fractal features and support vector data description. Eng. Appl. Artif. Intel. 2009, 22, 224–235. [Google Scholar] [CrossRef]
  8. Kaneko, H. A generalized fractal dimension and its application to texture analysis-fractal matrix model. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Glasgow, UK, 23–26 May 1989; pp. 1711–1714. [Google Scholar] [CrossRef]
  9. Almeida, T.; Moutinho, F.; Matos-Carvalho, J.P. Fabric defect detection with deep learning and false negative reduction. IEEE Access 2021, 9, 81936–81945. [Google Scholar] [CrossRef]
  10. Wood, J.E. Applying fourier and associated transforms to pattern characterization in textiles. Text. Res. J. 1990, 60, 212–220. [Google Scholar] [CrossRef]
  11. Kwak, C.; Ventura, J.A.; Tofang-Sazi, K. Automated defect inspection and classification of leather fabric. Intell. Data Anal. 2001, 5, 355–370. [Google Scholar] [CrossRef]
  12. Kang, X.; Zhang, E. A universal defect detection approach for various types of fabrics based on the Elo-rating algorithm of the integral image. Text. Res. J. 2019, 89, 4766–4793. [Google Scholar] [CrossRef]
  13. Song, L.; Li, R.; Chen, S. Fabric defect detection based on membership degree of regions. IEEE Access 2020, 8, 48752–48760. [Google Scholar] [CrossRef]
  14. Butler, C.P. The michelson echelon spectroscope. Nature 1899, 59, 606–607. [Google Scholar] [CrossRef]
  15. Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern. Anal. Mach. Intell. 1988, 11, 674–693. [Google Scholar] [CrossRef]
  16. Kumar, A.; Pang, G.K. Defect detection in textured materials using optimized filters. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2002, 32, 553–570. [Google Scholar] [CrossRef] [PubMed]
  17. Hoffer, L.M.; Francini, F.; Tiribilli, B.; Longobardi, G. Neural networks for the optical recognition of defects in cloth. Opt. Eng. 1996, 35, 3183–3190. [Google Scholar] [CrossRef]
  18. Sari-Sarraf, H.; Goddard, J.S. Vision system for on-loom fabric inspection. IEEE Trans. Ind. Appl. 1999, 35, 1252–1259. [Google Scholar] [CrossRef]
  19. Kang, X.; Zhang, E. A universal and adaptive fabric defect detection algorithm based on sparse dictionary learning. IEEE Access 2020, 8, 221808–221830. [Google Scholar] [CrossRef]
  20. Yapi, D.; Allili, M.S.; Baaziz, N. Automatic fabric defect detection using learning-based local textural distributions in the contourlet domain. IEEE Trans. Autom. Sci. Eng. 2017, 15, 1014–1026. [Google Scholar] [CrossRef]
  21. Cohen, F.S.; Fan, Z. Automated inspection of textile fabrics using textural models. IEEE Trans. Pattern. Anal. Mach. Intell. 1991, 13, 803–808. [Google Scholar] [CrossRef]
  22. Hajimowlana, S.H.; Muscedere, R.; Jullien, G.A.; Roberts, J.W. 1D autoregressive modeling for defect detection in web inspection systems. In Proceedings of the 1998 Midwest Symposium on Systems & Circuits (MWSCAS), Notre Dame, IN, USA, 9–12 August 1998; pp. 318–321. [Google Scholar] [CrossRef]
  23. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-excitation networks. IEEE Trans. Pattern. Anal. Mach. Intell. 2017, 32, 2011–2023. [Google Scholar] [CrossRef]
  24. He, Z.; Yang, W.; Liu, Y.; Liu, J.; Zhang, J. Insulator Defect Detection Based on YOLOv8s-SwinT. Information 2024, 15, 206. [Google Scholar] [CrossRef]
  25. Zhang, Z.; Huang, X.; Wei, D.; Chang, Q.; Liu, J.; Jing, Q. Copper Nodule Defect Detection in Industrial Processes Using Deep Learning. Information 2024, 15, 802. [Google Scholar] [CrossRef]
  26. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 833–851. [Google Scholar] [CrossRef]
  27. Nasim, M.; Mumtaz, R.; Ahmad, M.; Ali, A. Fabric Defect Detection in Real World Manufacturing Using Deep Learning. Information 2024, 15, 476. [Google Scholar] [CrossRef]
  28. Mei, S.; Wang, Y.; Wen, G. Automatic fabric defect detection with a multi-scale convolutional denoising autoencoder network model. Sensors 2018, 18, 1064. [Google Scholar] [CrossRef]
  29. Wang, Z.; Jing, J. Pixel-wise fabric defect detection by CNNs without labeled training data. IEEE Access 2020, 8, 161317–161325. [Google Scholar] [CrossRef]
  30. Xu, X.; Chen, J.; Zhang, H.; Ng, W.W.Y. D4Net: De-deformation defect detection network for non-rigid products with large patterns. Inform. Sci. 2021, 547, 763–776. [Google Scholar] [CrossRef]
  31. Li, D.; Hu, J.; Wang, C.; Li, X.; She, Q.; Zhu, L.; Zhang, T.; Chen, Q. Involution: Inverting the inherence of convolution for visual recognition. In Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
  32. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  33. Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. Adv. Neural Inf. Process. Systems 2014, 3, 2204–2212. [Google Scholar] [CrossRef]
  34. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar] [CrossRef]
  35. Derose, J.F.; Wang, J.; Berger, M. Attention Flows: Analyzing and comparing attention mechanisms in language models. IEEE Trans. Vis. Comput. Gr. 2020, 27, 1160–1170. [Google Scholar] [CrossRef] [PubMed]
  36. Yu, X.-M.; Feng, W.-Z.; Wang, H.; Chu, Q.; Chen, Q. An attention mechanism and multi-granularity-based Bi-LSTM model for Chinese Q&A system. Soft. Comput. 2019, 24, 5831–5845. [Google Scholar] [CrossRef]
  37. He, L.; Chan, J.C.-W.; Wang, Z. Automatic depression recognition using CNN with attention mechanism from videos. Neurocomputing 2021, 422, 165–175. [Google Scholar] [CrossRef]
  38. Li, Y.; Zeng, J.; Shan, S.; Chen, X. Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans. Image. Process. 2018, 14, 2439–2450. [Google Scholar] [CrossRef]
  39. Sun, W.; Zhao, H.; Jin, Z. A visual attention based ROI detection method for facial expression recognition. Neurocomputing 2018, 296, 12–22. [Google Scholar] [CrossRef]
  40. Zhang, Y.; Yin, Z.; Nie, L.; Huang, S. Attention based multi-layer fusion of multispectral images for pedestrian detection. IEEE Access 2020, 8, 165071–165084. [Google Scholar] [CrossRef]
  41. Buades, A.; Coll, B.; Morel, J.M. A non-local algorithm for image denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005. [Google Scholar] [CrossRef]
  42. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. arXiv 2017, arXiv:1711.07971. [Google Scholar] [CrossRef]
  43. Yue, K.; Sun, M.; Yuan, Y.; Zhou, F. Compact generalized non-local network. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, QC, Canada, 3–8 December 2018. [Google Scholar] [CrossRef]
  44. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Systems 2017, 60, 84–90. [Google Scholar] [CrossRef]
  45. Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikinen, M. Deep learning for generic object detection: A Survey. Int. J. Comput. Vision. 2019, 128, 261–318. [Google Scholar] [CrossRef]
  46. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  47. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans. Pattern. Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef]
  48. Uijlings, J.R.R.; van de Sande, K.E.A.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vision. 2014, 104, 154–171. [Google Scholar] [CrossRef]
  49. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern. Anal. Mach. Intell. 2014, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  50. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
  51. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
  52. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef]
  53. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
  54. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot Multibox Detector; Springer: Cham, Switzerland, 2016. [Google Scholar] [CrossRef]
  55. Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A Survey. arXiv 2019, arXiv:1905.05055. [Google Scholar] [CrossRef]
  56. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. IEEE Trans. Pattern. Anal. Mach. Intell. 2017, 42, 318–327. [Google Scholar] [CrossRef]
  57. Bochkovskiy, A.; Wang, C.Y.; Liao, H. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020. [Google Scholar] [CrossRef]
  58. Li, F.; Xiao, K.; Hu, Z.; Zhang, G. Fabric defect detection algorithm based on improved YOLOv5. Vis. Comput. 2023, 40, 2309–2324. [Google Scholar] [CrossRef]
  59. Kang, X. Research on fabric defect detection method based on lightweight network. J. Eng. Fibers Fabr. 2024, 19, 15589250241232153. [Google Scholar] [CrossRef]
  60. Zhou, Q.; Sun, H.; Chen, P.; Chen, G.; Wang, S.; Wang, H. Research on the Defect Detection Algorithm of Warp-Knitted Fabrics Based on Improved YOLOv5. Fibers Polym. 2023, 24, 2903–2919. [Google Scholar] [CrossRef]
  61. Lu, B.; Huang, B. A texture-aware one-stage fabric defect detection network with adaptive feature fusion and multi-task training. J. Intell. Manuf. 2024, 35, 1267–1280. [Google Scholar] [CrossRef]
Figure 1. The comparison between convolution and involution. (a) Schematic diagram of involution instance; (b) schematic diagram of convolution; and (c) diagram of involution.
Figure 2. Overall network architecture diagram; left: Faster R-CNN network architecture; right: RedNet50 network framework.
Figure 3. Schematic diagram of IoU.
Figure 4. Defective fabric image samples.
Figure 5. (a) Above: the original image. Middle: detection effect diagram of ResNet50. Bottom: detection effect diagram of RedNet50. (b) First line: the original image. The second line: detection effect diagram of ResNet50. The third line: detection effect diagram of RedNet50.
Figure 6. Detection renderings of different involution kernel sizes. The 3 × 3, 5 × 5, 7 × 7, and 9 × 9 on the left represent the size of the kernel.
Table 1. Results of 5-fold cross-validation for our model.

Fold      AP50 (bbox)    APL (bbox)
Fold 1    84.2           52.9
Fold 2    82.6           51.1
Fold 3    83.2           54.2
Fold 4    83.9           53.5
Fold 5    85.5           51.1
Table 2. Comparison of data from different backbone networks.

Detector        Backbone    #Params (M)    FLOPs (G)    AP50 (bbox)    APL (bbox)
Faster R-CNN    ResNet50    41.14          206.68       85.9           50.5
Faster R-CNN    RedNet50    31.21          176.19       85.5           51.1
Mask R-CNN      ResNet50    43.76          258.22       81.1           38.2
RetinaNet       ResNet50    36.17          205.69       85.0           49.0
Table 3. Results of comparison of the different advanced detectors.

Author                      Dataset      AP50 (bbox), Faster R-CNN    AP50 (bbox), Improved Method
Feng Li et al. [58]         Over 8000    51.4%                        65.1%
Xuejuan Kang et al. [59]    800          88.9%                        81.1%
Qihong Zhou et al. [60]     2907         61.3%                        65.7%
Bingyu Lu et al. [61]       47,128       63.1%                        68.8%
Ours                        6308         85.9%                        85.5%
Table 4. Comparison of detection results of different involution kernel sizes.

Detector        Backbone    Kernel Size    #Params (M)    FLOPs (G)    AP50 (bbox)    APL (bbox)
Faster R-CNN    RedNet50    3 × 3          30.42          173.31       75.1           42.6
Faster R-CNN    RedNet50    5 × 5          31.84          178.49       80.7           46.1
Faster R-CNN    RedNet50    7 × 7          31.21          176.19       85.5           51.1
Faster R-CNN    RedNet50    9 × 9          31.80          178.79       78.8           45.4