Article

YOLO-BGS Optimizes Textile Production Processes: Enhancing YOLOv8n with Bi-Directional Feature Pyramid Network and Global and Shuffle Attention Mechanisms for Efficient Fabric Defect Detection

College of Textile Engineering, Taiyuan University of Technology, Jinzhong 030600, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(18), 7922; https://doi.org/10.3390/su16187922
Submission received: 6 August 2024 / Revised: 24 August 2024 / Accepted: 9 September 2024 / Published: 11 September 2024

Abstract

Timely detection of fabric defects is crucial for improving fabric quality and reducing production losses for companies. Traditional methods for detecting fabric defects face several challenges, including low detection efficiency, poor accuracy, and a limited range of detectable defect types. To address these issues, this paper iteratively enhanced the YOLOv8n model to improve its detection performance. First, multiscale feature fusion was realized with the Bi-directional Feature Pyramid Network (BiFPN). Second, the Shuffle Attention (SA) mechanism was introduced to optimize feature classification. Finally, the Global Attention Mechanism (GAM) was used to improve global detection accuracy. Empirical findings demonstrate the improved model's efficacy: it attains a test-set mean average precision (mAP) of 96.6%, an improvement of 3.6 percentage points over the original YOLOv8n. These results validate that YOLO-BGS excels at detecting textile defects, locating them effectively, minimizing resource waste, and fostering sustainable production practices.

1. Introduction

Fabric defect detection is a crucial step in the quality control process of textile production, aiming to identify and locate defects in fabrics [1,2]. Detecting and repairing fabric defects early effectively improves the overall quality of the fabric, reduces resource waste, and lowers labor costs [3,4]. Surveys indicate that fabric defects cause major losses to companies: a defective piece of fabric loses approximately 45–65% of its value [5]. At the same time, fabric defects lead to downtime, rework, and repair during production, increasing production time and reducing production efficiency. Companies may also have to invest more human, material, and financial resources to correct these deficiencies, raising production costs. In addition, hidden factors such as reduced product competitiveness and increased production risks caused by fabric defects further erode companies' profitability. Hence, proposing a highly efficient and precise method for automatic fabric defect detection is of great urgency and significance [6].
Previous research on fabric defect detection falls into two main categories: traditional methods and computer vision methods. Traditional methods comprise visual detection [7], optical detection [8], and manual detection [9]. For example, early researchers identified fabric defects using image processing and analysis techniques [10]. Researchers can also use optical instruments to detect defects in textiles, for instance by applying infrared, ultraviolet, and other specific wavelengths of light. Manual detection involves examining textiles through visual observation and tactile assessment. However, these traditional methods have several limitations. Fabric image processing techniques are often designed and optimized for specific defect types or imaging conditions, which can result in poor performance under different defect types or changing conditions. Optical detection methods impose strict requirements on the light source, temperature, humidity, and vibration, increasing the complexity and cost of detection. In manual inspection, inexperienced inspectors suffer from low detection speed (only 12 m of fabric per minute) and low accuracy, and inspector fatigue further reduces efficiency and quality [11,12], significantly lowering overall production efficiency. Therefore, computer vision methods that enable timely, automated detection of fabric defects are widely used [13].
Research in computer vision has focused on two main directions: traditional machine learning methods and deep learning methods. Traditional machine learning encompasses a variety of techniques, including the Gray Level Co-Occurrence Matrix (GLCM), Local Binary Pattern (LBP), Support Vector Machine (SVM), and Artificial Neural Network (ANN) [14,15,16]. Li et al. [17] developed a pattern-free fabric defect detection method based on GLCM, which adapts to different textile image features with low computational complexity. Ghosh et al. [18] developed a defect detection system that identifies various types of fabric defects using a multi-class SVM algorithm, achieving 98% testing accuracy and exceptional computational efficiency compared to other machine learning methods. Pourkaramdel et al. [19] introduced a new LBP variant, the completed local quartet pattern, to extract image features of fabric defects, achieving a detection rate of 97.66%. Anami et al. [20] used two different classifiers, SVM and ANN, to classify fabric images with and without defects and achieved accuracies of 94% and 86.5%, respectively, with SVM performing better. However, traditional machine learning methods rely on manually designed features, perform poorly on new defect types, and carry high computational demands, limiting their generalization ability. To overcome these problems, deep learning methods have emerged as a more robust and automated way to handle fabric defect detection tasks [21,22].
Deep learning methods fall into two kinds: two-stage and single-stage. Two-stage methods mainly comprise a "region proposal" stage and a "defect recognition" stage; common examples are the R-CNN series, including the original R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. Ren et al. [23] generated high-quality region proposals through end-to-end training combined with Fast R-CNN for detection and achieved a frame rate of 5 fps on a GPU with good results. Revathy et al. [24] proposed the Improved Mask R-CNN (IM-RCNN) for fabric defect detection and achieved a high precision of 0.978, exceeding MobileNet-2, U-Net, LeNet-5, and DenseNet by 6.45%, 1.66%, 4.70%, and 3.86%, respectively. Zhou et al. [25] optimized Faster R-CNN by integrating the Feature Pyramid Network (FPN), Deformable Convolution (DC), and the Distance IoU loss function, achieving 62.07% mAP and 97.37% AP50 on the DAGM 2007 dataset. However, two-stage methods rely on prior knowledge and require the classifier to be retrained for new defect types or unbalanced samples, which increases workload and complexity. In addition, these methods require multiple forward and backward passes, leading to high computational requirements that limit real-time performance and application scope. Therefore, flexible and fast single-stage deep learning methods have become an important approach to detecting fabric defects.
Single-stage object detection methods include the You Only Look Once (YOLO) series, the Single Shot MultiBox Detector (SSD), and RetinaNet. Cheng et al. [26] replaced the original convolution layers in YOLOv3 with depthwise-separable convolutions and residual blocks and improved the spatial pyramid pooling module, obtaining better detection results while reducing parameters and processing load. However, YOLOv3 still exhibits shortcomings in fabric defect detection, namely insufficient small-target detection, slower detection speed, and high computational load, which require further improvement and optimization. Jing et al. [27] combined YOLOv3 with the k-means algorithm for dimension clustering of target boxes, which improved its ability to detect small targets and made its application to plain-weave fabrics more effective. YOLOv5 further improved on YOLOv3, providing higher detection accuracy, faster detection speed, and a wider application range. Hu et al. [28] evaluated detection performance on generated fabric defect and non-defect images and showed that the average precision of a YOLOv5-based recursive convolutional neural network model improved by 6.13% and 14.57%. However, directly applying the original YOLOv5 model to fabric defect detection works poorly because defect types are numerous, sparsely distributed, and unbalanced across samples, and the defects are usually small and difficult to detect. In view of these deficiencies, Li et al. [29] introduced FD-YOLOv5, an improved YOLOv5 fabric defect detection model with enhanced capability for detecting smaller defects. The accuracy of the improved model increased by 8.3% and 3.2%, while the parameters, computation, and weight size were reduced by 8.4%, 11.2%, and 14.3%, respectively.
In addition, compared to YOLOv3 and YOLOv5, YOLOv7 has a more optimized network structure and parameter design, which further enhances the effectiveness and speed of fabric defect detection. However, YOLOv7 still has limitations in detection performance and feature extraction. To address these problems, Kang et al. [30] developed the AYOLOv7-tiny network, whose redesigned convolutional layers improved feature extraction ability, simplified the model, and reduced computational effort. Moreover, the release of YOLOv8 further improved detection performance: Talaat and ZainEldin [31] found that an object detection method based on the YOLOv8 model achieved significantly improved accuracy and speed, with accuracy across all categories reaching up to 97.1%.
Previous studies have contributed to the refinement of the YOLOv8 model. However, they did not sufficiently consider the variability of fabric defect types and overlooked the training inefficiencies caused by increased network depth and parameter count. Therefore, this paper proposes an improved YOLOv8n model named YOLO-BGS. First, by integrating BiFPN, YOLOv8n achieves cross-scale feature integration, significantly improving detection accuracy while maintaining model speed. Second, the integrated SA mechanism combines spatial and channel attention, strengthening feature extraction. Finally, the introduction of GAM improves the global interactivity and detection accuracy of the model, making it more suitable for multi-scale target detection.
The remainder of this paper is organized as follows. Section 2 elaborates on the improvements to YOLOv8n. Section 3 introduces the dataset, experimental environment, parameter settings, evaluation indices, and experimental results. Section 4 discusses the experimental results. Section 5 summarizes the paper and outlines plans for further research.

2. Theory and Methods

2.1. YOLOv8n

YOLOv8, the most recent iteration of the YOLO series of object detection algorithms, significantly improves detection performance and has proven its effectiveness as a fast single-stage detection method. The algorithm is offered in five network scales: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, covering various application scenarios. Given the stringent hardware and real-time requirements of fabric defect detection tasks, the lightweight YOLOv8n model was selected and optimized in this study. As seen in Figure 1, the four primary components of the YOLOv8n model are the input layer, backbone layer, neck layer, and output layer.
The main task of the input layer is to accept the input data and pass it to the next layer. The input layer contains input nodes, each corresponding to a feature of the input data. In the image classification task, the input layer can convert the pixel values of the image into a tensor form that the neural network can process.
The backbone layer is responsible for feature extraction from the processed data; its commonly used structures are the Conv, C2f, and SPPF modules. The Conv module expands the receptive field of the network through stacked convolution operations and applies nonlinear transformations through activation functions (such as ReLU) to extract high-level features. The C2f module, a connection structure introduced with YOLOv8, preserves low-level features while increasing the expressiveness of high-level features, improving both the accuracy and speed of the network. The SPPF module is a spatial pyramid pooling structure used to capture feature information at different scales. All three play important roles in YOLOv8n and together contribute to the network's performance.
The main function of the neck layer is to fuse the feature information. FPN and PAN are both network structures used to handle multiscale features. By adopting side-joining and path aggregation strategies, they can effectively merge different levels of feature information and improve the performance of object detection and semantic segmentation tasks.
The output layer is used to generate the final result of target detection, which usually includes three sub-layers: anchor layer, prediction layer and detection layer. The anchor layer is used to generate the default box, the prediction layer is used to predict the target category and location, and the detection layer is used to generate the final target detection result.
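To make the backbone building blocks described above concrete, the following is a minimal PyTorch sketch of Conv- and SPPF-style modules in the form commonly used in YOLOv8 implementations; the layer sizes and the SiLU activation are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    # Convolution -> BatchNorm -> activation: the basic backbone block.
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    # Spatial Pyramid Pooling - Fast: three chained max-pools approximate
    # pooling at multiple scales; the results are concatenated and fused.
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = Conv(c_in, c_mid, k=1)
        self.cv2 = Conv(c_mid * 4, c_out, k=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        return self.cv2(torch.cat([x, y1, y2, self.pool(y2)], dim=1))
```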

2.2. Bi-Directional Feature Pyramid Network (BiFPN)

During the detection process, fabric defects pose significant challenges due to their varying scales and sizes as well as the fact that they can easily be affected by interference from neighboring areas. To solve these problems, YOLOv8n combines the Path Aggregation Network (PAN) and Feature Pyramid Network (FPN) features, improving the capability to locate and detect defects of different sizes. This approach not only effectively integrates multiscale feature information, but also successfully models contextual information at different levels within the image. However, the introduction of FPN and PAN also leads to an increase in network calculation complexity as well as the complexity of hyperparameter tuning during model design and model optimization. Therefore, this study used the Bi-Directional Feature Pyramid Network (BiFPN), as shown in Figure 2. BiFPN achieves an excellent balance between accuracy and efficiency through weighted feature fusion and bidirectional cross-scale connections [32]. In addition, this network can be flexibly combined with various mainstream backbone networks (such as ResNet and EfficientNet), making it a powerful feature fusion network architecture for target detection tasks.
Compared to the PANet architecture, BiFPN's basic modules are more comprehensive and flexible in design. To address limitations in feature fusion effectiveness, BiFPN first removes nodes that have only a single input or output path, simplifying the network structure and streamlining the overall bidirectional architecture, thereby increasing efficiency. Next, a skip connection is added between the original input and output nodes at the same level, greatly facilitating cross-layer feature fusion and improving detection accuracy. Finally, through repeated stacking, BiFPN integrates features from different paths into a single feature layer, achieving the fusion of more complex features. Figure 3 shows the process.
BiFPN, with its unique bidirectional channel design, achieves cross-scale connections by integrating features from the feature extraction network with those from the bottom-up path. This design preserves surface semantic information while preventing excessive loss of deep semantic information. Unlike traditional feature fusion methods, BiFPN does not apply uniform weights to features of different scales. Uniform weighting can result in unbalanced output feature images when input features have different resolutions. Therefore, BiFPN assigns different weights depending on the importance of the input features and strengthens the feature fusion by repeatedly applying this structure.
During the weighted fusion process, BiFPN uses Fast Normalized Fusion, which constrains the weights to the range 0 to 1 and thereby speeds up training. Cross-scale connections are implemented through skip connections and bidirectional paths, which together achieve an optimal combination of weighted fusion and bidirectional cross-scale connection. Equation (1) gives the computation:

$$O = \sum_{i} \frac{w_i}{\varepsilon + \sum_{j} w_j} \, I_i \tag{1}$$

where $O$ represents the output feature, $I_i$ the $i$-th input feature, and $w_i$ the learnable weight of each input node. Notably, $\varepsilon$ is a small constant set to 0.0001 to ensure numerical stability during training and avoid unnecessary fluctuations in the results.
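As an illustration of Equation (1), the following is a minimal PyTorch sketch of Fast Normalized Fusion; the module name, tensor shapes, and the ReLU used to keep weights non-negative are assumptions for demonstration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    # Learns one non-negative weight per input feature map and normalizes
    # the weights so their sum is close to 1, as in Equation (1).
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps  # small constant for numerical stability

    def forward(self, features):
        # features: list of tensors with identical shape (after resizing).
        w = torch.relu(self.weights)      # keep weights non-negative
        w = w / (w.sum() + self.eps)      # fast normalization into [0, 1]
        return sum(wi * fi for wi, fi in zip(w, features))

# Example: fusing two feature maps of the same resolution.
fuse = FastNormalizedFusion(n_inputs=2)
out = fuse([torch.randn(1, 64, 52, 52), torch.randn(1, 64, 52, 52)])
```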

2.3. Global Attention Mechanism (GAM)

Within the domain of fabric defect detection, many variables, including the type of fabric, the production process, and changes in machine state, can affect the type and severity of defects, and these factors are complex and closely interrelated. The Squeeze-and-Excitation (SE) attention mechanism improves a model's focus on critical channels by thoroughly exploring the internal connections of the channel attention modules, strengthening feature representation. However, it considers feature relationships only in the channel dimension and cannot refine information in the spatial dimension. To address this limitation, this paper introduces the Global Attention Mechanism (GAM), which significantly improves the network's perception of global features by effectively capturing global context information. GAM consists of two main modules: the Channel Attention Mechanism (CAM) and the Spatial Attention Mechanism (SAM). The channel attention submodule uses a two-layer Multi-Layer Perceptron (MLP) to reinforce cross-dimensional channel-spatial dependencies and a three-dimensional permutation strategy to preserve information across all three dimensions. Figure 4 depicts the complete processing flow, and the specific calculation is given in Equations (2) and (3):
$$F_2 = M_c(F_1) \otimes F_1 \tag{2}$$

$$F_3 = M_s(F_2) \otimes F_2 \tag{3}$$

where $F_1 \in \mathbb{R}^{C \times H \times W}$ denotes the initial input feature map, $F_2$ the intermediate state, and $F_3$ the result. $M_c$ and $M_s$ denote the channel and spatial attention maps, respectively, and $\otimes$ denotes element-wise multiplication.
For CAM, the input feature map undergoes a dimensional transformation. Subsequently, the transformed feature map is introduced into the MLP, which returns its dimensions to their initial state and processes them using the sigmoid function to produce output. This process is shown in Figure 5.
For SAM, GAM primarily uses convolutional processing, which bears some resemblance to the SE attention mechanism. First, a convolution with a 7 × 7 kernel reduces the number of channels and the computational load. Then, a second 7 × 7 convolution restores the number of channels, keeping the channel count consistent with the input. Finally, a sigmoid function produces the output. This process is shown in Figure 6.
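To make the two submodules concrete, the following is a hedged PyTorch sketch of a GAM block matching the flow described above: channel attention via a two-layer MLP over a 3D-permuted feature map, and spatial attention via two 7 × 7 convolutions. The reduction ratio r = 4 is an illustrative assumption.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    def __init__(self, c, r=4):
        super().__init__()
        # Channel attention submodule: two-layer MLP (Figure 5).
        self.mlp = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True), nn.Linear(c // r, c)
        )
        # Spatial attention submodule: a 7x7 conv reduces channels, a second
        # 7x7 conv restores them (Figure 6).
        self.spatial = nn.Sequential(
            nn.Conv2d(c, c // r, kernel_size=7, padding=3),
            nn.BatchNorm2d(c // r), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, kernel_size=7, padding=3),
            nn.BatchNorm2d(c),
        )

    def forward(self, x):  # x = F1, shape (B, C, H, W)
        # Permute to (B, H, W, C), apply the MLP per position, permute back,
        # and gate the input: Equation (2).
        att = self.mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        f2 = x * torch.sigmoid(att)
        # Spatial gating produces F3: Equation (3).
        return f2 * torch.sigmoid(self.spatial(f2))
```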

2.4. Shuffle Attention (SA)

Deep learning makes extensive use of traditional attention mechanisms to capture potential correlations between various locations in the input sequence, thereby significantly improving task performance. The Convolutional Block Attention Module (CBAM) is one of these mechanisms that helps the network focus more intently on target regions by efficiently extracting pertinent features in both spatial and channel dimensions. However, this mechanism requires high computational resources, which can increase the overall complexity of the network.
Therefore, this study presents a more efficient and lightweight Shuffle Attention (SA) mechanism. To increase computational efficiency, the SA mechanism divides the channel dimension into several sub-features and processes them simultaneously. Shuffle units are used by SA to accurately characterize dependencies in both spatial and channel dimensions while processing each sub-feature. Then, all sub-features are combined, and the channel shuffle operator is applied to enable data sharing and merging among various sub-features. This mechanism achieves better performance without increasing the computational cost.
As shown in Figure 7, the SA attention mechanism consists of four core sections:
(1) Feature Grouping: For a given feature map $X \in \mathbb{R}^{C \times W \times H}$, where C, W, and H denote the channel count, width, and height, X is first split into G groups $X = [X_1, X_2, \dots, X_G]$ with $X_i \in \mathbb{R}^{(C/G) \times W \times H}$. Each sub-feature $X_i$ is then further divided along the channel dimension into two branches $X_{i1}, X_{i2} \in \mathbb{R}^{(C/2G) \times W \times H}$. One branch creates a channel attention map by focusing on the relationships between channels; the other builds a spatial attention map by focusing on the spatial relationships between features.
(2) Channel Attention: Traditional modules such as SE and ECA networks carry too many parameters for a lightweight attention module, so this study adopts a lighter formulation. First, channel statistics are generated and global information is embedded using Global Average Pooling (GAP). Then, features are enhanced through a gating mechanism with a sigmoid activation function. Equations (4) and (5) show the specific calculation:
$$s = f_{gp}(X_{i1}) = \frac{1}{H \times W} \sum_{x=1}^{H} \sum_{y=1}^{W} X_{i1}(x, y) \tag{4}$$

$$X'_{i1} = \sigma(f_c(s)) \cdot X_{i1} = \sigma(W_1 s + b_1) \cdot X_{i1} \tag{5}$$

where $s$ denotes the pooled channel statistic, $f_{gp}$ the global average pooling function, $\sigma$ the sigmoid activation function, and $W_1, b_1 \in \mathbb{R}^{(C/2G) \times 1 \times 1}$ the scaling and shifting parameters, respectively.
(3) Spatial Attention: Spatial attention complements channel attention. First, Group Normalization (GN) is applied to $X_{i2}$ to obtain spatial statistics. In this process, the error introduced by GN must be evaluated, which requires weighing the group partitioning strategy, the accuracy of the statistical estimates, adaptability to different network structures and tasks, and the ultimate impact on model generalization; this evaluation minimizes the error range and optimizes model performance. A linear fusion function then enhances $X_{i2}$: it can dynamically adjust weights, improving feature integration while minimizing complexity, ensuring efficient computation, and lowering the demands of model training. Finally, input features are mapped onto query, key, and value vectors through linear projection, capturing different aspects of the input. Attention weights are computed as position-wise dot products between query and key vectors, quantifying the relevance of information; Softmax normalizes these weights into probabilities, which then weight the value vectors to produce an output feature map that highlights important areas. Equation (6) gives the specific calculation:
$$X'_{i2} = \sigma\left(W_2 \cdot GN(X_{i2}) + b_2\right) \cdot X_{i2} = \sigma(W_2 s_2 + b_2) \cdot X_{i2} \tag{6}$$

where $s_2$ represents the normalized feature and $W_2, b_2 \in \mathbb{R}^{(C/2G) \times 1 \times 1}$ are two parameters trained continuously with the network.
(4) Aggregation: After the two attention branches have learned and recalibrated their features, the branches are spliced and aggregated. The resulting matrix, $X'_i = [X'_{i1}, X'_{i2}] \in \mathbb{R}^{(C/G) \times W \times H}$, is obtained by simple concatenation. The sub-features are then recombined and a channel shuffle is applied to create information flow between groups along the channel dimension. The SA module's final output matches the input size, which makes integrating it into other structures straightforward.
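The four steps above can be condensed into a compact PyTorch sketch; the group count and the zero/one parameter initialization are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Mix information across groups along the channel dimension.
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).reshape(b, c, h, w))

class ShuffleAttention(nn.Module):
    # Feature grouping, channel branch (Equations (4)-(5)), spatial branch
    # (Equation (6)), then concatenation and channel shuffle.
    def __init__(self, c, groups=8):
        super().__init__()
        self.groups = groups
        half = c // (2 * groups)
        self.w1 = nn.Parameter(torch.zeros(1, half, 1, 1))
        self.b1 = nn.Parameter(torch.ones(1, half, 1, 1))
        self.w2 = nn.Parameter(torch.zeros(1, half, 1, 1))
        self.b2 = nn.Parameter(torch.ones(1, half, 1, 1))
        self.gn = nn.GroupNorm(half, half)

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, c // self.groups, h, w)  # feature grouping
        x1, x2 = x.chunk(2, dim=1)                 # channel / spatial branches
        s = x1.mean(dim=(2, 3), keepdim=True)      # GAP, Equation (4)
        x1 = x1 * torch.sigmoid(self.w1 * s + self.b1)            # Equation (5)
        x2 = x2 * torch.sigmoid(self.w2 * self.gn(x2) + self.b2)  # Equation (6)
        out = torch.cat([x1, x2], dim=1).view(b, c, h, w)         # aggregation
        return channel_shuffle(out, groups=2)

# The output keeps the input size, so the module drops into other networks.
sa = ShuffleAttention(c=64)
y = sa(torch.randn(1, 64, 52, 52))
```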

3. Experiment and Analysis

3.1. Data Set

In this study, an open-source, large-scale dataset was carefully constructed from Internet resources to enable a comprehensive evaluation of model performance. The dataset comprises 800 grayscale images, each measuring 416 × 416 pixels, which ensures data uniformity and simplifies processing. Transformations such as rotation, flipping, and scaling were applied to enhance data variability and mitigate overfitting. The images cover a range of common defect types, including but not limited to yarn breakage, holes, stains, buttonholes, hairballs, and scuff marks, effectively illustrating typical challenges in fabric production. The dataset was divided into training, validation, and test sets at an 8:1:1 ratio; thus, the training set contains 640 images and the validation and test sets contain 80 images each. Figure 8 provides a visual overview of selected images from this dataset. Furthermore, the dataset was annotated using a markup tool; the training set's labels are summarized in four distributions: category instance counts, bounding-box overlays, bounding-box center coordinates (x, y), and box width and height measurements. For further details, refer to Figure 9.
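For reproducibility, the following is a minimal sketch of the 8:1:1 split described above; the directory layout and file names are hypothetical, and the augmentation step is omitted.

```python
import random
from pathlib import Path

random.seed(0)  # fixed seed so the split is reproducible
images = sorted(Path("dataset/images").glob("*.png"))  # assumed layout
random.shuffle(images)

n = len(images)                      # 800 images in this study
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    "train": images[:n_train],                  # 640 images
    "val":   images[n_train:n_train + n_val],   # 80 images
    "test":  images[n_train + n_val:],          # 80 images
}
for name, files in splits.items():
    Path(f"dataset/{name}.txt").write_text("\n".join(str(f) for f in files))
```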

3.2. Experimental Environment and Evaluation Index

The following configuration was used for the experimental setup. The operating system was Linux, the CPU was an Intel Xeon E5-2680 v3, and the GPU was an NVIDIA GeForce RTX 2080 Ti with 11 GB of VRAM. The experiment utilized the PyTorch 1.7.0 framework and Python 3.8. The hyperparameters included an initial learning rate of 0.01, 150 training epochs, and a batch size of 32.
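As a rough illustration of this configuration, the hyperparameters could be passed to a YOLOv8 training run as sketched below using the Ultralytics API; the model and dataset YAML file names are hypothetical, and the study's actual code integrates BiFPN, GAM, and SA directly into the network definition rather than using the stock model.

```python
from ultralytics import YOLO

# "yolo-bgs.yaml" and "fabric.yaml" are assumed names for the modified
# network definition and the dataset configuration, respectively.
model = YOLO("yolo-bgs.yaml")
model.train(
    data="fabric.yaml",
    epochs=150,   # training epochs reported above
    batch=32,     # batch size
    lr0=0.01,     # initial learning rate
    imgsz=416,    # matches the 416 x 416 dataset images
)
```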
Four fundamental evaluation measures were used to fully assess the enhanced YOLOv8 algorithm's performance in fabric defect detection: F1, precision (P), recall (R), and mean average precision (mAP). The corresponding calculation formulas are given in Equations (7)–(10):
$$P = \frac{TP}{TP + FP} \tag{7}$$

$$R = \frac{TP}{TP + FN} \tag{8}$$

$$F1 = \frac{2 \times P \times R}{P + R} \tag{9}$$

$$mAP = \frac{\sum_{q=1}^{Q} AP_q}{Q} \tag{10}$$
Among them, precision reflects how accurately the model determines positive examples, i.e., the proportion of all samples predicted to be positive that are actually positive. Recall measures the fraction of genuine positive samples accurately predicted as such. Multiple class performance in object detection tasks is assessed using the mean average precision (mAP), which is the average of the average precision of those classes. F1, the weighted harmonic average of precision and recall, is used to evaluate the accuracy and recall performance of the model as a whole. Its value is between 0 and 1, and the closer it is to 1, the better the overall performance of the model [33].
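The following short sketch computes Equations (7)–(10) from raw detection counts and per-class AP values; the numbers in the example are illustrative only, not results from this study.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    # Equations (7)-(9): precision, recall, and their harmonic mean.
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def mean_average_precision(ap_per_class):
    # Equation (10): mAP is the mean of the per-class average precisions.
    return float(np.mean(ap_per_class))

p, r, f1 = precision_recall_f1(tp=88, fp=12, fn=6)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
print(f"mAP={mean_average_precision([0.97, 0.95, 0.96, 0.98]):.3f}")
```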

3.3. Before and after Improvement

To verify the advantages of the improved model, the optimized model was compared with the original YOLOv8n on the same dataset, with the relevant configuration kept unchanged. Table 1 displays the comparison results.
Table 1 shows that the YOLO-BGS model significantly outperforms the original YOLOv8n model in four key metrics, improving precision, recall, F1 score, and mAP by 3.7, 4.1, 3.9, and 3.6 percentage points, respectively. These improvements are mainly due to the following. First, the BiFPN architecture optimizes the fusion of targets at various scales and ensures a better balance between accuracy and efficiency. Second, GAM efficiently extracts defect features from images, increasing object detection accuracy by a significant margin. Third, the integrated SA module lets the model concentrate more on regions associated with fabric flaws, which increases computational efficiency and generalization ability while maintaining high accuracy, thereby improving model stability. Overall, the YOLO-BGS model demonstrates higher precision in fabric defect detection while remaining lightweight, meeting the requirements for precise and timely detection of fabric defects.
To comprehensively evaluate and compare fabric defect detection performance before and after model optimization, this study constructed Precision–Recall (PR) curves at an Intersection over Union (IoU) threshold of 0.5 during the test phase. See Figure 10 and Figure 11 for details.
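For reference, the IoU criterion underlying these curves can be computed as in the sketch below; the example boxes are illustrative.

```python
def box_iou(a, b):
    # Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction counts as a true positive here when its IoU with a matched
# ground-truth box is at least 0.5.
print(box_iou((10, 10, 60, 60), (15, 15, 65, 65)) >= 0.5)  # True (IoU ~ 0.68)
```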

3.4. Ablation Experiment

To comprehensively assess the effectiveness of the YOLO-BGS model in terms of design and lightweight properties, an ablation experiment was conducted in this study to investigate the new network structure in depth. The experiment aimed to highlight the real value of each improvement strategy and provide a more intuitive demonstration of model performance. The relevant test results are shown in detail in Table 2 and Figure 12 and Figure 13.
According to the data in Table 2 and Figures 12 and 13, combining the three mechanisms with the YOLOv8n base model raised the model's F1 score to 90.7%. This gain stems largely from the bidirectional information flow structure in BiFPN: by efficiently integrating multi-scale feature information, it raises the mAP value through higher recall, detection rate, and detection accuracy. In addition, BiFPN introduces trainable parameters to dynamically weigh the influence of each input feature, a design that significantly enhances the network's ability to integrate feature information across layers.
In addition, after adding the SA mechanism, a notable 2.3% increase in the mAP value of the enhanced model was observed compared to the original model. This substantial improvement is primarily attributed to the SA mechanism’s enhanced ability to precisely capture and focus on key feature areas. This allows the model to quickly and accurately locate targets during image processing, significantly increasing the processing speed. Concurrently, SA handles the sub-features of the channel dimension in parallel groups and leverages the channel shuffle operator to facilitate information exchange among the distinct sub-features. This approach enhances performance while minimizing computational costs.
Figure 14 shows the effect of adding the SA mechanism on the feature extraction ability of the model. The changes brought about by this mechanism can be intuitively understood by comparing the visualized results. The outcomes demonstrate how well the model with the SA mechanism performs in terms of accuracy and speed of detection. This capability enables the model to process and analyze large amounts of data faster, significantly improving the real-time performance and response speed in detecting fabric defects.
Finally, after integrating the GAM, the mAP value of the model increased by 1.3%, up to 96.6%. This significant improvement is mainly due to GAM’s enhancement of YOLOv8n’s perception capabilities by introducing global attention and aggregation modules. These additions enable more precise focusing on fabric defect areas during the image analysis. In addition, GAM reconstructed the channel and spatial attention submodules using 3D array techniques and a two-layer MLP structure. This reconstruction preserves information across three dimensions, effectively strengthens the interdependence between channels and spatial dimensions, and significantly improves the model’s feature representation capabilities.
To fully assess the actual effect of the GAM on model performance, this study conducted detailed visualization comparisons before and after structural integration. According to Figure 15, the introduction of the GAM has made the model more focused on critical and relevant feature areas in classification and localization tasks. This modification greatly increases the model’s detection efficiency at multiple scales and demonstrates its effectiveness in handling diverse defect sizes.
The above experimental data and visualization results show that the three improvement measures proposed in this study raise the detection efficiency of the model under various combinations. This also demonstrates, to some extent, the effectiveness and practicality of these improved methods.

3.5. Other Comparative Experiments

To assess the detection performance advantages of the proposed model, this study conducted a series of experiments comparing YOLO-BGS with mainstream single-stage and two-stage object detection models. The comparison results are shown in Figure 16 and Table 3.
According to Table 3 and Figure 16, the detection accuracy of the Faster R-CNN model is mainly limited by its dependence on predefined anchor boxes to generate candidate regions. This design can leave some defect areas uncovered, leading to missed or false detections, a significant disadvantage relative to the other models compared. Furthermore, Faster R-CNN's complicated network structure and large parameter count raise its computational cost, making deployment on end devices more difficult. As network depth increases, lower-resolution feature maps can cause features of small objects to be lost during propagation, weakening the SSD model's performance in detecting small fabric defects. In contrast, YOLOv3 increases the precision of multi-scale fabric defect detection while maintaining its speed and lightweight properties through fine-grained division and efficient feature fusion. As a successor to YOLOv3, YOLOv5 shows higher accuracy on detection tasks of varying scale and complexity; efficient training strategies such as data augmentation and loss function optimization not only accelerate convergence but also improve generalization. Although the lightweight YOLOv7 offers solid detection performance and generalization, there is still room for optimization in complex scenes and small-object detection. The YOLO-BGS model proposed in this study uses a more advanced feature extraction network and improved multi-scale detection techniques, achieving highly accurate real-time multi-scale fabric defect detection. This enables the fabric defect detection process to be completed quickly, significantly improving production efficiency and detection speed.

4. Discussion

Fabric defects such as broken yarn, holes, and wear directly affect the appearance and quality of products [34], reducing production efficiency and increasing costs [35]. In view of this, this paper presents an improved detection method based on YOLOv8n that offers clear advantages in fabric defect detection.
YOLOv8 is characterized by flexible and efficient performance in object detection, but it still requires optimization in computational load, small-object detection, and the trade-off between real-time processing and accuracy. On this basis, this study chose YOLOv8n, the lightest variant, which reduces computation and storage requirements while maintaining a high level of performance. The improved YOLOv8n incorporates a BiFPN structure to raise multi-scale feature fusion efficiency, enhancing detection in small-object and complex-scene settings. GAM was added to improve the network's grasp of global structure and relationships, strengthening target perception at multiple scales and raising detection accuracy. In addition, the SA mechanism lets the model process input data more efficiently and cuts unnecessary computation, increasing the efficiency of object detection. The experimental data recorded in Table 1, Table 2 and Table 3 strongly support the excellent fabric defect detection performance of the model developed in this study.
In the comparative analysis discussed in this study, the Faster R-CNN model is unsuitable for real-time fabric inspection due to its lower accuracy and higher computational cost. The SSD model strikes a better balance between speed and precision than the original YOLO and Faster R-CNN; however, its reliance on anchor boxes of fixed dimensions and proportions to predict target bounding boxes means it does not adapt well to fabric defects of different sizes and shapes. The YOLOv3-tiny model is easier to deploy on embedded or mobile devices thanks to its smaller size and faster inference, significantly reducing computational costs, but its lightweight nature brings challenges with small targets and complex scenes. YOLOv5 has a more optimized network structure than YOLOv3 and uses multi-scale feature maps for object detection, handling complex scenes and varied targets better. YOLOv7 integrates advanced optimization strategies such as the Mish activation function and the Cross-Stage Partial (CSP) structure to enhance accuracy and generalization; nevertheless, balancing low computational cost with efficient, accurate detection at multiple levels remains a challenge. The YOLO-BGS model proposed in this paper has advantages in lightweight design, real-time processing, accuracy, flexibility, and scalability, and integrates new backbone architectures and loss functions. These strengths enable YOLO-BGS to deliver excellent performance in diverse fabric defect detection scenarios.
However, this study has certain limitations. Although the integration of BiFPN significantly improves model performance in object detection tasks, BiFPN includes several hyperparameters, such as weights and fusion modes, between different scale features. The values of these hyperparameters have a large impact on model performance. Inappropriate hyperparameter settings can lead to reduced performance or instability. When introducing the GAM mechanism, care must be taken to ensure that the model focuses on global structures without neglecting local details. This requires careful consideration and adjustments in model design, requiring extensive experimentation and debugging work. In addition, although the SA mechanism effectively captures more sequence information and improves model capability, it primarily focuses on the relative positional relationships of elements within sequences, which may poorly model long-term dependencies. There are various disruptive factors when detecting fabric defects. For example, uneven lighting, material properties of the fabric, and physical, chemical and environmental factors (such as temperature, humidity and gas composition) are likely to cause defects to be falsely detected.
In future work, a diverse dataset comprising various types of fabric images under complex conditions will be collected. This dataset will facilitate the study of the correlation between fabric defect characteristics and their surrounding environment, thereby enhancing the model’s capabilities in defect detection and interference resistance. Additionally, the model will be selected and adjusted based on actual application scenarios to maximize its advantages and address its limitations. Improvements will focus on model optimization, lightweight design, real-time performance, generalization ability, and integration with automated equipment. These efforts aim to improve its fabric defect detection performance, bringing more innovation and value to the fabric manufacturing industry.

5. Conclusions

Accurate and effective detection of fabric defects remains challenging due to the varied types and sizes of defects and the requirements for real-time, high-speed detection. Timely detection and repair of fabric defects can improve the overall quality of fabric, optimize the textile production process, and reduce economic losses for enterprises. Research into fabric defect detection is therefore of considerable importance.
In this paper, an improved YOLOv8n detection model named YOLO-BGS is proposed. The lightweight YOLOv8n base has a smaller model size and consumes fewer computing resources than other YOLO versions. The model incorporates the BiFPN structure, which significantly enhances the precision and efficiency of target detection through bidirectional connections and a weighted fusion mechanism. The addition of SA and GAM further strengthens the model's feature extraction and global information modeling, improving multi-scale target detection. According to the experimental results, the proposed model achieves an mAP of 96.6%. Compared to other popular object detection frameworks, YOLO-BGS strikes a more effective balance between precision and recall, demonstrating superior detection performance. Going forward, the focus will be on optimizing the model's performance, especially strengthening its detection capabilities, exploring deployment on end-user devices, reducing defect rates, and promoting sustainable economic development.

Author Contributions

Conceptualization, G.L.; formal analysis, T.X.; methodology, G.L. and T.X.; software, G.L. and T.X.; validation, G.L., T.X. and G.W.; writing—original draft, G.L.; writing—review and editing, G.L. and G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Transformation and Guidance of Scientific and Technological Achievements in Shanxi Province (No. 202104021301053), Fundamental Research Program of Shanxi Province (Nos. 20210302123114, 202203021211146), 2022 Shanxi Art Science Planning Project (No. 22BG082), Shanxi Province Key Research and Development Project (No. 202302040201009).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The relevant data are included in the paper.

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their helpful suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, J.; Wang, C.; Su, H.; Du, B.; Tao, D. Multistage GAN for Fabric Defect Detection. IEEE Trans. Image Process. 2020, 29, 3388–3400. [Google Scholar] [CrossRef] [PubMed]
  2. Lu, B.; Zhang, M.; Huang, B. Deep Adversarial Data Augmentation for Fabric Defect Classification with Scarce Defect Data. IEEE Trans. Instrum. Meas. 2022, 71, 1–13. [Google Scholar] [CrossRef]
  3. Alruwais, N.; Alabdulkreem, E.; Mahmood, K.; Marzouk, R.; Assiri, M.; Abdelmageed, A.A.; Abdelbagi, S.; Drar, S. Hybrid Mutation Moth Flame Optimization with Deep Learning-Based Smart Fabric Defect Detection. Comput. Electr. Eng. 2023, 108, 108706. [Google Scholar] [CrossRef]
  4. Powell, D.; Magnanini, M.C.; Colledani, M.; Myklebust, O. Advancing Zero Defect Manufacturing: A State-of-the-Art Perspective and Future Research Directions. Comput. Ind. 2022, 136, 103596. [Google Scholar] [CrossRef]
  5. Shakir, S.; Topal, C. Unsupervised Fabric Defect Detection with Local Spectra Refinement (LSR). Neural Comput. Appl. 2024, 36, 1091–1103. [Google Scholar] [CrossRef]
  6. Liu, G.; Ren, J. Feature Purification Fusion Structure for Fabric Defect Detection. Vis. Comput. 2024, 40, 3825–3842. [Google Scholar] [CrossRef]
  7. Zhu, R.; Xin, B.; Deng, N.; Fan, M. Fabric Defect Detection Using Cartoon–Texture Image Decomposition Model and Visual Saliency Method. Text. Res. J. 2023, 93, 4406–4418. [Google Scholar] [CrossRef]
  8. Kumar, A. Computer-Vision-Based Fabric Defect Detection: A Survey. IEEE Trans. Ind. Electron. 2008, 55, 348–363. [Google Scholar] [CrossRef]
  9. Maray, M.; Aldehim, G.; Alzahrani, A.; Alotaibi, F.; Alsafari, S.; Alghamdi, E.A.; Hamza, M.A. Deer Hunting Optimization with Deep Learning-Driven Automated Fabric Defect Detection and Classification. Mob. Netw. Appl. 2023. [Google Scholar] [CrossRef]
  10. Fouda, Y.M. Integral Images-Based Approach for Fabric Defect Detection. Opt. Laser Technol. 2022, 147, 107608. [Google Scholar]
  11. Chaka, K.T.; Shiferaw, A.A.; Sharew, S.T. Inspection of Cotton Woven Fabrics Produced by Ethiopian Textile Factories through a Real-Time Vision-Based System. J. Nat. Fibers 2023, 20, 2286615. [Google Scholar] [CrossRef]
  12. Huang, Y.; Jing, J.; Wang, Z. Fabric Defect Segmentation Method Based on Deep Learning. IEEE Trans. Instrum. Meas. 2021, 70, 1–15. [Google Scholar] [CrossRef]
  13. Talu, M.F.; Hanbay, K.; Hatami Varjovi, M. CNN-Based Fabric Defect Detection System on Loom Fabric Inspection. Tekst. Ve Konfeksiyon 2022, 32, 208–219. [Google Scholar] [CrossRef]
  14. Yaşar Çıklaçandır, F.G.; Utku, S.; Özdemir, H. Determination of Various Fabric Defects Using Different Machine Learning Techniques. J. Text. Inst. 2024, 115, 733–743. [Google Scholar] [CrossRef]
  15. Harel, D.; Yerushalmi, R.; Marron, A.; Elyasaf, A. Categorizing Methods for Integrating Machine Learning with Executable Specifications. Sci. China Inf. Sci. 2024, 67, 111101. [Google Scholar] [CrossRef]
  16. Kuznetsova, N.; Sagirova, Z.H.; Dhif, I.; Gognieva, D.; Gogiberidze, N.; Chomakhidze, P.; Kopylov, P. Determination of Left Ventricular Diastolic Dysfunction Using Machine Learning Methods. Eur. Heart J. 2021, 42, ehab724.3051. [Google Scholar] [CrossRef]
  17. Khwakhali, U.S.; Tra, N.T.; Tin, H.V.; Khai, T.D.; Tin, C.Q.; Hoe, L.I. Fabric Defect Detection Using Gray Level Co-Occurence Matrix and Local Binary Pattern. In Proceedings of the 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), Ho Chi Minh City, Vietnam, 20–22 December 2022; IEEE: New York, NY, USA, 2022; pp. 226–231. [Google Scholar]
  18. Ghosh, A.; Guha, T.; Bhar, R.B.; Das, S. Pattern Classification of Fabric Defects Using Support Vector Machines. Int. J. Cloth. Sci. Technol. 2011, 23, 142–151. [Google Scholar] [CrossRef]
  19. Pourkaramdel, Z.; Fekri-Ershad, S.; Nanni, L. Fabric Defect Detection Based on Completed Local Quartet Patterns and Majority Decision Algorithm. Expert Syst. Appl. 2022, 198, 116827. [Google Scholar] [CrossRef]
  20. Anami, B.S.; Elemmi, M.C. Comparative Analysis of SVM and ANN Classifiers for Defective and Non-Defective Fabric Images Classification. J. Text. Inst. 2022, 113, 1072–1082. [Google Scholar] [CrossRef]
  21. Zahra, A.; Amin, M.; El-Samie, F.E.A.; Emam, M. Efficient Utilization of Deep Learning for the Detection of Fabric Defects. Neural Comput. Appl. 2024, 36, 6037–6050. [Google Scholar] [CrossRef]
  22. Kahraman, Y.; Durmuşoğlu, A. Deep Learning-Based Fabric Defect Detection: A Review. Text. Res. J. 2023, 93, 1485–1503. [Google Scholar] [CrossRef]
  23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef]
  24. Revathy, G.; Kalaivani, R. Fabric Defect Detection and Classification via Deep Learning-Based Improved Mask RCNN. SIViP 2024, 18, 2183–2193. [Google Scholar] [CrossRef]
  25. Zhou, H.; Jang, B.; Chen, Y.; Troendle, D. Exploring Faster RCNN for Fabric Defect Detection. In Proceedings of the 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), Irvine, CA, USA, 21–23 September 2020; pp. 52–55. [Google Scholar]
  26. Lin, B. Safety Helmet Detection Based on Improved YOLOv8. IEEE Access 2024, 12, 28260–28272. [Google Scholar] [CrossRef]
  27. Jing, J.; Zhuo, D.; Zhang, H.; Liang, Y.; Zheng, M. Fabric Defect Detection Using the Improved YOLOv3 Model. J. Eng. Fibers Fabr. 2020, 15, 155892502090826. [Google Scholar] [CrossRef]
  28. Hu, X.; Liang, Y.; Wang, H.; Tan, Y.; Liu, S.; Pan, F.; Wu, Q.; He, Z. Fabric Defect Image Generation Method Based on the Dual-Stage W-Net Generative Adversarial Network. Text. Res. J. 2024, 94, 00405175241233942. [Google Scholar] [CrossRef]
  29. Li, F.; Xiao, K.; Hu, Z.; Zhang, G. Fabric Defect Detection Algorithm Based on Improved YOLOv5. Vis. Comput. 2024, 40, 2309–2324. [Google Scholar] [CrossRef]
  30. Kang, X.; Li, J. AYOLOv7-Tiny: Towards Efficient Defect Detection in Solid Color Circular Weft Fabric. Text. Res. J. 2024, 94, 225–245. [Google Scholar] [CrossRef]
  31. Talaat, F.M.; ZainEldin, H. An Improved Fire Detection Approach Based on YOLO-v8 for Smart Cities. Neural Comput. Appl. 2023, 35, 20939–20954. [Google Scholar] [CrossRef]
  32. Du, Y.; Liu, X.; Yi, Y.; Wei, K. Optimizing Road Safety: Advancements in Lightweight YOLOv8 Models and GhostC2f Design for Real-Time Distracted Driving Detection. Sensors 2023, 23, 8844. [Google Scholar] [CrossRef]
  33. Du, Y.; Xu, X.; He, X. Optimizing Geo-Hazard Response: LBE-YOLO’s Innovative Lightweight Framework for Enhanced Real-Time Landslide Detection and Risk Mitigation. Remote Sens. 2024, 16, 534. [Google Scholar] [CrossRef]
  34. Li, Z.; Fan, Q.; Yin, Y. Colloidal Self-Assembly Approaches to Smart Nanostructured Materials. Chem. Rev. 2022, 122, 4976–5067. [Google Scholar] [CrossRef] [PubMed]
  35. Rêgo, A.D.S.; Furtado, G.E.; Bernardes, R.A.; Santos-Costa, P.; Dias, R.A.; Alves, F.S.; Ainla, A.; Arruda, L.M.; Moreira, I.P.; Bessa, J.; et al. Development of Smart Clothing to Prevent Pressure Injuries in Bedridden Persons and/or with Severely Impaired Mobility: 4NoPressure Research Protocol. Healthcare 2023, 11, 1361. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Structure of YOLOv8n.
Figure 2. Basic module structure of PANet (a) and BiFPN (b).
Figure 3. The BiFPN structure.
Figure 4. Overview of GAM.
Figure 5. Channel attention submodule.
Figure 6. Spatial attention submodule.
Figure 7. Structure of SA.
Figure 8. Sample images of the dataset.
Figure 9. Quantity and distribution of label data of fabric defects.
Figure 10. PR curve of YOLOv8n.
Figure 11. PR curve of YOLO-BGS.
Figure 12. Comparison of mAP values with different precisions.
Figure 13. Comparison of mAP values with different models.
Figure 14. Performance comparison: (a) YOLOv8n and (b) YOLOv8n + BiFPN + SA.
Figure 15. Performance comparison: (a) YOLOv8n and (b) YOLO-BGS.
Figure 16. Comparison of mainstream models.
Table 1. Comparison of defect detection effect.

| Model | Precision (%) | Recall (%) | F1 (%) | mAP (%) |
|---|---|---|---|---|
| YOLOv8n | 83.9 | 90.0 | 86.8 | 93.0 |
| YOLOv8n + BiFPN + GAM + SA (YOLO-BGS) | 87.6 | 94.1 | 90.7 | 96.6 |
Table 2. Comparison of the results of the ablation experiment.

| Model | Precision (%) | Recall (%) | F1 (%) | mAP (%) |
|---|---|---|---|---|
| YOLOv8n | 83.9 | 90.0 | 86.8 | 93.0 |
| YOLOv8n + BiFPN | 84.7 | 91.3 | 87.9 | 94.5 |
| YOLOv8n + BiFPN + SA | 86.2 | 92.6 | 89.2 | 95.3 |
| YOLOv8n + BiFPN + GAM + SA (YOLO-BGS) | 87.6 | 94.1 | 90.7 | 96.6 |
Table 3. Comparison of mainstream models.

| Model | Precision (%) | Recall (%) | F1 (%) | mAP (%) |
|---|---|---|---|---|
| Faster R-CNN | 76.2 | 80.6 | 78.3 | 81.3 |
| SSD | 77.6 | 83.4 | 80.4 | 83.2 |
| YOLOv3 | 79.8 | 84.3 | 82.0 | 85.7 |
| YOLOv5 | 81.0 | 86.8 | 83.8 | 88.4 |
| YOLOv7 | 82.1 | 88.5 | 85.2 | 90.8 |
| YOLO-BGS | 87.6 | 94.1 | 90.7 | 96.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


