1. Introduction
During textile production, defects such as thread breaks, pilling, holes, thread floats, and stains may occur due to machinery issues or operator errors, leading to poor product quality and adversely affecting the company's production efficiency [1]. The complex texture of fabric surfaces makes traditional fabric defect detection methods, which often rely on manual visual inspection, inefficient, subjective, and prone to visual fatigue. According to [2], the accuracy of manual inspection is only around 70%, which is insufficient to meet the demands of large-scale production. Therefore, it is crucial to develop an automated, efficient, accurate, and cost-effective fabric defect detection system using computer vision and deep learning technologies. Traditional object detection algorithms rely on a combination of handcrafted features and classifiers, which require high-quality images, involve complex processing, and are highly sensitive to noise and interference, limiting their effectiveness in detecting fabric defects.
In recent years, deep learning-based object detection algorithms have developed rapidly owing to their strong learning capabilities and robustness to scale variations. Based on the number of detection stages, they can be divided into two-stage algorithms, represented by Faster R-CNN [3], and one-stage algorithms, represented by YOLO (You Only Look Once) [4] and SSD (Single-Shot MultiBox Detector) [5]. In research on two-stage fabric defect detection algorithms, Sun Xuan et al. [6] improved the prediction boxes by using an enhanced K-means clustering method and replaced the backbone network of Faster R-CNN with an optimized ResNet50. This algorithm improved the accuracy of fabric defect detection but did not address the issue of computational complexity.
Compared with two-stage detection algorithms, single-stage algorithms do not require the generation of candidate regions and therefore offer better real-time performance. They also have unique advantages in multi-scale detection and multi-task learning: multi-scale detection can recognize objects at different scales, thereby improving detection accuracy. These properties have made them a current research hotspot in fabric defect detection. Xie HS et al. [7] added a Fully Convolutional Squeeze-and-Excitation (FCSE) module to the traditional SSD and validated it on the TILDA and Xuelang datasets, improving detection accuracy. Faced with complex textile texture backgrounds, Guo YB et al. [8] proposed a Convolutional Squeeze-and-Excitation (CSE) channel attention module and integrated it into the YOLOv5 backbone, enhancing defect detection and anti-interference capabilities. Fan et al. [9] embedded the channel and spatial dual-attention mechanism CBAM into YOLOv5, effectively mitigating the feature-allocation issues of a single attention mechanism and improving model accuracy; however, the computational load of the model was not significantly reduced. Jing JF et al. [10] applied the k-means algorithm for dimensional clustering of target boxes and added YOLO detection layers on feature maps of different sizes; the improved network achieved an error detection rate below 5%. These methods generally suffer from high computational complexity and an imbalance between accuracy and detection speed, making them difficult to deploy on resource-constrained edge devices.
With the advent of lightweight networks, various scholars have combined them with YOLO models to propose new lightweight object detection algorithms. Kang X et al. [11] addressed the issues of complex model construction and high network complexity by using the lightweight YOLOv5s model as a base; they integrated the Convolutional Block Attention Module (CBAM) and a feature enhancement module into the backbone and neck, respectively, and adopted CIoU_Loss as the loss function. Liu BB et al. [12] integrated new convolutional operators into the Extended Efficient Layer Aggregation Network to optimize feature extraction, effectively capturing spatial features while reducing computation; their experiments demonstrated that the resulting fabric defect detection model reduced model parameters and computational load by 18.03% and 20.53%, respectively. Although these researchers have made significant progress on lightweight YOLO detection methods, the complex background textures and numerous small targets in fabric defects remain challenging: the feature extraction capabilities of these networks are still limited, and there is room for further improvement in lightweight model design.
To address the aforementioned issues, this study proposes an efficient and accurate lightweight YOLOv8n-based fabric defect detection algorithm, GSL-YOLOv8n, suitable for textile production lines. The algorithm includes the following improvements:
Ghost Network Integration: Ghost convolutions are used to rebuild the standard convolution (Conv) and C2f modules of the YOLOv8n network, significantly reducing the model's parameter count.
Semantic Information Extraction: To address the loss of semantic information between different features and the loss of small target features in the C2f module of YOLOv8, the parameter-free attention mechanism SimAM is embedded at the end of the backbone network.
Lightweight Detection Head: To further achieve model lightweighting, a lightweight detection head (LSCDH) is designed by combining the GroupNorm and shared convolution concepts.
The experimental results on fabric defect datasets demonstrate the effectiveness of the proposed algorithm. It successfully balances detection accuracy and speed while reducing model complexity, making it suitable for deployment on devices with limited computational resources.
The remaining sections of this paper are organized as follows: Section 2 elaborates on the YOLOv8 algorithm and the proposed improvements; Section 3 introduces the fabric dataset, experimental setup, and detailed experimental results; the final section presents the conclusions and an outlook on future work.
3. Results and Discussion
3.1. Experimental Platform
The experiments were conducted on a 64-bit Windows 10 platform equipped with an NVIDIA GeForce RTX 3060Ti GPU (NVIDIA, Santa Clara, CA, USA). The implementation used Python 3.11.5 within the PyTorch 2.1.0 deep learning framework, with CUDA 12.1 and the corresponding cuDNN library employed for GPU acceleration to speed up training of the YOLOv8 model. The training parameters are detailed in Table 1.
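For reproducibility, a minimal sketch of the training setup is given below, assuming the Ultralytics YOLOv8 API; the hyperparameter values shown are placeholders standing in for those listed in Table 1, and the dataset configuration file name is hypothetical.

```python
# Sketch of the training procedure on the platform described above.
# All hyperparameter values are placeholders; the actual values are
# those reported in Table 1.
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")       # build the YOLOv8n baseline from its config
model.train(
    data="fabric_defects.yaml",    # hypothetical dataset config (6 defect classes)
    epochs=200,                    # placeholder; see Table 1
    imgsz=640,                     # placeholder input resolution
    batch=16,                      # placeholder batch size
    device=0,                      # single NVIDIA RTX 3060Ti GPU
)
```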
3.2. Dataset Description
The experimental dataset was drawn from real production scenarios captured at a textile manufacturing enterprise and from the Tianchi [19] public dataset. A total of 2115 images containing textile defects captured from actual production scenarios were used for training and testing. These images cover six common textile defects, illustrated in Figure 5: Float, Skips, Pilling, Hole, Pulling Out Yarn, and Stain. Because noise in textile images closely resembles minor defects, and to improve the model's generalization ability and robustness and prevent overfitting, this study applied data augmentation methods such as rotation, scaling, cropping, and color changes to expand the dataset to three times its original size, yielding 6345 images in total. The dataset was randomly divided into training, validation, and test sets in a ratio of 8:1:1. The final samples comprised 1376 images of Float, 1011 of Skips, 606 of Pilling, 336 of Hole, 2016 of Pulling Out Yarn, and 1734 of Stain. Additionally, a portion of the Tianchi public textile image dataset was selected to validate the generalization capability of the proposed model.
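The paper does not name its augmentation tooling; as one concrete illustration of the transformations listed above (rotation, scaling, cropping, and color changes), the sketch below uses the Albumentations library, with bounding boxes in YOLO format transformed alongside each image.

```python
# A hedged sketch of the described augmentation pipeline using
# Albumentations (an assumption; the paper's actual tooling is not stated).
import albumentations as A

augment = A.Compose(
    [
        A.Affine(rotate=(-15, 15), scale=(0.8, 1.2), p=0.5),      # rotation, scaling
        A.RandomSizedBBoxSafeCrop(height=640, width=640, p=0.5),  # cropping that preserves boxes
        A.ColorJitter(brightness=0.2, contrast=0.2,
                      saturation=0.2, hue=0.1, p=0.5),            # color changes
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Usage: augmented = augment(image=image, bboxes=boxes, class_labels=labels)
# Each original image is augmented twice, tripling the dataset size.
```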
To investigate the impact of augmented data on defect detection performance, a comparative experiment was designed to evaluate detection effectiveness before and after data augmentation, using mAP@0.5, Recall (R), and Precision (P) as evaluation metrics. The experimental results are shown in Table 2. The augmented dataset yields higher mAP@0.5, Recall, and Precision values than the original dataset, indicating improved overall model performance. Therefore, the subsequent comparative and ablation experiments were conducted on the augmented dataset.
The confusion matrices in Figure 6 compare the model's performance before and after augmenting the training dataset. The left matrix, based on the original dataset, shows a higher number of false positives, particularly in the "Stains" and "Background" categories, with dispersed off-diagonal values, indicating overestimation errors caused by insufficient and imbalanced training samples [20]. After augmentation (right matrix), the off-diagonal values decrease significantly and Precision improves, with more concentrated diagonal values for categories such as "Stains" and "Background", indicating more accurate predictions. This improvement demonstrates that data augmentation effectively reduces overestimation errors, enhancing the model's robustness and generalization capability.
3.3. Evaluation Metrics
To verify the effectiveness of the experiments, the following evaluation metrics were used: Mean Average Precision (mAP@0.5), Precision (P), Recall (R), model parameters, model size, and detection speed in Frames Per Second (FPS). mAP@0.5, P, and R were used to evaluate the detection performance of the model; higher values of mAP@0.5, P, R, and FPS indicate better detection performance. The model parameters, model size, and GFLOPs (Giga Floating-point Operations) were used to evaluate the lightweight performance of the model; smaller values of these three quantities indicate a more lightweight model.
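For reference, these metrics follow their standard definitions, with TP, FP, and FN denoting true positives, false positives, and false negatives; AP is computed per class at an IoU threshold of 0.5 and averaged over the N = 6 defect classes:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_{0}^{1} P(R)\, dR, \qquad
\text{mAP@0.5} = \frac{1}{N} \sum_{i=1}^{N} AP_i
```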
3.4. Ablation Experiments
To validate the effectiveness of each improvement module in the proposed GSL-YOLOv8n algorithm, we conducted eight sets of ablation experiments on the fabric defect dataset. The evaluation metrics included model size, GFLOPs, FPS, model parameters, and mAP@0.5. Based on YOLOv8, we replaced standard convolutions with Ghost convolutions, upgraded the detection head to the newly designed shared convolution detection head (LSCDH), and introduced the SimAM attention mechanism. The detection results are shown in Table 3, where "√" indicates the use of the corresponding module.
Based on the experimental results presented in Table 3, the following observations can be made:
(1) Complexity reduction with Ghost convolutions: replacing the standard convolutions in the YOLOv8 network with Ghost convolutions decreased the model size by 39.7%, reduced the computation and parameter count by 38.3% and 43.0%, respectively, and increased the mAP@0.5 by 0.46% compared to the unmodified YOLOv8 model.
(2) Improvement with the SimAM attention mechanism: to address the multi-scale feature problem of textile defects, the parameter-free SimAM attention mechanism was introduced, yielding a 0.46% increase in mAP@0.5 and a 25.8% increase in FPS without increasing the model size, parameters, or computational load. This implies that SimAM enhances the model's ability to select relevant features, allowing it to locate and process important features more quickly and spend less time on irrelevant ones, thus improving inference efficiency.
(3) Network lightweighting with LSCDH: replacing the original YOLOv8 detection head with the shared convolution detection head (LSCDH) reduced the model size by 20.6%, decreased computation and parameter count by 19.8% and 21.4%, respectively, and improved the mAP@0.5 by 0.33%.
(4) Combination of improvements: replacing the standard convolutions in YOLOv8 with Ghost convolutions and introducing SimAM decreased the model size and parameter count by 41.3% and 43.9%, respectively, with the mAP@0.5 rising to 97.99%. Replacing the detection head with LSCDH after introducing Ghost convolutions reduced the model size and parameter count by 60.3% and 56.8%, with the mAP@0.5 rising to 97.75%. Combining all three improvements (Ghost convolutions, SimAM, and LSCDH) into the YOLOv8 model to form the GSL-YOLOv8 model reduced the model size by 66.7%, computation by 58.0%, and parameter count by 67.4%, with the FPS remaining nearly constant and the mAP@0.5 improving by 0.60%.
The ablation experiments indicate that each improvement and their combinations positively impacted the model's performance, demonstrating that the proposed algorithm effectively enhances the original model's capability to detect fabric defects.
Figure 7 compares the training performance of the baseline YOLOv8 and the improved GSL-YOLOv8 models across four key metrics. In Figure 7a, GSL-YOLOv8 converges faster and consistently maintains higher mAP@0.5 values, indicating improved detection accuracy. Figure 7b highlights its higher Precision, especially in early epochs. Figure 7c shows improved Recall, with better instance detection and fewer misses. Figure 7d presents lower and more stable loss values, reflecting enhanced optimization and generalization. Overall, GSL-YOLOv8 achieves superior accuracy, stability, and robustness in object detection.
3.5. Comparison Experiments
3.5.1. Attention Mechanism SimAM Comparison Experiments
To address the issue of background noise affecting the model's accuracy in detecting small fabric defects, this study incorporates SimAM at the end of the YOLOv8n backbone to enhance the network's ability to extract global features. The attention mechanism module can be added to various parts of the YOLOv8 network, such as the backbone and feature extraction networks. To verify the impact of SimAM on algorithm performance when positioned differently within YOLOv8n, SimAM was embedded at the end of the backbone and within the neck network for comparative experiments against the original YOLOv8n model. "YOLOv8n+backbone" indicates SimAM embedded at the end of the backbone, while "YOLOv8n+neck" denotes SimAM embedded within the neck network. The experimental results are shown in Table 4.
As shown in Table 4, when SimAM is embedded at the end of the backbone network, the AP values for three types of detection targets (fiber breaks, holes, and loose fibers) are the highest, and the mAP@0.5 is the best among the original YOLOv8, the neck-embedded variant, and the backbone-embedded variant. Specifically, the AP values for holes and loose fibers improve by 2.8% and 1.6%, respectively, over the YOLOv8 model. In summary, embedding the SimAM attention mechanism at the end of the backbone network significantly enhances the network's feature extraction capability, improving both the accuracy and robustness of target detection.
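For reference, a minimal sketch of such a parameter-free mechanism is shown below, following the energy-based formulation of the original SimAM paper: attention weights are derived in closed form from per-channel feature statistics, so the module adds no learnable parameters (the `e_lambda` default is the value suggested in that paper, not necessarily the one used here).

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: each activation is reweighted by a
    closed-form energy term computed from per-channel spatial statistics."""
    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularization constant from the SimAM paper

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        n = h * w - 1
        # squared deviation of each activation from its channel mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # per-channel variance estimate over spatial positions
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # inverse energy: distinctive (low-energy) neurons get larger weights
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)
```

Because the weighting is computed on the fly from the features themselves, embedding this module at the end of the backbone stores no extra state, which is consistent with the unchanged model size and parameter count reported in the ablation study.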
To verify the advantages of the parameter-free attention module SimAM, this study added the Deformable Attention Transformer (DAttention), Bi-Level Routing Attention (BiFormer), Mixed Local Channel Attention (MLCA), and Convolutional Block Attention Module (CBAM) at the same position in the YOLOv8 network and compared these four commonly used attention mechanisms with SimAM. The experimental results are shown in Table 5.
The experimental results in Table 5 show that introducing attention mechanisms at the end of the backbone network improves both Recall (R) and mAP@0.5, thereby enhancing the model's ability to recognize defect features. Compared with the four mainstream attention mechanisms, SimAM achieved the largest improvement in FPS, indicating that it helps the model extract and utilize features more efficiently, reducing redundant computation during inference and thereby increasing inference speed. Although CBAM performed best in terms of mAP@0.5, its Recall and FPS were average. In contrast, SimAM demonstrated superior overall performance across all metrics.
3.5.2. Result Visualization
To provide a more intuitive view of the improved algorithm's effectiveness in fabric defect detection, several fabric images were selected to compare detection results. Figure 8 shows the visualization results of detecting textile defects using different models under the same environment. From Figure 8a,b, it can be seen that for Pilling defects, the improved GSL-YOLOv8n algorithm achieved a detection confidence of 0.85, outperforming the original YOLOv8n model. In the detection of small stain targets, GSL-YOLOv8n exhibited the best detection accuracy. When detecting elongated defects such as Pulling Out Yarn, Float, and Skips, although GSL-YOLOv8n misclassified some textile backgrounds as stain defects, resulting in false detections, it still achieved the highest detection accuracy overall. In summary, the GSL-YOLOv8n algorithm demonstrated superior recognition accuracy in detecting textile defects of varying sizes and shapes compared to the other algorithms.
3.5.3. Comparison of Algorithms on Different Datasets
To validate the generalization capability of the GSL-YOLOv8n algorithm, a subset of 3563 images from the Tianchi dataset was selected for comparative experiments against different algorithms. The fabric defect types include holes, stains, coarse weft, loose warp, fuzz, knots, and warping defects. The dataset was randomly divided into training, validation, and test sets in a ratio of 8:1:1. The experimental results are shown in Table 6.
As shown in Table 6, the YOLOv6 algorithm achieves the highest FPS at 303.1, but it suffers from high model complexity and slower convergence. Compared to the other three algorithms, GSL-YOLOv8n performs best in terms of GFLOPs, Recall (R), and mAP@0.5. Additionally, the GFLOPs of the GSL-YOLOv8n algorithm are reduced by 58.0% compared with the original YOLOv8n model. This indicates that the GSL-YOLOv8n algorithm achieves superior performance on this dataset, with better robustness and generalization capability.
3.5.4. Comparison Experiment of Different Algorithms
To validate the effectiveness of the proposed GSL-YOLOv8n algorithm compared with other classical algorithms, we use the YOLOv8n model as the baseline network. The evaluation metrics include model size, GFLOPs, the number of model parameters, and mAP@0.5. We selected the classical two-stage model Faster R-CNN and the single-stage models SSD and YOLO for comparison with the proposed GSL-YOLOv8n algorithm on the augmented dataset. The experimental results are shown in Table 7.
From the comparison results of different algorithms shown in Table 7, it is evident that the proposed GSL-YOLOv8n algorithm offers advantages in both detection accuracy and speed over Faster R-CNN. Among the single-stage algorithms, SSD struggles with the small targets in fabric defects, yielding the lowest mAP@0.5 alongside a relatively large model size and parameter count. Among the YOLO variants, YOLOv5s performs best in terms of mAP@0.5, but its large model size and parameter count make it less suitable for deployment on resource-constrained embedded devices, while YOLOv6 achieves the best FPS. The proposed GSL-YOLOv8n algorithm excels in GFLOPs and parameter count, with a model size of only 2.1 MB and an mAP@0.5 of 98.29%, making it fully capable of meeting the demands for fast and accurate detection in industrial production.
Table 8 presents the AP and mAP@0.5 detection results for six types of defects across eight different algorithms. It is evident that the SSD algorithm performs worse compared to other models in various metrics for fabric defect detection. The improved GSL-YOLOv8n model shows an increase in AP values for most defects, especially for small defect targets such as hole and loose threads. Additionally, the GSL-YOLOv8n model exhibits a significant improvement in AP values for elongated defects like Skips and loose threads.
In a comparison of Precision–Recall curves, Figure 9a shows a steeper curve than Figure 9b, indicating that the GSL-YOLOv8 model converges more quickly in detection tasks and achieves higher accuracy in the early stages of training. The Recall–Confidence curve in Figure 9c is smoother than that in Figure 9d and demonstrates better generalization in the low-confidence range, suggesting that GSL-YOLOv8 can maintain high detection accuracy even as confidence decreases. In summary, the optimized model outperforms the original model in overall performance.
3.6. Computational Complexity Analysis
To provide a more comprehensive analysis of the lightweight characteristics of the proposed GSL-YOLOv8n, we conduct a theoretical computational complexity assessment. For the original YOLOv8n model, the core convolution operations have a time complexity of O(N^2 * K^2 * C_in * C_out), where N is the input feature map size, K is the kernel size, and C_in and C_out represent the input and output channels, respectively. The improved GSL-YOLOv8n incorporates Ghost convolutions, reducing redundancy and lowering the complexity to O(N^2 * K^2 * C_in * C_out/r), where r is the compression ratio of the Ghost module.
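To make the origin of the 1/r factor concrete, the sketch below follows the GhostNet formulation: a primary convolution produces only C_out/r intrinsic feature maps, and a cheap depthwise convolution generates the remaining "ghost" maps. The kernel sizes, activation, and r = 2 default here are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a primary conv produces C_out/r intrinsic maps,
    and a cheap depthwise conv generates the remaining ghost maps."""
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1, r: int = 2):
        super().__init__()
        c_primary = c_out // r  # the full-cost convolution shrinks by a factor of ~r
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_primary),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(
            # depthwise 5x5: one filter per channel, far cheaper than a full conv
            nn.Conv2d(c_primary, c_out - c_primary, 5, 1, 2,
                      groups=c_primary, bias=False),
            nn.BatchNorm2d(c_out - c_primary),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```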
Additionally, the lightweight shared convolution detection head (LSCDH) used in GSL-YOLOv8n further enhances computational efficiency. The LSCDH module allows for multi-scale feature fusion with a time complexity of O(N^2 * C_in * C_out), significantly reducing repeated calculations compared to traditional detection heads. The integration of SimAM provides enhanced feature extraction with negligible computational overhead.
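A minimal sketch of the shared-convolution idea behind such a head is given below: one GroupNorm-equipped convolution stack is reused across all three feature-pyramid scales, so its parameters are counted once rather than per scale. The channel widths, 1x1 alignment layers, and output branches here are illustrative assumptions rather than the paper's exact LSCDH design.

```python
import torch
import torch.nn as nn

def conv_gn(c_in: int, c_out: int, k: int = 3) -> nn.Sequential:
    """Conv + GroupNorm + SiLU, the building block of the shared head."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
        nn.GroupNorm(16, c_out),
        nn.SiLU(),
    )

class SharedConvHead(nn.Module):
    """Lightweight head sketch: per-scale 1x1 convs align channel counts,
    then a single shared conv stack serves all pyramid levels."""
    def __init__(self, in_channels=(64, 128, 256), c_mid=64,
                 num_classes=6, reg_max=16):
        super().__init__()
        self.align = nn.ModuleList(conv_gn(c, c_mid, 1) for c in in_channels)
        self.shared = nn.Sequential(conv_gn(c_mid, c_mid), conv_gn(c_mid, c_mid))
        self.cls = nn.Conv2d(c_mid, num_classes, 1)  # shared classification branch
        self.reg = nn.Conv2d(c_mid, 4 * reg_max, 1)  # shared box-regression branch

    def forward(self, feats):
        outs = []
        for f, align in zip(feats, self.align):
            h = self.shared(align(f))
            outs.append(torch.cat([self.reg(h), self.cls(h)], dim=1))
        return outs  # one prediction map per pyramid level
```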
As indicated in the results, GSL-YOLOv8n maintains a near-equivalent FPS to YOLOv8n (238.9 vs. 239.6) while drastically reducing model size, parameter count, and computational load by 66.7%, 67.4%, and 58.0%, respectively. These improvements highlight the effectiveness of the proposed method in optimizing computational complexity while preserving high detection accuracy.
4. Conclusions
Based on the multi-scale characteristics of fabric defects, the dataset was preprocessed to enhance the model’s generalization ability. Building on the YOLOv8 model, we applied the concept of Ghost convolutions to improve the YOLOv8 backbone network. Specifically, we replaced the standard convolutions (Conv) in YOLOv8 with GhostConv and modified the C2f module to C2fGhost. This approach maintained model accuracy while reducing its complexity. Additionally, we integrated the parameter-free attention mechanism SimAM at the end of the backbone network, which enhanced the model’s ability to integrate multi-scale information.
The YOLOv8 detection head was upgraded to the newly designed LSCDH, which uses shared convolutions to merge features and adjust channel numbers. This combination of low-level detail and high-level semantic information boosts the model's feature extraction capabilities while keeping it lightweight, making it suitable for deployment on mobile devices. Ablation experiments validated the effectiveness of each module in the proposed GSL-YOLOv8n algorithm. Compared to the YOLOv8n algorithm, the improved GSL-YOLOv8n model reduced model size, computational complexity, and parameter count by 66.7%, 58.0%, and 67.4%, respectively. This resulted in a lightweight model with enhanced small-target defect feature extraction and anti-interference capabilities while maintaining detection speed, achieving an mAP@0.5 of 98.29%.
In the future, we plan to integrate this model into resource-constrained embedded devices, exploring the application of lightweight models for fabric defect detection in environments with limited computing resources.