Article
Peer-Review Record

YOLO-BGS Optimizes Textile Production Processes: Enhancing YOLOv8n with Bi-Directional Feature Pyramid Network and Global and Shuffle Attention Mechanisms for Efficient Fabric Defect Detection

Sustainability 2024, 16(18), 7922; https://doi.org/10.3390/su16187922
by Gege Lu, Tian Xiong and Gaihong Wu *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 6 August 2024 / Revised: 24 August 2024 / Accepted: 9 September 2024 / Published: 11 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this work, the authors proposed an enhanced YOLOv8n model, named YOLO-BGS, to improve fabric defect detection. In this model, a Bi-directional Feature Pyramid Network (BiFPN) is integrated for multiscale feature fusion, a Shuffle Attention (SA) mechanism is used for optimal feature classification, and a Global Attention Mechanism (GAM) is applied for optimal global detection accuracy. Empirical results demonstrate a significant improvement in detection performance, with a mean Average Precision (mAP) of 96.6%. However, this reviewer has some concerns that should be addressed, as detailed in the comments below.

- Your YOLO-BGS model reflects a deliberate choice among the numerous YOLO versions and other models that could have served as the base model for enhancement. Why was the YOLO-BGS model chosen over other YOLO versions or alternative models?

- Could you describe the advantages and shortcomings of the Bi-directional Feature Pyramid Network (BiFPN) specifically for multiscale feature fusion in fabric defect detection?

- What factors led to the choice of Shuffle Attention (SA) and the Global Attention Mechanism (GAM) over other methods, and how do these methods affect dynamic accuracy?

- How does your YOLO-BGS implementation for fabric defect detection balance computational efficiency and accuracy compared with other algorithms?

- For training and testing the model, could you give more details about the dataset, for example, the different types of defects, the size of the dataset, and the preprocessing steps applied?

- What other performance metrics were used in addition to mAP to evaluate the model? How did YOLO-BGS perform with these metrics compared to the baseline YOLOv8n?

- Was the ability of YOLO-BGS to generalize to unseen fault patches, which were left out of the training phase, also tested? The model is also expected to handle changes in fabric patterns and textures effectively. How were these issues addressed?

Comments on the Quality of English Language

Minor revision of the English language is required.

Author Response

  1. Summary

Thank you very much for taking the time to review this manuscript. We have provided detailed responses to each of your valuable comments. Please review them.

 

  2. Point-by-point response to Comments and Suggestions for Authors

Comments 1: - Your YOLO-BGS model reflects a deliberate choice among the numerous YOLO versions and other models that could have served as the base model for enhancement. Why was the YOLO-BGS model chosen over other YOLO versions or alternative models?

Response 1: Thank you for your valuable feedback on our manuscript. We chose the YOLO-BGS model for the following reasons. First, two-stage deep learning methods (including R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN) require a large amount of computation and substantial computing resources, which limits the real-time performance and application scope of fabric defect detection. Second, compared with other single-stage models (such as YOLOv3, YOLOv5, YOLOv7, and SSD), YOLO-BGS, as an improved YOLOv8n model, was found to be superior in accuracy and feature extraction, making it our first choice.

 

Comments 2: - Could you describe the advantages and shortcomings of the Bi-directional Feature Pyramid Network (BiFPN) specifically for multiscale feature fusion in fabric defect detection?

Response 2: Thank you for your interest in the Bi-directional Feature Pyramid Network (BiFPN) used in our manuscript. BiFPN performs strongly in multi-scale feature fusion and substantially improves fabric defect detection performance. Its core advantage is bidirectional information flow, which deepens the model's understanding of image context while preserving fine details, enabling effective identification of defects at different scales and with different textures. In addition, the BiFPN structure is flexible: it supports free interconnection between feature levels and adaptively learns the optimal combination of features, which is critical for identifying diverse and hidden defects. However, although integrating BiFPN can significantly improve performance in object detection, BiFPN involves multiple hyperparameters, such as the weights between features at different scales and the fusion mode, and their values strongly influence model performance. This content has been added to the Discussion section of the manuscript.
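In the original BiFPN design (from EfficientDet), each fusion node combines its inputs with learnable, non-negative per-input weights using "fast normalized fusion": O = sum_i(w_i * I_i) / (eps + sum_j w_j), with a ReLU keeping each w_i non-negative. The minimal Python sketch below illustrates that rule on flattened feature maps that are assumed to be already resized to a common resolution. The function name `fast_normalized_fusion`, the epsilon value, and the toy inputs are illustrative assumptions; the sketch does not claim to reproduce the improved BiFPN used in YOLO-BGS.

```python
def fast_normalized_fusion(features, raw_weights, eps=1e-4):
    """Fuse equally sized feature maps with BiFPN-style fast normalized fusion.

    features    : list of flattened feature maps (lists of floats), already
                  resized to a common resolution
    raw_weights : one learnable scalar per input, before the ReLU
    """
    if len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    # ReLU keeps every fusion weight non-negative.
    weights = [max(w, 0.0) for w in raw_weights]
    # Dividing by the weight sum (plus eps for numerical stability) keeps the
    # output on the same scale as the inputs without computing a softmax.
    total = sum(weights) + eps
    fused = [0.0] * len(features[0])
    for w, fmap in zip(weights, features):
        for i, value in enumerate(fmap):
            fused[i] += w * value / total
    return fused


# Toy example: a top-down feature and a lateral input at the same resolution.
top_down = [1.0, 2.0, 3.0]
lateral = [3.0, 2.0, 1.0]
print(fast_normalized_fusion([top_down, lateral], raw_weights=[1.0, 1.0]))
```

In this formulation the per-input weights are trained together with the network, while the epsilon term and the choice of which inputs feed each fusion node are fixed design decisions.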

 

Comments 3: - What factors led to the choice of Shuffle Attention (SA) and the Global Attention Mechanism (GAM) over other methods, and how do these methods affect dynamic accuracy?

Response 3: Thank you for your feedback on our manuscript. Traditional attention mechanisms such as CBAM (Convolutional Block Attention Module) demand considerable computing resources, which may increase the computational complexity of the network. We therefore chose the more efficient, lightweight Shuffle Attention (SA) mechanism, which divides the channel dimension into groups, processes the resulting sub-features in parallel, and uses a channel shuffle operator to exchange information between the sub-features, improving performance while reducing computational cost. In addition, traditional attention mechanisms such as SE (Squeeze-and-Excitation) consider feature relationships only in the channel dimension and may not refine information in the spatial dimension. We therefore adopted the Global Attention Mechanism (GAM), which efficiently captures global context information and improves the network's awareness of global features. Finally, SA and GAM improve dynamic accuracy by balancing local and global context information and by optimizing computational efficiency without compromising the model's ability to dynamically recognize important features. These mechanisms allow the model to adapt quickly to different input complexities and changing data dynamics, thereby improving overall performance and robustness.
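The channel shuffle operator mentioned in this response can be illustrated with a short Python sketch. It reorders a list of channel labels rather than real tensors, and the function name `channel_shuffle` is hypothetical; in an actual network the same permutation is applied along the channel axis of a feature tensor, so that every output group mixes channels from all input groups.

```python
def channel_shuffle(channels, groups):
    """Reorder channels so every output group mixes channels from all input groups.

    Equivalent to reshaping C channels into a (groups, C // groups) grid,
    transposing it, and flattening it back (ShuffleNet-style channel shuffle).
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Output position (i, j) of the transposed grid reads input group j, slot i.
    return [channels[j * per_group + i] for i in range(per_group) for j in range(groups)]


# Eight channels labelled by their original group: "a" = group 0, "b" = group 1.
labels = ["a0", "a1", "a2", "a3", "b0", "b1", "b2", "b3"]
print(channel_shuffle(labels, groups=2))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```

The operation is a pure permutation, so it adds no parameters and almost no computation, which is consistent with the lightweight motivation given above.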

 

Comments 4: - How does your YOLO-BGS implementation for fabric defect detection balance computational efficiency and accuracy compared with other algorithms?

Response 4: Thank you for your feedback on our manuscript. By introducing the improved BiFPN network into YOLOv8n, YOLO-BGS improves the efficiency of multi-scale feature fusion and strengthens the detection of small targets and complex scenes. At the same time, adding GAM enhances the network's understanding of global structure and relationships, improving multi-scale target perception and detection accuracy. In addition, the SA mechanism lets the model process input data more efficiently and reduces unnecessary computation, thereby improving the efficiency of object detection.

 

Comments 5: - For training and testing the model, could you give more details about the dataset, for example, the different types of defects, the size of the dataset, and the preprocessing steps applied?

Response 5: Thank you for pointing this out. We have added the relevant details to the manuscript. Our dataset comprises 800 grayscale images, each measuring 416×416 pixels, which ensures data uniformity and simplifies processing. Transformations such as rotation, flipping, and scaling are applied to increase data variability and mitigate the risk of overfitting. The images cover a range of common defect types, including but not limited to yarn breakage, holes, stains, buttonholes, hairballs, and scuff marks, illustrating typical challenges in fabric production. The dataset is divided into training, validation, and test sets with an 8:1:1 ratio, so the training set contains 640 images, the validation set 80 images, and the test set 80 images. The dataset is annotated using an annotation tool. The labeled data for the training set include the distribution of cases across four categories and, for each bounding box, its center coordinates (x, y) together with its width and height.
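An 8:1:1 split of 800 images works out to 640 training, 80 validation, and 80 test images. The Python sketch below shows one way to compute such a split and assign files to it reproducibly. The file names and the fixed-seed shuffle are assumptions for illustration; the authors' actual splitting procedure is not described in the response.

```python
import random


def split_counts(total, ratios=(8, 1, 1)):
    """Return (train, val, test) counts for a ratio split of `total` items."""
    denom = sum(ratios)
    counts = [total * r // denom for r in ratios]
    # Any rounding remainder goes to the training set so the counts sum to total.
    counts[0] += total - sum(counts)
    return tuple(counts)


def split_files(paths, ratios=(8, 1, 1), seed=0):
    """Shuffle file paths with a fixed seed and cut them into train/val/test lists."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n_train, n_val, _ = split_counts(len(paths), ratios)
    return paths[:n_train], paths[n_train:n_train + n_val], paths[n_train + n_val:]


# Hypothetical file names standing in for the 800 dataset images.
images = [f"img_{i:03d}.png" for i in range(800)]
train, val, test = split_files(images)
print(split_counts(800))  # (640, 80, 80)
print(len(train), len(val), len(test))  # 640 80 80
```

Fixing the seed makes the split reproducible across runs, and the three subsets are disjoint by construction because they are consecutive slices of one shuffled list.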

 

Comments 6: - What other performance metrics were used in addition to mAP to evaluate the model? How did YOLO-BGS perform with these metrics compared to the baseline YOLOv8n? 

Response 6: Thank you for your valuable feedback on our paper. In addition to mAP, we used three performance metrics to evaluate the model: precision (P), recall (R), and F1 score. YOLO-BGS is significantly better than the original YOLOv8n model on all four key indicators (precision, recall, F1 score, and mAP), with improvements of 3.7%, 4.1%, 3.9%, and 3.6%, respectively.
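For reference, the standard definitions are P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R), where a detection counts as a true positive when it matches a ground-truth box of the same class above an IoU threshold. Note that precision is distinct from accuracy. The Python sketch below computes these three metrics from counts; the counts are hypothetical and are not results from the manuscript. mAP, which averages the area under the precision-recall curve over classes, is not shown.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts at a fixed IoU threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for illustration only, not results from the manuscript.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.900 R=0.818 F1=0.857
```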

 

Comments 7: - Was the ability of YOLO-BGS to generalize to unseen fault patches, which were left out of the training phase, also tested? The model is also expected to handle changes in fabric patterns and textures effectively. How were these issues addressed?

Response 7: Thank you very much for your valuable insight. Our current work focuses on improving the performance of the fabric defect detection model and has not yet addressed recognition under changes in fabric pattern and texture. Based on your valuable suggestions, we will continue to adjust and optimize the model for real application conditions in future work, bringing more innovation and value to the fabric manufacturing industry.

Reviewer 2 Report

Comments and Suggestions for Authors

The content of the article is quite interesting. The authors used various methods to detect problematic aspects in the textile industry and to reduce production losses from defective products; the innovations they propose will significantly increase company performance. At the same time, the following debatable points need to be clarified:

1. Page 2, Line 56-57: Incorrect method name: Gray Level Cookcurrence Matrix. Must be: Gray Level Co-Occurrence Matrix

2. Page 9, Line 294: The first step in obtaining spatial statistics from $X_{i2}$ is to apply Group Normalization (GN).

It is worth evaluating the level of error introduced by the normalization procedure.

3. Page 9, Line 295: The authors state: fusion linear function is used. 

The linear nature of the fusion function needs substantiation.

4. Page 9, Line 297-298: generating a feature map with spatial attention weights that enhance the importance of specific regions. 

Isn't it worth detailing the procedure for determining the spatial weights, indicating whether this procedure can introduce errors and how much expertise it requires?

Author Response

  1. Summary

Thank you very much for taking the time to review this manuscript. We have provided detailed responses to each of your valuable comments. Please review them.

  2. Point-by-point response to Comments and Suggestions for Authors

Comments 1: - Page 2, Line 56-57: Incorrect method name: Gray Level Cookcurrence Matrix. Must be: Gray Level Co-Occurrence Matrix.

Response 1: Thank you very much for correcting the error in our manuscript. We have completed the modification on Page 2, Line 56-57.

Comments 2: - Page 9, Line 294: The first step in obtaining spatial statistics from $X_{i2}$ is to apply Group Normalization (GN). It is worth evaluating the level of error introduced by the normalization procedure.

Response 2: Thank you very much for your valuable advice. We agree that it is necessary to evaluate the error introduced by GN, since such an evaluation helps narrow the error range and optimize model performance. The following aspects deserve attention. First, the influence of group division on the normalization effect, in order to find the number of groups that best balances the exploitation of inter-channel correlation against normalization granularity. Second, statistical accuracy, especially the bias in estimating the mean and variance within small groups. Third, layer sensitivity, that is, how the characteristics of different layers affect GN performance. Finally, the gain or loss in model generalization, quantified by comparison with other regularization methods on the validation set. We have added the above to the original manuscript on Page 9, Lines 295-299.
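Group Normalization computes a mean and variance over each group of C/G channels (and all spatial positions) of a single sample, so the number of groups G directly controls how many values each statistic is estimated from. The pure-Python sketch below makes that group-division parameter explicit. The learnable per-channel scale and shift are omitted for brevity, and the function name and toy input are illustrative assumptions, not the manuscript's code.

```python
import math
import random


def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization for a single sample (learnable scale and shift omitted).

    x          : list of C channels, each a flattened list of H*W activations
    num_groups : G; channels are split into G contiguous groups of C/G channels
    """
    c = len(x)
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = c // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        # Each statistic is estimated from (C/G) * H * W values.
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out


# Toy input: 8 channels of a 4x4 map drawn from a seeded Gaussian.
rng = random.Random(0)
x = [[rng.gauss(2.0, 3.0) for _ in range(16)] for _ in range(8)]
y = group_norm(x, num_groups=2)
```

With C channels and an H×W map, each mean and variance is estimated from (C/G)*H*W values, so increasing G shrinks the sample behind each statistic. This is the trade-off between normalization granularity and estimation accuracy that the response describes.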

Comments 3: - Page 9, Line 295: The authors state: fusion linear function is used. The linear nature of the fusion function needs substantiation.

Response 3: Thank you very much for pointing this out. The fusion linear function can dynamically adjust weights, which enhances feature integration while keeping complexity low. At the same time, it keeps computation efficient and lowers the demands of model training. Its inter-channel interactions and nonlinear transformations notably enhance the capacity for feature representation. We have added the above to the original manuscript on Page 9, Lines 299-302.
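For reference, the original SA-Net formulation of Shuffle Attention defines the spatial branch as X'_{k2} = σ(W_2 · GN(X_{k2}) + b_2) · X_{k2}, where the linear function F_c(x) = Wx + b has one learnable scale and one learnable shift per channel and σ is the sigmoid. The Python sketch below follows that published formulation under the assumption that the manuscript uses it; the function names are hypothetical, and GN is applied with one group per channel, as in the public SA-Net code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def per_channel_norm(channel, eps=1e-5):
    """GN with one group per channel: normalize a channel over its spatial positions."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def sa_spatial_branch(x_k2, weights, biases):
    """Spatial branch of Shuffle Attention for one sub-feature X_k2.

    x_k2    : list of channels, each a flattened list of spatial activations
    weights : one learnable scale W per channel in F_c(x) = W * x + b
    biases  : one learnable shift b per channel in F_c(x) = W * x + b
    """
    out = []
    for ch, w, b in zip(x_k2, weights, biases):
        normed = per_channel_norm(ch)
        # F_c is affine in the normalized features; the sigmoid turns it into
        # a spatial attention weight in (0, 1) for every position.
        attn = [sigmoid(w * v + b) for v in normed]
        out.append([v * a for v, a in zip(ch, attn)])
    return out


# With W = 0 and b = 0 every attention weight is 0.5, so the input is halved.
print(sa_spatial_branch([[1.0, 2.0, 3.0, 4.0]], weights=[0.0], biases=[0.0]))
# [[0.5, 1.0, 1.5, 2.0]]
```

In this formulation the fusion function is linear (affine) in the normalized features, and the nonlinearity enters only through the sigmoid gate applied afterwards.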

 

Comments 4: - Page 9, Line 297-298: generating a feature map with spatial attention weights that enhance the importance of specific regions.

Isn't it worth detailing the procedure for determining the spatial weights, indicating whether this procedure can introduce errors and how much expertise it requires?

Response 4: Thank you very much for your comments. The process of determining spatial attention weights can be divided into four steps. First, input features are converted into query (Q), key (K), and value (V) vectors through linear projection; these vectors capture different aspects of the input to the attention computation. Second, attention weights are derived by calculating the compatibility (usually a dot product) between the query vector at one position and the key vectors at all positions, which quantifies how much information each position should receive from the others. Third, to make the attention weights interpretable and usable, they are normalized, usually with the softmax function, so that they sum to 1 and can be read as probabilities. Finally, the normalized attention weights are applied to the value vectors to produce an output feature map in which each element is a weighted sum of the values at all positions, highlighting the most relevant regions. However, errors can arise when determining spatial weights, caused by quantization bias, attention bias, insufficient normalization, or overfitting. The process is highly specialized and requires a deep understanding of deep learning theory, attention mechanisms, and the underlying mathematical principles. We combine practical adaptation and validation to ensure the effectiveness and efficiency of the model. We have added the above to the original manuscript on Page 9, Lines 302-307.
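The four steps listed in this response correspond to generic dot-product self-attention over spatial positions. The pure-Python sketch below walks through them in order; it is not claimed to be the exact module used in YOLO-BGS. The projection matrices are hypothetical placeholders for learned parameters (identity matrices in the toy example), and the scaling of scores by the square root of the key dimension, which many implementations add, is omitted here to match the plain dot product described above.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def spatial_self_attention(features, w_q, w_k, w_v):
    """Dot-product attention over spatial positions.

    features      : N x D matrix, one row per spatial position
    w_q, w_k, w_v : D x D' projection matrices (hypothetical; learned in practice)
    """
    # Step 1: linear projections into queries, keys, and values.
    q, k, v = matmul(features, w_q), matmul(features, w_k), matmul(features, w_v)
    # Step 2: compatibility scores as dot products between each query and every key.
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Step 3: softmax normalization so each position's weights sum to 1.
    weights = [softmax(row) for row in scores]
    # Step 4: each output position is a weighted sum of all value vectors.
    return matmul(weights, v), weights


# Toy example: three positions with 2-D features and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, attn = spatial_self_attention(feats, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Each printed row is a probability distribution over the three positions, which is the normalization property described in the third step.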

Reviewer 3 Report

Comments and Suggestions for Authors

The authors describe in detail the equipment, its operation, installation, and data analysis in the manuscript. The equipment improvement is detailed. However, the authors do not highlight how this relates to the aims of the journal Sustainability. Content-wise, the manuscript would be more suited to a hardware-related journal.

Author Response

  1. Summary

Thank you very much for taking the time to review this manuscript. We have provided detailed responses to each of your valuable comments. Please review them.

  2. Point-by-point response to Comments and Suggestions for Authors

Comments 1: - The authors describe in detail the equipment, its operation, installation, and data analysis in the manuscript. The equipment improvement is detailed. However, the authors do not highlight how this relates to the aims of the journal Sustainability. Content-wise, the manuscript would be more suited to a hardware-related journal.

Response 1: Thank you very much for recognizing our manuscript and for your valuable feedback. The relationship between our article and the aims of the journal Sustainability can be seen from the following perspectives. First, the YOLO-BGS model proposed in this paper integrates BiFPN, GAM, and SA mechanisms to detect textile defects efficiently, reducing resource waste and improving raw material utilization, which aligns with the resource-efficiency principle of sustainable development. Second, the technology lowers the defect rate and thus reduces the energy consumption and environmental emissions caused by rework, supporting green production in line with environmental protection goals. In addition, it improves product quality, strengthens corporate competitiveness and industrial chains, and promotes economic sustainability. Finally, as an example of technological innovation, the paper highlights the role of artificial intelligence in industrial transformation, responding to the Sustainable Development Goals' advocacy of innovation and knowledge sharing. This content is also emphasized in the abstract, introduction, discussion, and conclusion of the manuscript. Thanks again for your feedback.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for addressing the concerns of this reviewer. I have no more comments.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript has not been revised according to the comments.
