Article

Research on Automated Fiber Placement Surface Defect Detection Based on Improved YOLOv7

College of Material Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5657; https://doi.org/10.3390/app14135657
Submission received: 27 April 2024 / Revised: 9 June 2024 / Accepted: 10 June 2024 / Published: 28 June 2024

Abstract

Due to the black and glossy appearance of the carbon fiber prepreg bundle surface, the accurate identification of surface defects in automated fiber placement (AFP) presents a high level of difficulty. Currently, the enhanced YOLOv7 algorithm demonstrates certain performance advantages in this detection task, yet issues with missed detections, false alarms, and low confidence levels persist. Therefore, this study proposes an improved YOLOv7 algorithm to further enhance the performance and generalization of surface defect detection in AFP. Firstly, to enhance the model’s feature extraction capability, the BiFormer attention mechanism is introduced to make the model pay more attention to small target defects, thereby improving feature discriminability. Next, the AFPN structure is used to replace the PAFPN at the neck layer to strengthen feature fusion, preserve semantic information to a greater extent, and finely integrate multi-scale features. Finally, WIoU is adopted to replace CIoU as the bounding box regression loss function, making it more sensitive to small targets, enabling more accurate prediction of object bounding boxes, and enhancing the model’s detection accuracy and generalization capability. Through a series of ablation experiments, the improved YOLOv7 shows a 10.5% increase in mAP and a 14 FPS increase in frame rate, with a maximum detection speed of 35 m/min during the AFP process, meeting the requirements of online detection and thus being able to be applied to surface defect detection in AFP operations.

1. Introduction

Composite materials are structures assembled from two or more constituent materials; they are lightweight, strong, and corrosion-resistant, making them widely utilized in aerospace, automotive manufacturing, and other fields [1]. The significant increase in the use of composite materials relies on the support of automated fiber placement (AFP) technology. AFP technology combines the advantages of tape winding technology and tape laying technology. It utilizes a placement head to form multiple pre-impregnated fiber tows or dry fiber bundles into a narrow band of adjustable width, which is then laid on the mold surface and consolidated by pressure rollers and heating devices for shaping. A typical fiber placement machine, as shown in Figure 1, mainly consists of three parts: the machine body, the creel system, and the fiber placement head. The primary function of the machine body is to enable the rapid movement of the placement head and its positioning in spatial coordinates. The creel system is responsible for the storage, unwinding, film recovery, tension control, transmission, and guiding of the prepreg tow material rolls. The placement head primarily performs the functions of pressing, feeding, cutting, guiding, heating, and rolling the prepreg tow, placing it onto the mold [2].
However, the particular characteristics of the AFP process give rise to a series of layup quality issues, primarily manifested as various surface defects generated during AFP, including foreign objects, overlaps, gaps, and twists. These defects lead to negative effects such as local heterogeneity and stress concentration in composite materials, resulting in a potential performance reduction of up to 12% [3,4,5,6,7]. Therefore, timely and accurate detection and repair of these defects are crucial.
Current methods for detecting surface defects in AFP primarily rely on manual inspection, yet this approach presents some notable issues. Firstly, manual inspection requires a significant amount of time, effort, and human resources. Approximately 20% of the time in the composite material automated laying process is spent inspecting and repairing surface defects, severely limiting productivity improvement. Secondly, manual inspection suffers from subjectivity and consistency issues, as different inspectors may have varying judgments regarding defects, especially concerning subtle defects. Therefore, the development of an AFP defect detection system that can enhance detection efficiency and accuracy and reduce human resource allocation is of significant importance for improving the performance and product quality of AFP machines.
Currently, there are three main technological approaches for surface defect detection in AFP of composite materials. One approach is based on laser technology for detection. The laser projection-assisted inspection system developed by Farjad et al. (2015) [8] can only achieve assisted inspection, fundamentally still relying on manual inspection. With the widespread application of laser triangulation sensors in industrial inspection, Kitson et al. [9] integrated laser triangulation sensors on the laying head to achieve defect detection of gaps, overlaps, and foreign objects to some extent. However, after in-depth research by many researchers, it was found that relying solely on the position information of the filament edges results in low accuracy in determining surface defects of the layers, and the detection speed is also not ideal. Another approach is based on infrared thermal imaging defect detection technology. Denkena et al. (2017) [10] proposed an online automatic laying detection scheme based on infrared thermal imaging and computer image processing technology. The core of this system is to use infrared thermal imaging combined with edge detection algorithms to extract filament shape and position, thereby determining laying information and identifying defects such as gaps and twists. Similarly, Schmidt et al. (2017) [11] have also realized online defect identification using this approach. Although the defect detection scheme based on infrared thermal imaging has shown certain effectiveness, it still has some limitations. Firstly, temperature data can be easily affected by environmental factors such as temperature gradients, humidity, and airflow, which may lead to misjudgment or missed defects. Secondly, the detection speed of the infrared thermal imager is relatively slow, especially in cases of large-area laying or high-speed laying, which may not meet the real-time monitoring requirements. The third approach is based on visible-light image defect detection technology. Soucy et al. [12] installed cameras on the laying head to monitor laying defects online. However, ambient light may cause changes in the brightness of the laying area, thereby interfering with image capture and analysis.
In recent years, the development of deep learning and computer vision technologies has provided new solutions for surface defect detection in AFP. Through learning from large amounts of data, deep learning models can automatically extract features and classify and detect defects. There are two common types of object detection algorithms in deep learning: two-stage detection algorithms such as Faster R-CNN, and single-stage detection algorithms such as the YOLO series. Two-stage detection algorithms have high detection accuracy but are slow in speed, while single-stage detection algorithms have relatively high detection accuracy and fast detection speed, meeting the requirements for online detection. Zhang et al. (2022) [13] proposed a multi-scale AFP defect detection algorithm, SPFFY-CA, enhancing automatic defect recognition efficiency and accuracy in automated fiber placement processes. Meister et al. (2023) [14] presented a novel approach utilizing Convolutional and Recurrent Neural Networks for real-time defect detection in automated fiber placement processes, offering a significant advancement for automated inspection systems in composite engineering. Deep learning-based detection methods have been widely applied in various industries [15,16,17,18,19,20,21,22,23,24] but have not been extensively used in the field of AFP. This can be attributed to two main reasons: firstly, the lack of high-quality datasets for surface defect inspection in composite material automated laying processes; and secondly, the relatively low detection rate and accuracy of small-target defects generated in AFP, which do not suffice for stable industrial applications.
Building upon the aforementioned background, this paper proposes an improved YOLOv7-based automated surface defect detection algorithm for AFP to achieve detection of surface defects in automated laying. The main contributions of this paper are as follows:
1. Utilizing the self-developed AFP equipment from Nanjing University of Aeronautics and Astronautics (NUAA) to establish a detection platform, a composite material automated laying surface defect dataset comprising over 5000 images was constructed. The defect categories include gaps, overlaps, foreign objects, twists, and wrinkles.
2. We introduce the BiFormer attention mechanism to enhance the backbone network, capturing fine information in the feature map, and strengthening the relevance of contextual information in the image to improve the extraction capability of small target features such as gaps and overlaps.
3. Replacing the PAFPN structure with the AFPN in the neck layer, AFPN fuses adjacent low-level features and gradually integrates high-level features into the fusion process, effectively avoiding significant semantic differences between non-adjacent levels, ensuring a balanced retention of detailed and semantic information in the AFP defect detection task.
4. We adopt the WIoU loss function to replace the original CIoU loss function as the bounding box regression loss function. The WIoU loss function features a dynamic non-monotonic focus mechanism, reducing the gradient gain for simple defects. Compared to CIoU, it can focus on challenging cases such as overlapping or occluding surface defects, enhancing the model’s generalization capability.

2. Overview and Detection Principle of YOLOv7

The YOLO (You Only Look Once) neural network is a real-time object detection system that transforms the object detection problem into a single regression problem. The YOLO network divides the input image into a fixed number of grids, with each grid responsible for detecting objects that fall within it. Each grid predicts multiple bounding boxes and their corresponding class probabilities. Through a single forward pass, YOLO can simultaneously output multiple detection results, making it extremely fast. The network design includes convolutional layers and fully connected layers, with the convolutional layers used for feature extraction and the fully connected layers used for predicting bounding boxes and classes [25]. YOLOv1 was proposed in 2015, performing end-to-end object detection with a single convolutional network, but it had shortcomings in small-object detection and localization accuracy. YOLOv2, introduced in 2016, addressed these issues by employing the Darknet-19 backbone and a multi-scale feature map design, enhancing detection accuracy. YOLOv3, launched in 2018, utilized the Darknet-53 network and a feature pyramid network, further boosting performance. YOLOv4, released in 2020, adopted the CSPDarknet53 network, the SAM attention mechanism, and other techniques, significantly improving detection performance. In the same year, YOLOv5 was introduced, optimizing the structure to enhance detection accuracy and speed. YOLOv7, released in 2022, achieved further improvements in both accuracy and speed, making it one of the most powerful detection algorithms available today [25,26,27,28].
The YOLOv7 algorithm achieves new state-of-the-art speed and accuracy in the range of 5 FPS to 160 FPS, surpassing previous real-time object detection models with the highest accuracy of 56.8% AP at 30 FPS or higher on a V100 GPU [26]. YOLOv7 is an anchor-based object detection algorithm, building upon the previous versions with numerous improvements, including the integration of the Extended-ELAN (E-ELAN) aggregation network, convolution re-parameterization, and the proposal of concatenation-based models for model scaling [26]. The algorithm’s model structure consists of the input layer, the backbone network for feature extraction, and the head network for feature fusion, loss calculation, and prediction.
The backbone network comprises several BConv layers, E-ELAN modules, and MPConv modules [26,29]. The BConv layer fuses convolution and Batch Normalization (BN) layers into a single convolution layer to reduce computational complexity. The E-ELAN module increases the base of features by using multiple sets of convolutions to add features and then combines these groups of features, enhancing the model’s learning capability and parameter utilization without disrupting the gradient path or encountering issues like gradient explosion or vanishing gradients. The MPConv layer has two branches: one branch performs downsampling through max pooling to expand the receptive field, while the other branch achieves channel variation through convolution layers with different kernel sizes and adds the results together to enhance the network’s generalization ability.
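As an illustration of this two-branch structure, a minimal PyTorch sketch of such a downsampling block is given below. The class names, channel choices, and activation are assumptions made for the example; the exact YOLOv7 implementation, including how the two branch outputs are combined, may differ.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution + Batch Normalization + activation, in the spirit of the BConv layer."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MPDownsample(nn.Module):
    """Schematic two-branch downsampling block in the spirit of the MPConv module:
    one branch downsamples by max pooling, the other by a strided convolution,
    and the branch outputs are combined (summed here, as described in the text)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.branch_pool = nn.Sequential(nn.MaxPool2d(2, 2), ConvBNAct(c_in, c_out, k=1))
        self.branch_conv = nn.Sequential(ConvBNAct(c_in, c_out, k=1), ConvBNAct(c_out, c_out, k=3, s=2))

    def forward(self, x):
        return self.branch_pool(x) + self.branch_conv(x)

# Example: halving the spatial resolution of a 64-channel feature map.
x = torch.randn(1, 64, 80, 80)
y = MPDownsample(64, 128)(x)   # y has shape (1, 128, 40, 40)
```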
In the head network, layers such as SPPCSPC (improved spatial pyramid pooling structure), several BConv layers, several MPConv layers, several concatenation layers, and RepConv layers (Repvgg_block) are used for final classification, regression prediction of the target image, and the generation of output results [26,29].

3. Model Improvements Based on the YOLOv7 Algorithm

3.1. Enhancement of Backbone Network with BiFormer Attention Mechanism Based on Transformer

The Transformer, a network model widely utilized in computer vision in recent years, relies on self-attention mechanisms and possesses several advantages. Firstly, it is capable of effectively capturing long-range dependencies in data [30], which is crucial for handling various complex data tasks. Secondly, the model design of the Transformer is flexible and free from inductive bias, enabling it to adapt to various types of data [31]. Additionally, the Transformer exhibits high parallelism, which accelerates the training and inference of large-scale models, thus improving computational efficiency [32,33,34].
Traditional object detection algorithms like YOLOv7 extract image features using operations such as convolution and pooling. However, this approach has limited ability to capture long-range dependencies and global context. To address this issue, this study introduces the Transformer-based BiFormer attention mechanism module [35].
In the BiFormer model, the two-dimensional input feature map $X \in \mathbb{R}^{H \times W \times C}$ is first divided into $S \times S$ non-overlapping regions, so that each region contains $\frac{HW}{S^2}$ feature vectors. $X$ is then reshaped into $X^r \in \mathbb{R}^{S^2 \times \frac{HW}{S^2} \times C}$. Following this, linear projection is applied to $X^r$ using fully connected layers, yielding $Q, K, V \in \mathbb{R}^{S^2 \times \frac{HW}{S^2} \times C}$, as shown in Equation (1). Through this projection operation, BiFormer can capture correlations between different positions and channels, providing more comprehensive input information for the subsequent attention mechanism.
$Q = X^r W^q, \quad K = X^r W^k, \quad V = X^r W^v$ (1)
where $W^q$, $W^k$, and $W^v$ are the projection weights for the query, key, and value, respectively, each with dimensions $\mathbb{R}^{C \times C}$.
To facilitate region-to-region routing, BiFormer adopts a directed graph approach. Firstly, BiFormer performs region-wise average pooling on $Q$ and $K$ to obtain $Q^r, K^r \in \mathbb{R}^{S^2 \times C}$. Through this step, BiFormer can calculate the similarity between queries and keys at the region level. Subsequently, BiFormer multiplies $Q^r$ by the transpose of $K^r$ to generate the adjacency matrix of the region-to-region affinity graph, denoted as $A^r \in \mathbb{R}^{S^2 \times S^2}$, as shown in Equation (2).
$A^r = Q^r (K^r)^T$ (2)
In order to further optimize the affinity graph, BiFormer retains only the top-$k$ connections of each region with the other regions. BiFormer introduces a routing index matrix $I^r \in \mathbb{N}^{S^2 \times k}$, where the $i$-th row of $I^r$ contains the indices of the $k$ regions most relevant to the $i$-th region.
Through the routing index matrix $I^r$, Equation (3) [35] filters and collects $K$ and $V$ to obtain $K^g$ and $V^g$. Finally, a fine-grained token-to-token attention mechanism is applied to $Q$, $K^g$, and $V^g$, as shown in Equation (4), where LCE [36] represents a depth-wise separable convolution with a kernel size of 5 and a stride of 1.
$K^g = \mathrm{gather}(K, I^r), \quad V^g = \mathrm{gather}(V, I^r)$ (3)
$O = \mathrm{Attention}(Q, K^g, V^g) + \mathrm{LCE}(V)$ (4)
In summary, the BiFormer attention mechanism guides YOLOv7 to focus more on surface defects in AFP, disregarding irrelevant regions. When calculating attention weights, it considers relationships between the same type of surface defects as well as different types of surface defects. This allows it to capture more comprehensive feature relationships and effectively detect surface defects in AFP. Furthermore, the BiFormer attention mechanism retains certain positional information, enabling better understanding and representation of sequential data through relative position encoding. For instance, in the case of overlap and gap defects in AFP, this positional information helps the model grasp context information of defects and identify their precise locations more accurately.
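For concreteness, the following is a minimal PyTorch sketch of the bi-level routing attention described by Equations (1)–(4). It is an illustrative re-implementation rather than the official BiFormer code; the module name, region count S, top-k value, and the assumption that H and W are divisible by S are choices made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLevelRoutingAttention(nn.Module):
    """Illustrative sketch of bi-level routing attention (Equations (1)-(4))."""
    def __init__(self, dim: int, num_regions: int = 7, top_k: int = 4):
        super().__init__()
        self.s = num_regions                     # S: regions per spatial side
        self.top_k = top_k                       # k: routed regions kept per query region
        self.qkv = nn.Linear(dim, dim * 3)       # Eq. (1): projections W^q, W^k, W^v
        self.lce = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)  # LCE: depth-wise 5x5 conv
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C); H and W are assumed divisible by S
        B, H, W, C = x.shape
        S, k = self.s, self.top_k
        q, key, v = self.qkv(x).chunk(3, dim=-1)

        def to_regions(t):  # (B, H, W, C) -> (B, S^2, HW/S^2, C)
            t = t.reshape(B, S, H // S, S, W // S, C)
            return t.permute(0, 1, 3, 2, 4, 5).reshape(B, S * S, -1, C)

        qr, kr, vr = to_regions(q), to_regions(key), to_regions(v)

        # Region-level queries/keys by average pooling and affinity graph A^r (Eq. (2))
        affinity = qr.mean(dim=2) @ kr.mean(dim=2).transpose(-1, -2)   # (B, S^2, S^2)
        idx = affinity.topk(k, dim=-1).indices                         # routing index I^r

        # Gather the key/value tokens of the top-k routed regions (Eq. (3))
        n_tok = kr.shape[2]
        idx_exp = idx[..., None, None].expand(-1, -1, -1, n_tok, C)
        kg = torch.gather(kr.unsqueeze(1).expand(-1, S * S, -1, -1, -1), 2, idx_exp).reshape(B, S * S, k * n_tok, C)
        vg = torch.gather(vr.unsqueeze(1).expand(-1, S * S, -1, -1, -1), 2, idx_exp).reshape(B, S * S, k * n_tok, C)

        # Fine-grained token-to-token attention plus local context enhancement (Eq. (4))
        attn = F.softmax(qr @ kg.transpose(-1, -2) / C ** 0.5, dim=-1)
        out = (attn @ vg).reshape(B, S, S, H // S, W // S, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        lce = self.lce(v.permute(0, 3, 1, 2)).permute(0, 2, 3, 1)
        return self.proj(out + lce)
```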

3.2. Enhancement of Neck Layer Feature Fusion with Introduction of AFPN Instead of PAFPN

For the defect detection task, critical features need to encompass detailed semantic information about objects and be extracted through deep neural networks. In the current feature pyramid structure, high-level features propagate through multiple intermediate scales, interact with features at these scales, and eventually fuse with low-level features; during this propagation and interaction, the semantic information of the high-level features may be lost or weakened. Meanwhile, the bottom-up path structure of the Path Aggregation Feature Pyramid Network (PAFPN) [37,38,39,40] employed by YOLOv7 may introduce the opposite issue: detailed information in low-level features may be lost or weakened during propagation and interaction. Therefore, we introduce the Asymptotic Feature Pyramid Network (AFPN) [41] to enhance feature fusion. The AFPN starts by fusing adjacent low-level features and gradually incorporates high-level features into the fusion process. This strategy prevents significant semantic differences between non-adjacent levels and ensures a balanced preservation of detailed and semantic information in object detection tasks [41].
Similar to other object detection methods based on feature pyramid networks [37], the AFPN extracts features at different pyramid levels from the backbone network to form a set of multi-scale features. In the feature fusion step, low-level features first interact with subsequent levels and are eventually fused with high-level features. Through convolution operations, a set of multi-scale features is generated. For the YOLO model, only a subset of features is input into the feature pyramid network, producing the output set.
The AFPN architecture is illustrated in Figure 2. By gradually fusing features at different pyramid levels, the AFPN architecture brings the features closer in semantic content. This progressive fusion approach reduces semantic gaps compared to directly fusing non-adjacent hierarchical features. For instance, extracting the final-layer feature from each feature level of the backbone network forms a set of features at different scales [37], namely $C_2$, $C_3$, $C_4$, $C_5$. The fusion of features between $C_2$ and $C_3$ helps reduce the gap between them, and the adjacency of $C_3$ and $C_4$ further reduces the semantic gap between $C_2$ and $C_4$. This progressive fusion approach enhances the effectiveness of feature fusion.
For size adjustment and feature fusion in the AFPN, 1 × 1 convolutions and bilinear interpolation are used for upsampling, while convolutions with different kernel sizes and strides are used for downsampling. Features are then learned through four residual units [42]. Unlike YOLOv7, no 8× upsampling or downsampling is performed.
During the multi-level feature fusion process, the AFPN employs ASFF [43] to allocate different spatial weights to features from different layers, enhancing the importance of key levels while mitigating conflicting information from different objects. Figure 3 illustrates the fusion of features from three levels. Let $x_{ij}^{n \to l}$ denote the feature vector at position $(i, j)$ passed from level $n$ to level $l$. Through adaptive spatial fusion of the multi-level features, we obtain the fused feature vector $y_{ij}^{l}$, defined as a linear combination of the feature vectors $x_{ij}^{1 \to l}$, $x_{ij}^{2 \to l}$, and $x_{ij}^{3 \to l}$, as shown in Equation (5).
$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l}$ (5)
where $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, and $\gamma_{ij}^{l}$ represent the spatial weights of the three feature levels at level $l$, with $\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1$.
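The following is a minimal PyTorch sketch of this ASFF-style fusion for three levels, corresponding to Equation (5). It is an illustrative implementation, not the AFPN authors' code; the inputs are assumed to have already been resized to a common resolution and channel count, and the per-level weights are produced by 1 × 1 convolutions followed by a softmax so that they sum to 1 at every position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    """Illustrative ASFF-style fusion of three feature levels (Equation (5))."""
    def __init__(self, channels: int):
        super().__init__()
        # One 1x1 conv per level predicts a per-pixel weight logit for that level.
        self.weight_convs = nn.ModuleList([nn.Conv2d(channels, 1, kernel_size=1) for _ in range(3)])

    def forward(self, x1, x2, x3):
        # Stack the per-level logits and normalize with softmax so that
        # alpha + beta + gamma = 1 at every spatial position (i, j).
        logits = torch.cat([conv(x) for conv, x in zip(self.weight_convs, (x1, x2, x3))], dim=1)
        w = F.softmax(logits, dim=1)                       # (B, 3, H, W)
        alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        return alpha * x1 + beta * x2 + gamma * x3         # fused feature y^l

# Example: fusing three 128-channel maps that were resized to 40x40.
fuse = AdaptiveSpatialFusion(128)
y = fuse(torch.randn(2, 128, 40, 40), torch.randn(2, 128, 40, 40), torch.randn(2, 128, 40, 40))
```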

3.3. Improvement in Bounding Box Regression Loss Function with Introduction of WIoU Instead of CIoU

Since YOLOv1, the YOLO series models have widely adopted various loss functions, including the bounding box regression loss function, classification loss function, and confidence loss function, to optimize the model’s predictions and improve the accuracy and performance of object detection. The BBR loss function guides target localization, the classification loss function determines the target’s category, and the confidence loss function measures the accuracy and confidence of the predicted box. The combination of these loss functions has contributed to the outstanding performance of the YOLO series models in object detection tasks.
YOLOv7 adopts the CIoU loss function, also used in YOLOv5, to calculate the regression loss, as shown in Equations (6)–(9) [44]:
$v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2$ (6)

$\alpha = \frac{v}{(1 - IoU) + v}$ (7)

$R_{CIoU} = \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$ (8)

$L_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v$ (9)
where:
$\alpha$ is the weight parameter;
$v$ measures the consistency of the aspect ratios of the predicted box and the ground truth box;
$c$ is the diagonal length of the minimum enclosing box containing the predicted box and the ground truth box;
$\rho(b, b^{gt})$ is the Euclidean distance between the center points of the predicted box and the ground truth box;
$R_{CIoU}$ is the penalty term.
To address the issue of low-quality examples in the training data and enhance the model’s generalization ability, Wise-IoU (WIoU) [45] was proposed as a distance-based attention mechanism, and WIoUv1 was constructed using a two-layer attention mechanism. A factor $R_{WIoU}$ with a range of $[1, e)$ significantly amplifies $L_{IoU}$ for normal-quality anchor boxes, while $L_{IoU}$ with a range of $[0, 1]$ notably decreases $R_{WIoU}$ for high-quality anchor boxes; when an anchor box overlaps well with the target box, more importance is placed on the distance between their center points. The incorporation of these two attention terms in the loss function reduces the penalty on low-quality examples caused by geometric factors and minimizes their interference with the model. Specifically, $R_{WIoU}$ directs more attention toward anchors of normal quality, thereby improving the model’s ability to handle such anchors, while $L_{IoU}$ primarily adjusts high-quality anchor boxes by reducing $R_{WIoU}$, prompting the model to focus more on the distance between the center points of the target box and the anchor box. This enhances the performance of the object detection algorithm and improves the model’s accuracy in capturing the shape of targets.
Next, a dynamic non-monotonic anchor box fine-tuning model is introduced, which evaluates the outlier degree of an anchor box as $\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}}$, where $L_{IoU}^{*}$ is the current $L_{IoU}$ detached from the computation graph and $\overline{L_{IoU}}$ is its running mean. A lower outlier degree indicates a higher-quality anchor box, and a smaller gradient gain is assigned to it so that the fine-tuning effort is focused on anchors of normal quality. Likewise, assigning smaller gradient gains to anchor boxes with a high outlier degree effectively prevents large harmful gradients caused by low-quality examples. A non-monotonic focusing coefficient is built from $\beta$ and integrated into WIoUv1. By introducing this dynamic non-monotonic focusing mechanism (FM), the abnormality of anchor boxes can be handled more flexibly, thereby enhancing the model’s performance.
In conclusion, the final formulas for WIoUv3 are as follows:

$L_{WIoUv1} = R_{WIoU} \cdot L_{IoU}$ (10)

$R_{WIoU} = \exp\left( \frac{(x - x_{gt})^2 + (y - y_{gt})^2}{W_g^2 + H_g^2} \right)$ (11)

$r = \frac{\beta}{\delta \alpha^{\beta - \delta}}$ (12)

$L_{WIoUv3} = r \cdot L_{WIoUv1}$ (13)

where $W_g$ and $H_g$ are the width and height of the minimum enclosing box of the predicted and ground truth boxes, and $\alpha$ and $\delta$ are hyperparameters of the focusing mechanism.
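A minimal PyTorch sketch of a WIoUv3-style loss assembled from Equations (10)–(13) is shown below. It is an illustrative implementation, not the authors' training code: the box format, the values of α and δ, and the way the running mean of L_IoU is supplied are assumptions made for the example.

```python
import torch

def wiou_v3_loss(pred, target, l_iou_mean, alpha=1.9, delta=3.0, eps=1e-7):
    """Illustrative WIoUv3 loss (Equations (10)-(13)) for boxes given as (x1, y1, x2, y2).
    l_iou_mean is the running mean of L_IoU used to compute the outlier degree beta."""
    # IoU and L_IoU = 1 - IoU
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou

    # R_WIoU (Eq. (11)): squared center distance normalized by the enclosing box size;
    # the denominator is detached so it does not contribute gradients of its own.
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (wg ** 2 + hg ** 2 + eps).detach())

    l_wiou_v1 = r_wiou * l_iou                      # Eq. (10)
    beta = l_iou.detach() / (l_iou_mean + eps)      # outlier degree of each anchor box
    r = beta / (delta * alpha ** (beta - delta))    # Eq. (12): non-monotonic focusing coefficient
    return (r * l_wiou_v1).mean()                   # Eq. (13), averaged over the batch
```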

4. Experimental Setup and Result Analysis

4.1. Overall Design of AFP Surface Defect Detection Experimental Platform

The experiment is based on the 8-tow automated fiber placement machine developed by Nanjing University of Aeronautics and Astronautics (NUAA), as shown in Figure 4. A complete defect detection system was integrated, including computer systems, industrial cameras, lighting sources, lenses, and other equipment. The automated fiber placement machine is responsible for automatic fiber placement of prepreg materials, while the computer control system operates and conducts defect detection. The industrial camera captures real-time surface images, the lighting source ensures image clarity and contrast, and the lens magnifies and focuses the images. By integrating these devices and technologies, the system can automatically detect defects on the laid surface, providing strong support for quality control in composite material production.
During the AFP process, the industrial camera monitors the entire layup process and communicates with the computer. Once the industrial camera detects surface defects, the computer communicates with the programmable logic controller (PLC) of the AFP machine to trigger alarms or reduce the speed. In this study, Python programming was utilized to facilitate communication between the computer and the PLC of the AFP machine, with the Snap7 library serving as the communication tool, as illustrated in Figure 5.
The defect detection algorithm analyzes images to promptly identify surface defects. When a defect is detected, the computer triggers a stop signal to the Siemens PLC. To achieve closed-loop control between the computer and PLC, a function named trigger_stop() was defined in Python to set the boolean variable of the stop address to True and send the stop signal to the PLC using the Snap7 library.
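A minimal sketch of such a trigger_stop() routine built on the python-snap7 client is shown below. The PLC IP address, rack/slot numbers, and the data block, byte, and bit that hold the stop flag are placeholders for illustration and depend on the actual AFP machine configuration.

```python
import snap7
from snap7.util import set_bool

# Hypothetical connection and address parameters; adapt to the real PLC program.
PLC_IP, RACK, SLOT = "192.168.0.1", 0, 1
STOP_DB, STOP_BYTE, STOP_BIT = 1, 0, 0

client = snap7.client.Client()
client.connect(PLC_IP, RACK, SLOT)

def trigger_stop():
    """Set the boolean stop flag in the PLC data block to True when a defect is detected."""
    data = client.db_read(STOP_DB, STOP_BYTE, 1)   # read the byte containing the stop bit
    set_bool(data, 0, STOP_BIT, True)              # set the stop bit in the local buffer
    client.db_write(STOP_DB, STOP_BYTE, data)      # write the byte back to the PLC

# In the detection loop, trigger_stop() is called whenever a surface defect is found.
```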
After successful deployment, the computer acquires real-time image data through the industrial camera for defect detection and analysis. Once a defect is detected, the computer sends instructions to trigger the AFP machine to stop and perform necessary actions. This closed-loop control workflow not only enhances the efficiency and reliability of the AFP platform but also effectively improves its capability to handle surface defects.

4.2. AFP Surface Defect Dataset

Currently, there is no established comprehensive open dataset for surface defects in composite material AFP within the industry. Considering the data quantity requirements of deep learning, this dataset primarily consists of surface defect data generated during AFP operations on actual typical components. Additionally, artificial simulation experiments were conducted to further enrich the dataset. The carbon fiber prepreg used in this experiment is a cutting-edge research material developed by the Aerospace Research Institute of Materials & Processing Technology (ARIMT), with the grade being MT7001603B.
During the actual laying process, we use an industrial camera to record videos to capture the complete visual information of the laying process. Subsequently, the video data are processed to convert them into image sequences, and the data are then selected for subsequent annotation. Figure 6 illustrates the collection of overlap and gap defects during the AFP process. However, due to the limited number of captured images and potential selection bias, the types and forms of surface defects may not be fully covered. To obtain more defect images for research purposes, we need to artificially simulate AFP operations to generate diverse defect images, such as overlapping and gapped defects during the laying process.
To simulate realistic defect images, it is necessary to replicate the actual laying environment, including the camera position, light source position, and shooting angle. By accurately reproducing these conditions, the generated defect images closely resemble those observed in real laying processes, providing accurate data for research and algorithm evaluation. We then simulate defects in AFP, for example by adjusting the width of single-tow placement tracks to create gap and overlap defects, or by introducing defects in multi-tow placement tracks, while a fixed camera captures video from which defect images are saved. By adjusting placement tracks, simulating defects at different placement levels and against different backgrounds, and introducing wrinkle defects, the dataset is enriched with diverse training and testing samples, improving algorithm adaptability. Combining real and simulated surface defects thus provides varied samples for algorithm training and evaluation, aiding in the optimization of the defect detection algorithm.
Datasets are the foundation for building and training models. In this study, approximately 8000 sample images were collected and gradually filtered to form the final dataset, totaling over 5000 images. The dataset was divided into training, validation, and test sets at a ratio of 8:1:1. To more accurately describe and distinguish different defect scenarios, these defects are categorized into five different types: gap, overlap, foreign material, wrinkle, and twist.
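As a simple illustration of the 8:1:1 partition, the following sketch splits a directory of images into training, validation, and test lists; the directory path, file extension, and random seed are assumptions for the example.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle image files and split them 8:1:1 into train/val/test lists."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    return {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

splits = split_dataset("afp_defect_images")   # hypothetical folder of collected images
print({name: len(files) for name, files in splits.items()})
```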

4.3. Evaluation Indicators

The confusion matrix is a common evaluation method used in object detection tasks to help illustrate the performance of a model. “TN” stands for true negative, which shows the number of negative examples classified accurately. Similarly, “TP” stands for true positive, which indicates the number of positive examples classified accurately. The term “FP” shows the false positive value, i.e., the number of actual negative examples incorrectly classified as positive; and “FN” means the false negative value, which is the number of actual positive examples incorrectly classified as negative [46].
The performance metrics commonly used in object detection tasks are listed in Table 1 and described below.
Precision is calculated among the positive results correctly detected [47]. The calculation formula is as follows:
$Precision = \frac{TP}{TP + FP}$
Recall is calculated among the boxes that should be detected (ground truth) and the true positives [47]. The calculation formula is as follows:
$Recall = \frac{TP}{TP + FN}$
AP (average precision) is the area under the precision–recall (PR) curve [47]. The average precision comprehensively reflects the performance of the detection algorithm at different confidence levels.
mAP is the average of the AP for each category. For multi-class object detection tasks, the average precision for each category is typically calculated, and then, the mean average precision is obtained by averaging them. mAP can more comprehensively evaluate the performance of the entire object detection system.
IoU measures the overlap between the detection box and the ground truth box. It is calculated by computing the ratio of the intersection area of the two boxes to the union area. Typically, if the IoU of two boxes is greater than a predefined threshold (e.g., 0.7), the detection is considered correct.
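A short, self-contained sketch of how IoU, precision, and recall are computed from these quantities is given below; the box format and the example numbers are illustrative only.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A detection counts as a true positive when its IoU with a ground-truth box
# exceeds the chosen threshold (e.g., 0.7 as above).
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))   # 0.333...
print(precision_recall(tp=8, fp=2, fn=1))        # (0.8, 0.888...)
```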
The inference time can be simply understood as the time needed to obtain the output results from the input image through the model.
FPS represents the number of images the model can process in a unit of time. Frames per second is a key metric for assessing the real-time performance of a detection system.
The detection time for a single image refers to the time required to process one image. It is similar to inference time, but more specifically describes the time cost of processing a single image.
Objectness loss (confidence loss) determines whether there are objects in the predicted bounding box. This loss function helps the model to distinguish the background and foreground areas.
Location loss (box regression loss) is applied only when the predicted bounding box contains an object. The loss function measures the errors between the predicted bounding box and the ground truth.
Classification loss determines which category the object in the prediction box belongs to.

4.4. Ablation Study

4.4.1. Experimental Introduction

To verify the impact of the proposed improvement method on detection performance and assess the advantages of the method proposed in this paper, we conducted a series of ablation experiments. These experiments involved five models: the original YOLOv7 model, the YOLOv7 model with the added BiFormer attention mechanism, the YOLOv7 + AFPN model with improved neck layer feature fusion, the YOLOv7 + WIoU model with improved regression box loss function, and a fusion model combining the three improvement points. For brevity, we refer to them as YOLOv7, YOLOv7+BiFormer, YOLOv7 + AFPN, YOLOv7 + WIoU, and Improved YOLOv7, respectively, where Improved YOLOv7 integrates the three optimization strategies.
All experiments were conducted under the same experimental environment and parameters. We first analyzed the training process through the training loss value to evaluate the training effectiveness of different models. Subsequently, we compared the detection accuracy and detection speed using the mAP and FPS metrics to assess the performance differences of different models in detection tasks. Lastly, we further analyzed and evaluated the actual detection effects of different algorithm models through comparisons of detected images and videos.

4.4.2. Experimental Environment and Parameter Configuration

The current experimental setup and model training parameter settings are outlined in Table 2 and Table 3.

4.4.3. Training Results and Analysis

Based on the experimental environment, parameters, and dataset described above, the five models introduced in Section 4.4.1 were trained for 300 epochs, using only the training and validation sets from the AFP dataset.
Figure 7 and Figure 8 show the Location, Objectness, and Classification loss curves. These losses decrease gradually over the 300 iterations: in the first 50 epochs they drop rapidly, the decline then moderates, and the losses stabilize at low values, approaching their optimal levels in the final 10 epochs.
For the Location loss, YOLOv7 + BiFormer, YOLOv7 + WIoU, and Improved YOLOv7 all exhibited a faster decrease in the initial 10 epochs, while YOLOv7 and YOLOv7 + AFPN had a slightly slower decline slope. This indicates that the YOLOv7 + BiFormer, YOLOv7 + WIoU, and Improved YOLOv7 algorithms are able to learn useful features more quickly in the initial stages and effectively reduce losses during training. Among them, YOLOv7 + BiFormer and Improved YOLOv7 both incorporate the BiFormer self-attention mechanism in the algorithm. BiFormer uses sparse sampling during feature extraction, which retains fine-grained details critical for small object defects like those in AFP surfaces. Therefore, BiFormer can better capture the subtle features of defects. The dynamic non-monotonic focusing mechanism of the WIoU loss function makes it more sensitive to features of small targets, thereby providing some optimization. After 300 iterations, the localization loss of all five algorithms converged, with Improved YOLOv7 performing the best, followed by YOLOv7+BiFormer and YOLOv7 + WIoU, then YOLOv7 + AFPN, and finally YOLOv7.
For both Objectness loss and Classification loss, the models show a fast convergence speed in the first 50 epochs, and by epoch 300, the models have essentially converged. From the curve in Figure 8a, it can be observed that the Objectness losses of YOLOv7 + BiFormer, YOLOv7 + AFPN, YOLOv7 + WIoU, and Improved YOLOv7 models are similar in magnitude. Compared to the original YOLOv7, these models achieve lower Objectness loss, thus further enhancing the precision performance of the models.
Figure 9 shows the results of the five algorithms over 300 iterations under the same configuration parameters. The original YOLOv7 rose rapidly around the 75th iteration and began to converge, stabilizing around the 125th iteration. In comparison, the proposed Improved YOLOv7 rose rapidly in the first 50 iterations and reached stability around the 100th iteration. This indicates that the improved model achieves stable performance in a relatively short time.
In summary, the Improved YOLOv7 shows a significant performance improvement over the original YOLOv7, and the three single-improvement models also demonstrate a certain level of improvement compared to the original YOLOv7. These analysis results further confirm that the improved algorithm proposed in this study exhibits superior performance in the experiments, showing good convergence speed and generalization capability.

4.4.4. Test Results and Analysis

To validate the detection performance of the model, we used it to automatically detect and recognize surface defects of composite material in the test set. In addition to the improved models YOLOv7 + BiFormer, YOLOv7 + AFPN, YOLOv7 + WIoU, and Improved YOLOv7, other classical and representative detection algorithms were tested, including Faster R-CNN [48], EfficientDet [49], YOLOv5 [50], YOLOv5-tiny [50], YOLOv7 [29], and YOLOv7-tiny [29]. Table 4 lists the average precision (AP) of the improved models and the other models for each category.
For gap defects, the AP of YOLOv7 is 79%, while YOLOv7 + BiFormer with the incorporation of the BiFormer attention mechanism significantly improves the AP to 91.5%, indicating the positive effect of the BiFormer attention mechanism in capturing subtle features of small object defects. In comparison, YOLOv7 + AFPN and YOLOv7 + WIoU show slightly smaller improvements in AP, reaching 88% and 89%, respectively. Regarding overlap defect detection, YOLOv7 achieves an AP of 83%, whereas YOLOv7 + BiFormer achieves an AP of 92.3%, demonstrating that the BiFormer attention mechanism outperforms the AFPN structure and WIoU loss function in performance. For wrinkle and twist defect detection, YOLOv7 achieves AP values of 84% and 88%, respectively, while YOLOv7 + BiFormer, YOLOv7 + AFPN, and YOLOv7 + WIoU all surpass the original algorithm’s AP values. Improved YOLOv7, in particular, stands out in performance on these two defect types, reaching AP values of 95.7% and 97.3%, attributed to the fusion of the BiFormer attention mechanism, AFPN feature fusion, and WIoU loss function advantages, enhancing sensitivity to details, feature fusion, and the accuracy of bounding box regression. Lastly, for foreign object defect detection, YOLOv7 achieves an AP of 91%, while Improved YOLOv7 achieves a high AP of 98.5%, indicating its stronger detection capability for foreign object defects. Other algorithms such as YOLOv7 + BiFormer, YOLOv7 + AFPN, and YOLOv7 + WIoU also demonstrate better performance compared to the original algorithm.
Overall, Improved YOLOv7 demonstrates superior detection performance across all defect types, with significantly higher average AP values compared to other algorithms. In particular, in wrinkle and twist defect detection, Improved YOLOv7 outperforms other algorithms, which can be attributed to its comprehensive integration of various improvement strategies. Firstly, the introduction of the BiFormer attention mechanism enables the model to better regress the size and position of targets by capturing subtle defect features, thereby enhancing the detection accuracy and recall. Secondly, the AFPN feature fusion strategy balances detailed information and semantic information, avoids significant semantic differences between different hierarchies, and strengthens feature fusion. Additionally, the WIoU loss function is more sensitive to small targets, aiding in optimizing the accuracy of bounding box regression. Improved YOLOv7 outperforms other algorithms in AFP defect detection in composite materials. This algorithm demonstrates higher accuracy and practicality in handling various defects due to the enhancement of detail sensitivity, feature fusion optimization, and accuracy improvement in bounding box regression brought about by the integration of multiple improvement strategies.
Table 5 presents the performance of each model on the test set. Compared to the original YOLOv7 (85.3% mAP, 48 FPS), the comprehensive performance metrics (including mAP and FPS) of YOLOv7 + BiFormer, YOLOv7 + AFPN, YOLOv7 + WIoU, and Improved YOLOv7 have all improved. This indicates that the improvement strategies enhance the ability of the original YOLOv7 to handle AFP defect detection tasks in composite materials.
The table shows that the YOLOv7 + BiFormer model significantly outperforms the original YOLOv7, achieving a mAP of 93.8% and 55 FPS. In particular, in the detection of complex small object defects, the BiFormer attention mechanism enables the accurate capture of fine-detail features and global contextual information. Combined with sparse attention mechanisms, it enhances detection accuracy and recall.
Similarly, we observe that YOLOv7 + AFPN also achieves a higher mAP value (90.5%) and faster frame rate (52 FPS) compared to the original YOLOv7. The innovation of the AFPN lies in its introduction of an adaptive feature fusion scheme based on an FPN, which balances global semantic information and local detail information better, thereby improving the accuracy of target localization and classification.
For YOLOv7 + WIoU that incorporates the WIoU loss function, it achieves an mAP of 92.3% and an FPS of 55. The usage of the WIoU loss function optimizes target detection performance in two ways: first, its design takes into account the diversity of target scales, shapes, and positions in the detection task, enabling more accurate allocation of optimization resources for each anchor point; second, to address the issue of low-quality samples in the training data, such as defects under different lighting angles with varying recognition difficulty, the WIoU loss function dynamically allocates gradient gains based on the quality of defects in the preselected boxes. This not only reduces the corresponding loss but also prevents large harmful gradients from affecting model learning and optimization. The Improved YOLOv7 model, which incorporates all of these improvement strategies, achieves optimal performance in both mAP and FPS, reaching 95.8% and 62 FPS, respectively. This result fully confirms the effectiveness of the improvement strategies in this study and the performance improvement brought about by their combined use in target detection tasks.

4.4.5. Application Results and Analysis

To validate the detection performance of the model and compare it with the original YOLOv7, this study applied the detection model to actual photos and videos from the composite material AFP site. As shown in Figure 10, images a–c are photo data from the test set, while images d–f are screenshots from videos of the AFP process after detection. Different lighting conditions were selected to showcase various scenarios. To ensure that edge defects in the test set are clearly displayed with their categories and confidence levels for comparison, the surroundings of images a–c were filled with a white area. Confidence information not displayed in the video detection screenshots d–f is labeled in the caption.
In Figure 10, we can see that images a1 to f1 show the results of detection using the YOLOv7 algorithm, while images a2 to f2 depict the results of detection using the improved version of YOLOv7. Below is a detailed analysis of these results:
Firstly, in image a, the original YOLOv7 algorithm exhibits missed detections for overlapping defects and lower confidence compared to the improved YOLOv7. This suggests that the improved model demonstrates higher accuracy and reliability in detecting overlapping defects. In images c and d, where the lighting angles are relatively uniform, the focus should be on the accuracy and confidence of the bounding boxes. It is evident that the improvement from the WIoU loss function in the improved YOLOv7 significantly enhances the bounding box regression, with the defect bounding boxes in c2 and d2 notably superior to those in c1 and d1. Additionally, the confidence levels in the improved YOLOv7 are also superior to those in the original YOLOv7. Furthermore, in image d, we encounter the issue of overlapping defects of multiple classes, where foreign object defects overlap with gap defects. The improved YOLOv7 exhibits better performance in addressing this issue, detecting high-quality defects that are difficult to differentiate more accurately compared to the original YOLOv7, with higher confidence levels. Lastly, in images e and f, due to dim lighting, the original YOLOv7 shows missed detections in image e1, while the improved YOLOv7 exhibits higher confidence levels under the premise of detecting the targets. Additionally, when facing situations where multiple surface defects overlap, the original YOLOv7 algorithm experiences missed detections. The dynamic and non-monotonic focusing mechanism of the replaced WIoU loss function in the improved YOLOv7 reduces the gradient gains for simple examples, enabling the model to focus on challenging instances such as occlusions, enhancing the model’s generalization performance against difficult samples with target occlusions.
Through the comparative analysis above, it is evident that the improved YOLOv7 demonstrates better detection performance and capabilities in real-world applications at composite material AFP sites.

5. Conclusions

This study proposes an AFP surface defect detection algorithm based on Improved YOLOv7, leveraging computer vision and deep learning. To address the issues of missed detections and false alarms in detecting small-sized defects and overlapping multiple defects faced by the original YOLOv7 algorithm, targeted optimization work was conducted. Firstly, the attention mechanism BiFormer was introduced to enable the model to capture subtle information in feature maps, enhance the correlation of image context information, and improve the extraction capability of features for small targets. Secondly, the AFPN was employed as a replacement for the original YOLOv7’s PAFPN. The AFPN fuses neighboring low-level features and gradually incorporates high-level features, avoiding significant semantic differences between non-adjacent levels and maintaining a balance between detailed and semantic information in the AFP defect detection task. Additionally, WIoU was introduced as a replacement for CIoU. The WIoU loss function utilizes a dynamic and non-monotonic focusing mechanism, reducing the gradient gains for simple examples and enhancing the model’s generalization performance on difficult samples with target occlusions. Through a series of ablation experiments, the effectiveness and superiority of the improved YOLOv7 model in AFP defect detection were verified. The improved YOLOv7 model achieved the best performance in terms of mAP and FPS, with values of 95.8% and 62 FPS, respectively. This confirms the effectiveness of the improvement strategy, and its detection performance meets the requirements for online detection, making it suitable for industrial applications.

Author Contributions

Conceptualization, L.W. and S.L.; methodology, L.W.; software, S.L.; validation, L.W., S.L. and Z.D.; formal analysis, H.S.; investigation, E.X.; data curation, Z.D.; writing—original draft preparation, L.W. and S.L.; writing—review and editing, L.W., S.L. and Z.D.; visualization, Z.D.; supervision, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We gratefully acknowledge the support of Hiwing Aerospace Materials Research Institute (Suzhou) Co., Ltd., Suzhou, China.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Raju, A.; Shanmugaraja, M. Recent Researches in Fiber Reinforced Composite Materials: A Review. Mater. Today Proc. 2021, 46, 9291–9296. [Google Scholar] [CrossRef]
  2. Wang, X.; Zhang, Y.; Zhao, C. Research Status of Automatic Fiber Placement Equipment for Composite Materials. Aeronaut. Manuf. Technol. 2018, 61, 83–90. [Google Scholar]
  3. Adams, D.O.; Bell, S.J. Compression Strength Reductions in Composite Laminates Due to Multiple-Layer Waviness. Compos. Sci. Technol. 1995, 53, 207–212. [Google Scholar] [CrossRef]
  4. Croft, K.; Lessard, L.; Pasini, D.; Hojjati, M.; Chen, J.; Yousefpour, A. Experimental Study of the Effect of Automated Fiber Placement Induced Defects on Performance of Composite Laminates. Compos. Part A Appl. Sci. Manuf. 2011, 42, 484–491. [Google Scholar] [CrossRef]
  5. Cantwell, W.; Morton, J. The Significance of Damage and Defects and Their Detection in Composite Materials: A Review. J. Strain Anal. Eng. Des. 1992, 27, 29–42. [Google Scholar] [CrossRef]
  6. Blom, A.W.; Lopes, C.S.; Kromwijk, P.J.; Gurdal, Z.; Camanho, P.P. A Theoretical Model to Study the Influence of Tow-Drop Areas on the Stiffness and Strength of Variable-Stiffness Laminates. J. Compos. Mater. 2009, 43, 403–425. [Google Scholar] [CrossRef]
  7. Lan, M.; Cartié, D.; Davies, P.; Baley, C. Microstructure and Tensile Properties of Carbon–Epoxy Laminates Produced by Automated Fibre Placement: Influence of a Caul Plate on the Effects of Gap and Overlap Embedded Defects. Compos. Part A Appl. Sci. Manuf. 2015, 78, 124–134. [Google Scholar] [CrossRef]
  8. Shadmehri, F.; Ioachim, O.; Pahud, O.; Brunel, J.; Landry, A.; Hoa, V.; Hojjati, M. Laser-Vision Inspection System for Automated Fiber Placement (AFP) Process. In Proceedings of the 20th International Conference on Composite Materials, Copenhagen, Danemark, 19–24 July 2015. [Google Scholar]
  9. Kitson, L.E.; Rock, D.K.; Eder, J.E. Composite Material Laser Flaw Detection 1996. Available online: https://patents.google.com/patent/US5562788/en (accessed on 18 April 2023).
  10. Brüning, J.; Denkena, B.; Dittrich, M.-A.; Hocke, T. Machine Learning Approach for Optimization of Automated Fiber Placement Processes. Procedia CIRP 2017, 66, 74–78. [Google Scholar] [CrossRef]
  11. Schmidt, C.; Denkena, B.; Völtzer, K.; Hocke, T. Thermal Image-Based Monitoring for the Automated Fiber Placement Process. Procedia CIRP 2017, 62, 27–32. [Google Scholar] [CrossRef]
  12. Soucy, K.A. In-Process Monitoring for Quality Assurance of Automated Composite Fabrication. In Review of Progress in Quantitative Nondestructive Evaluation: Volume 15A; Thompson, D.O., Chimenti, D.E., Eds.; Springer: Boston, MA, USA, 1996; pp. 2225–2231. ISBN 978-1-4613-0383-1. [Google Scholar]
  13. Zhang, Y.; Wang, W.; Liu, Q.; Guo, Z.; Ji, Y. Research on Defect Detection in Automated Fiber Placement Processes Based on a Multi-Scale Detector. Electronics 2022, 11, 3757. [Google Scholar] [CrossRef]
  14. Meister, S.; Wermes, M. Performance Evaluation of CNN and R-CNN Based Line by Line Analysis Algorithms for Fibre Placement Defect Classification. Prod. Eng. Res. Devel. 2023, 17, 391–406. [Google Scholar] [CrossRef]
  15. Cui, P.; Zhang, J.; Han, B.; Wu, Y. Performance Evaluation and Model Quantization of Object Detection Algorithm for Infrared Image. In Proceedings of the Seventh Asia Pacific Conference on Optics Manufacture and 2021 International Forum of Young Scientists on Advanced Optical Manufacturing (APCOM and YSAOM 2021), Shanghai, China, 28–31 October 2022; Volume 12166, pp. 554–559. [Google Scholar]
  16. Giri, S.R.K.S.; Logesh, P.; Praba, R.D.; Kavitha, K.; Kalaiselvi, A. Traffic Surveillance System Using YOLO Algorithm and Machine Learning. In Proceedings of the 2023 2nd International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), Coimbatore, India, 16–17 June 2023; pp. 1–6. [Google Scholar]
  17. Rivero-Palacio, M.; Alfonso-Morales, W.; Caicedo-Bravo, E. Anemia Detection Using a Full Embedded Mobile Application with Yolo Algorithm. In Proceedings of the IEEE Colombian Conference on Applications of Computational Intelligence, Virtual, 27–28 May 2021; pp. 3–17. [Google Scholar]
  18. Liu, J.; Zhu, X.; Zhou, X.; Qian, S.; Yu, J. Defect Detection for Metal Base of TO-Can Packaged Laser Diode Based on Improved YOLO Algorithm. Electronics 2022, 11, 1561. [Google Scholar] [CrossRef]
  19. Sun, L.; Zhang, R.; Liu, Y.; Rao, T. Mask Wearing Detection Algorithm for Dense Crowds from a Monitoring Perspective. Comput. Eng. 2023, 9, 313–320. [Google Scholar] [CrossRef]
  20. Medak, D.; Posilović, L.; Subašić, M.; Budimir, M.; Lončarić, S. Automated Defect Detection From Ultrasonic Images Using Deep Learning. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2021, 68, 3126–3134. [Google Scholar] [CrossRef] [PubMed]
  21. Ullah, I.; Khan, R.U.; Yang, F.; Wuttisittikulkij, L. Deep Learning Image-Based Defect Detection in High Voltage Electrical Equipment. Energies 2020, 13, 392. [Google Scholar] [CrossRef]
  22. Ma, X.; Kittikunakorn, N.; Sorman, B.; Xi, H.; Chen, A.; Marsh, M.; Mongeau, A.; Piché, N.; Williams, R.O.; Skomski, D. Application of Deep Learning Convolutional Neural Networks for Internal Tablet Defect Detection: High Accuracy, Throughput, and Adaptability. J. Pharm. Sci. 2020, 109, 1547–1557. [Google Scholar] [CrossRef] [PubMed]
  23. Davari, N.; Akbarizadeh, G.; Mashhour, E. Corona Detection and Power Equipment Classification Based on GoogleNet-AlexNet: An Accurate and Intelligent Defect Detection Model Based on Deep Learning for Power Distribution Lines. IEEE Trans. Power Deliv. 2022, 37, 2766–2774. [Google Scholar] [CrossRef]
  24. Wu, Y.; Qin, Y.; Qian, Y.; Guo, F.; Wang, Z.; Jia, L. Hybrid Deep Learning Architecture for Rail Surface Segmentation and Surface Defect Detection. Comput. Aided Civ. Infrastruct. Eng. 2022, 37, 227–244. [Google Scholar] [CrossRef]
  25. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo Algorithm Developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
  26. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  27. Redmon, J.; Farhadi, A. Yolov3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  28. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  29. Wong, K.-Y. Yolov7 2024. Available online: https://github.com/WongKinYiu/yolov7 (accessed on 18 April 2023).
  30. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  31. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  32. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L. Pytorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://arxiv.org/abs/1912.01703 (accessed on 18 April 2023).
  33. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
  34. Reddy, M.D.M.; Basha, M.S.M.; Hari, M.M.C.; Penchalaiah, M.N. Dall-e: Creating Images from Text. UGC Care Group I J. 2021, 8, 71–75. [Google Scholar]
  35. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10323–10333. [Google Scholar]
  36. Ren, S.; Zhou, D.; He, S.; Feng, J.; Wang, X. Shunted Self-Attention via Multi-Scale Token Aggregation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10853–10862. [Google Scholar]
  37. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  38. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  39. Qiao, S.; Chen, L.-C.; Yuille, A. DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10213–10224. [Google Scholar]
  40. Kirillov, A.; Girshick, R.; He, K.; Dollár, P. Panoptic Feature Pyramid Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6399–6408. [Google Scholar]
  41. Yang, G.; Lei, J.; Zhu, Z.; Cheng, S.; Feng, Z.; Liang, R. AFPN: Asymptotic Feature Pyramid Network for Object Detection. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA, 1–4 October 2023. [Google Scholar]
  42. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  43. Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516. [Google Scholar]
  44. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586. [Google Scholar] [CrossRef] [PubMed]
  45. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
  46. Kulkarni, A.; Chong, D.; Batarseh, F.A. 5—Foundations of Data Imbalance and Solutions for a Data Democracy. In Data Democracy; Batarseh, F.A., Yang, R., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 83–106. ISBN 978-0-12-818366-3. [Google Scholar]
  47. Park, I.; Kim, S. Performance Indicator Survey for Object Detection. In Proceedings of the 2020 20th International Conference on Control, Automation and Systems (ICCAS), Busan, Republic of Korea, 13–16 October 2020; pp. 284–288. [Google Scholar]
  48. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  49. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. arXiv 2020, arXiv:1911.09070. [Google Scholar]
  50. Jocher, G. YOLOv5 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 18 January 2022).
Figure 1. Automatic fiber placement machine (column type).
Figure 2. The structure of the AFPN [41] (the black arrows indicate convolution operations, while the aqua arrows indicate adaptive spatial fusion).
Figure 3. Adaptive spatial feature fusion [41].
Figure 4. The 8-tow automated fiber placement machine developed by Nanjing University of Aeronautics and Astronautics (NUAA).
Figure 5. Detection process of the defect detection system.
Figure 6. Surface defect data collection during the AFP process.
Figure 7. Comparison of localization loss.
Figure 8. Comparison of objectness loss and classification loss for the five models mentioned above: (a) obj_loss; (b) cls_loss.
Figure 9. Comparison chart of mAP for the five algorithms.
Figure 10. Comparison of detection results between the original YOLOv7 (a1–f1) and the improved YOLOv7 (a2–f2) (the overlap confidence is 0.91 in e2, 0.89 in f1, and 0.94 in f2).
Table 1. Commonly used object detection evaluation metrics.

| Accuracy Evaluation Metrics    | Speed Evaluation Metrics                     |
|--------------------------------|----------------------------------------------|
| Precision                      | Forward propagation time                     |
| Recall                         | Frames per second (FPS)                      |
| Average Precision (AP)         | Floating point operations per second (FLOPs) |
| mean Average Precision (mAP)   | Single-image detection time                  |
| Intersection over Union (IoU)  | /                                            |
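For reference, the accuracy metrics in Table 1 follow their standard definitions: precision is TP/(TP + FP), recall is TP/(TP + FN), IoU is the intersection area divided by the union area of a predicted and a ground-truth box, AP is the area under the precision–recall curve, and mAP is the mean of AP over all defect classes. The snippet below is a minimal illustrative sketch of the precision, recall, and IoU computations only; it is not the evaluation code used in this study, and the (x1, y1, x2, y2) box format is an assumption made for the example.

```python
# Minimal, illustrative metric computations (not the evaluation code used in this study).

def precision_recall(tp: int, fp: int, fn: int):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def box_iou(box_a, box_b) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) -- format assumed."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Example: 90 true positives, 10 false positives, 5 false negatives,
# and a predicted box shifted slightly from its ground truth.
print(precision_recall(tp=90, fp=10, fn=5))          # (0.9, ~0.947)
print(box_iou((0, 0, 100, 50), (10, 5, 110, 55)))    # ~0.68
```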
Table 2. Experimental hardware and software configuration.

| Name                     | Configuration/Version                 |
|--------------------------|---------------------------------------|
| Operating system         | Windows 11 x64                        |
| CPU                      | Intel(R) Core(TM) i9-12900            |
| GPU                      | NVIDIA RTX A2000                      |
| Memory                   | 64 GB                                 |
| Graphics memory          | 6 GB                                  |
| IDE                      | PyCharm 2023.1.3 (Community Edition)  |
| Deep learning framework  | PyTorch 1.11.0                        |
| CUDA                     | CUDA 11.3                             |
| cuDNN                    | cuDNN 8.3.3                           |
| Python version           | Python 3.9                            |
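Before training, it can help to confirm that the installed software stack matches Table 2. The following is a small, optional sanity check rather than part of the authors' pipeline; the expected values in the comments correspond to the versions listed in the table, and the cuDNN integer encoding noted there is an assumption based on PyTorch's usual major*1000 + minor*100 + patch scheme.

```python
import torch

# Confirm that the framework and CUDA/cuDNN versions match Table 2.
print("PyTorch:", torch.__version__)                 # expected: 1.11.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA (build):", torch.version.cuda)           # expected: 11.3
print("cuDNN:", torch.backends.cudnn.version())      # expected: ~8303 for cuDNN 8.3.3 (assumed encoding)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))     # expected: NVIDIA RTX A2000
```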
Table 3. Model training parameter settings.

| Parameter                 | Setting    |
|---------------------------|------------|
| Initial learning rate     | 0.01       |
| Epochs                    | 300        |
| Batch size                | 6          |
| Momentum                  | 0.937      |
| Weight decay coefficient  | 0.0005     |
| Input image size          | 640 × 640  |
| Number of classes (nc)    | 5          |
| Optimizer                 | SGD        |
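To make the optimizer-related entries in Table 3 concrete, the sketch below shows how they would map onto PyTorch's SGD optimizer. It is only an illustration under stated assumptions: `model` is a stand-in module rather than the YOLOv7 detector, and the learning-rate schedule, warm-up, and data pipeline of the actual training run are omitted.

```python
import torch
from torch import nn

# Placeholder network standing in for the YOLOv7-based detector.
model = nn.Conv2d(3, 16, kernel_size=3)

# Optimizer settings taken from Table 3 (initial LR, momentum, weight decay).
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,             # initial learning rate
    momentum=0.937,      # momentum
    weight_decay=0.0005  # weight decay coefficient
)

EPOCHS = 300       # training epochs (Table 3)
BATCH_SIZE = 6     # batch size (Table 3)
IMG_SIZE = 640     # input image size, 640 x 640 (Table 3)
NUM_CLASSES = 5    # nc: gap, overlap, wrinkle, twist, foreign (Table 4 defect classes)
```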
Table 4. Comparison of AP (%) for various surface defects in the test set.

| Models                    | Gap  | Overlap | Wrinkle | Twist | Foreign |
|---------------------------|------|---------|---------|-------|---------|
| Faster R-CNN              | 68.3 | 63.4    | 71.3    | 77.4  | 81.7    |
| EfficientDet              | 71.5 | 70.3    | 75.2    | 79.5  | 83.4    |
| YOLOv5-tiny               | 75.3 | 76.5    | 77.4    | 80.1  | 85.7    |
| YOLOv5                    | 77.4 | 79.7    | 79.5    | 82.4  | 87.6    |
| YOLOv7-tiny               | 75.3 | 77.1    | 81.2    | 83.5  | 86.9    |
| YOLOv7                    | 79.6 | 83.6    | 84.4    | 88.6  | 91.8    |
| YOLOv7 + BiFormer (ours)  | 91.5 | 92.3    | 92.5    | 94.4  | 96.6    |
| YOLOv7 + AFPN (ours)      | 88.2 | 90.6    | 91.2    | 92.9  | 92.7    |
| YOLOv7 + WIoU (ours)      | 89.3 | 90.4    | 92.9    | 93.7  | 92.1    |
| Improved YOLOv7 (ours)    | 93.3 | 94.5    | 95.7    | 97.3  | 98.5    |
Table 5. Performance of each model on the test set.

| Models                    | mAP   | FPS |
|---------------------------|-------|-----|
| Faster R-CNN              | 72.7% | 18  |
| EfficientDet              | 75.6% | 28  |
| YOLOv5-tiny               | 78.6% | 45  |
| YOLOv5                    | 80.8% | 35  |
| YOLOv7-tiny               | 80.4% | 53  |
| YOLOv7                    | 85.3% | 48  |
| YOLOv7 + BiFormer (ours)  | 93.8% | 55  |
| YOLOv7 + AFPN (ours)      | 90.5% | 52  |
| YOLOv7 + WIoU (ours)      | 92.3% | 55  |
| Improved YOLOv7 (ours)    | 95.8% | 62  |