2.2.1. YOLOv8 Model
In the field of object detection, the YOLO family of algorithms has become a key methodology owing to its real-time performance, single-stage detection pipeline, multi-scale feature fusion, anchor design (anchor-based in earlier versions, anchor-free in YOLOv8), and multi-task learning framework, which together enable efficient and accurate target detection. The following paragraphs describe the improvements YOLOv8 introduces in three core components: the backbone and neck networks, the detection head, and the loss function. Its network architecture is illustrated in Figure 3.
In the backbone, YOLOv8 adopts the C2f (cross-stage partial bottleneck with two convolutions) module. C2f splits the output of its first convolution into two branches, passes one branch through a chain of bottleneck blocks, and concatenates the outputs of all blocks, so that deeper features are fused with shallower contextual information, which improves detection accuracy. In addition, the channel numbers are scaled for models of different sizes (n, s, m, l, x), letting each model trade speed against accuracy and thereby enhancing overall performance. The neck network plays a crucial role in the architecture by bridging the feature representations output by the backbone and the predictions of the detection head.
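The split, chain and concatenate data flow of C2f can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the Ultralytics implementation: random 1×1 projections followed by SiLU stand in for the Conv-BN-SiLU layers, the hidden width is assumed to be half the output width, and each bottleneck is simplified to two 1×1 projections with a residual connection (the real bottleneck uses 3×3 convolutions). The point is the resulting (2 + n)·c concatenated channels.

```python
import numpy as np

def conv1x1(x, c_out, rng):
    """Stand-in for Conv-BN-SiLU: random 1x1 projection + SiLU. x is (C, H, W)."""
    w = rng.standard_normal((c_out, x.shape[0])) / np.sqrt(x.shape[0])
    y = np.einsum("oc,chw->ohw", w, x)
    return y / (1.0 + np.exp(-y))  # SiLU activation

def bottleneck(x, rng):
    """Simplified residual bottleneck (YOLOv8 uses two 3x3 convs here)."""
    c = x.shape[0]
    return x + conv1x1(conv1x1(x, c, rng), c, rng)

def c2f(x, c_out, n, rng):
    """Returns the C2f output and the channel count of the concatenation."""
    c = c_out // 2                        # hidden width (assumed ratio 0.5)
    y = conv1x1(x, 2 * c, rng)            # first convolution
    parts = [y[:c], y[c:]]                # split into two halves
    for _ in range(n):                    # chain n bottlenecks on the latest part
        parts.append(bottleneck(parts[-1], rng))
    cat = np.concatenate(parts, axis=0)   # (2 + n) * c channels
    return conv1x1(cat, c_out, rng), cat.shape[0]  # second convolution

rng = np.random.default_rng(0)
out, cat_channels = c2f(rng.standard_normal((64, 8, 8)), 128, 3, rng)
print(out.shape, cat_channels)  # (128, 8, 8) 320
```

Keeping every intermediate bottleneck output in the concatenation, rather than only the last one, is what gives the following convolution access to features at several depths.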
The detection head in YOLOv8 adopts a decoupled architecture in which object localization (box regression) and classification are learned by separate branches. Because each branch is optimized for its own prediction task, this decoupling lets the model handle the two tasks more effectively than a shared (coupled) head, improving convergence and cross-task adaptability.
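A shape-level sketch of the decoupled head for one feature level follows. Assumptions: random 1×1 projections replace the real convolution stacks, the hidden width of 64 is arbitrary, `reg_max = 16` bins per box side follows the YOLOv8 default (each side is predicted as a distribution, which DFL supervises), and the five classes are hypothetical, one per defect type named in this section.

```python
import numpy as np

def project(x, c_out, rng):
    """Random 1x1 projection standing in for a conv stack; x is (C, H, W)."""
    w = rng.standard_normal((c_out, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum("oc,chw->ohw", w, x)

def decoupled_head(feat, num_classes, reg_max=16, hidden=64, seed=0):
    rng = np.random.default_rng(seed)
    # Box branch: 4 sides x reg_max distribution bins per location.
    box_logits = project(project(feat, hidden, rng), 4 * reg_max, rng)
    # Classification branch: separate weights, one logit per class per location.
    cls_logits = project(project(feat, hidden, rng), num_classes, rng)
    return box_logits, cls_logits

feat = np.random.default_rng(1).standard_normal((128, 20, 20))
box, cls = decoupled_head(feat, num_classes=5)
print(box.shape, cls.shape)  # (64, 20, 20) (5, 20, 20)
```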
The YOLOv8 loss is divided into two branches: classification loss and regression loss. The classification loss uses binary cross-entropy (BCE), while the regression loss combines distribution focal loss (DFL) with CIoU loss. This combination makes predicted boxes fit the ground-truth boxes more closely, further improving detection accuracy.
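The three loss terms can be written per sample in plain Python as below. This is a minimal sketch: boxes are (x1, y1, x2, y2); CIoU is IoU minus a normalized center-distance term and an aspect-ratio term; DFL treats a continuous target distance as a mix of its two neighboring integer bins. The target clamp mirrors common implementations. Loss weights, target assignment and batching used by YOLOv8 are omitted.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for probability p and (possibly soft) label y in [0, 1]."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def ciou_loss(pred, gt, eps=1e-9):
    """1 - CIoU for two (x1, y1, x2, y2) boxes."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1
    inter = (max(0.0, min(px2, gx2) - max(px1, gx1))
             * max(0.0, min(py2, gy2) - max(py1, gy1)))
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (v - iou + 1 + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)

def dfl(logits, target):
    """Distribution focal loss for one box side; logits score bins 0..n-1."""
    target = min(max(target, 0.0), len(logits) - 1 - 0.01)  # keep right bin valid
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    left = int(math.floor(target))
    right = left + 1
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(exps[left] / s) + w_right * math.log(exps[right] / s))

print(round(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)), 6))  # 0.0
```

For disjoint boxes IoU is zero, yet the center-distance term still produces a gradient, which is one reason CIoU is preferred over a plain IoU loss for box regression.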
The suitability of YOLOv8 for carbon fiber prepreg surface defect datasets lies in the close match between its network architecture and the characteristics of the defects. The backbone learns the vertical periodic texture through lightweight convolutional layers, while large-receptive-field pooling (SPPF) suppresses background interference; the PAFPN [25] (Path Aggregation Feature Pyramid Network) fuses features across scales, capturing both the fine edge features of fiber bundles and the global deformation of wrinkles; the anchor-free head with dynamic label assignment adapts the detection strategy to the edge distribution of resin-rich and resin-poor defects; and the decoupled head outputs defect locations and categories simultaneously, meeting industrial quality inspection requirements. In addition, the texture-aware attention layer suppresses background noise, data augmentation strategies improve training on edge defect samples, and the multimodal fusion architecture enhances recognition of defect regions. Together, these designs enable YOLOv8 to locate various defects accurately in complex backgrounds, meeting the practical requirements of carbon fiber prepreg quality control.
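The two-way fusion in the PAFPN neck can be illustrated at the level of tensor shapes. Assumptions: three input levels P3, P4, P5 at strides 8, 16 and 32; nearest-neighbor 2× upsampling as in YOLOv8; 2×2 average pooling standing in for the stride-2 convolution; and the C2f blocks that follow each concatenation (and reduce channels) omitted, so channel counts only grow here.

```python
import numpy as np

def up2(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def down2(x):
    """2x downsampling by 2x2 average pooling (stand-in for a stride-2 conv)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def pan_fpn(p3, p4, p5):
    # Top-down path: carry semantic information from coarse to fine levels.
    t4 = np.concatenate([up2(p5), p4], axis=0)
    t3 = np.concatenate([up2(t4), p3], axis=0)
    # Bottom-up path: carry fine localization detail back to coarse levels.
    b4 = np.concatenate([down2(t3), t4], axis=0)
    b5 = np.concatenate([down2(b4), p5], axis=0)
    return t3, b4, b5  # outputs at strides 8, 16, 32

rng = np.random.default_rng(0)
outs = pan_fpn(rng.standard_normal((8, 32, 32)),
               rng.standard_normal((16, 16, 16)),
               rng.standard_normal((32, 8, 8)))
print([o.shape for o in outs])  # [(56, 32, 32), (104, 16, 16), (136, 8, 8)]
```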
2.2.2. GAM
Traditional attention mechanisms suffer from information reduction and dimension separation: they rely on visual representations with a limited receptive field and, in the process, lose global spatial-channel interactions. The GAM is a global attention mechanism spanning the spatial and channel dimensions; it improves deep neural network performance by reducing information dispersion and amplifying global interactive representations. Its structure is shown in Figure 4, where the two parts represent the channel attention submodule and the spatial attention submodule, respectively.
The Channel Attention Submodule (shown in Figure 5) uses a 3D permutation to retain information across all three dimensions (channel, height, and width). It then amplifies cross-dimensional channel-spatial dependencies with a two-layer MLP (Multi-Layer Perceptron).
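The channel attention submodule can be sketched in NumPy for a single (C, H, W) feature map: permute to (H, W, C), apply the two-layer MLP along the channel axis at every spatial position, permute back, and gate the input with a sigmoid. The weights are supplied by the caller; the ReLU between the two layers follows the common GAM implementation, and r denotes the reduction ratio. This is a sketch, not the authors' code.

```python
import numpy as np

def gam_channel_attention(x, w1, b1, w2, b2):
    """x: (C, H, W); w1: (C, C//r), b1: (C//r,); w2: (C//r, C), b2: (C,)."""
    z = np.transpose(x, (1, 2, 0))          # 3D permutation to (H, W, C)
    h = np.maximum(z @ w1 + b1, 0.0)        # MLP layer 1 + ReLU: C -> C/r
    a = h @ w2 + b2                         # MLP layer 2: C/r -> C
    a = np.transpose(a, (2, 0, 1))          # reverse permutation to (C, H, W)
    return x * (1.0 / (1.0 + np.exp(-a)))   # element-wise sigmoid gate

C, r = 16, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 6, 6))
out = gam_channel_attention(x,
                            rng.standard_normal((C, C // r)) * 0.1, np.zeros(C // r),
                            rng.standard_normal((C // r, C)) * 0.1, np.zeros(C))
print(out.shape)  # (16, 6, 6)
```

Unlike squeeze-and-excitation style channel attention, no spatial pooling happens before the MLP, so each position receives its own channel weights.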
In the Spatial Attention Submodule (illustrated in Figure 6), spatial information is fused by two convolutional layers to emphasize spatial correlations. Notably, the GAM adopts the same reduction ratio r as the Bottleneck Attention Module (BAM) [26] but eliminates the pooling operations: because max-pooling in BAM reduces information and thus harms performance, the GAM removes pooling to better preserve the feature maps.
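A matching NumPy sketch of the spatial attention submodule follows. Assumptions: 7×7 kernels (a common choice in GAM implementations), BatchNorm omitted, and a straightforward "same" convolution written with shifted views. Because there is no pooling, the attention map keeps the full H×W resolution of the input.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution (cross-correlation) with zero padding.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):                      # accumulate one kernel tap at a time
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def gam_spatial_attention(x, w1, w2):
    """w1: (C//r, C, 7, 7); w2: (C, C//r, 7, 7)."""
    h = np.maximum(conv2d_same(x, w1), 0.0)  # conv C -> C/r, ReLU
    a = conv2d_same(h, w2)                   # conv C/r -> C
    return x * (1.0 / (1.0 + np.exp(-a)))    # sigmoid gate at full resolution

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 10))
out = gam_spatial_attention(x, rng.standard_normal((2, 8, 7, 7)) * 0.05,
                            rng.standard_normal((8, 2, 7, 7)) * 0.05)
print(out.shape)  # (8, 10, 10)
```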
This design preserves attention effectiveness; however, the spatial-channel attention framework, in particular its pooling-free convolutions, may impose significant parameter overhead in certain implementations.
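The size of this overhead can be estimated by counting weights and biases. The sketch assumes r = 4 and 7×7 spatial kernels (common GAM settings, not values stated in this section) and ignores BatchNorm parameters; C = 256 is purely illustrative.

```python
def gam_param_count(c, r=4, k=7):
    """(channel MLP params, spatial conv params) for one GAM block, BatchNorm ignored."""
    cr = c // r
    channel_mlp = (c * cr + cr) + (cr * c + c)                    # Linear C->C/r, Linear C/r->C
    spatial_convs = (k * k * c * cr + cr) + (k * k * cr * c + c)  # k x k conv C->C/r, C/r->C
    return channel_mlp, spatial_convs

mlp, conv = gam_param_count(256)
print(mlp, conv, round(conv / mlp, 1))  # 33088 1605952 48.5
```

Under these assumptions the two pooling-free spatial convolutions account for roughly 98% of the block's parameters, which is where the overhead noted above comes from.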
Although YOLOv8 outperforms previous versions overall, it has difficulty with targets such as fiber splits: lacking sufficient global contextual information, it struggles to capture their critical features accurately. To address this issue, the GAM is introduced.
By weighting global features after feature extraction and before object detection, the GAM enhances the representation of target regions and helps the model ignore background interference. Concentrating on critical target regions improves detection accuracy for elongated objects in complex scenes, particularly fiber split defects with large aspect ratios, thereby reducing false positives and missed detections while maintaining precision.
In carbon fiber prepreg surface defect detection, the GAM leverages its cross-dimensional global interaction capability, which aligns closely with the characteristics of the defects. For the irregular agglomerations of fiber tangles and the longitudinal patterns of fiber splits, the spatial attention submodule focuses on localized detail, capturing the tangled fiber patterns at the edges of tangles and the linear continuity of splits. The channel attention submodule amplifies the color-contrast features of resin-rich and resin-poor defects, distinguishing abnormal yellow or black regions from the background texture. For the global deformation of wrinkling defects, the global interaction mechanism of the GAM integrates structural changes across regions, overcoming the limitations of local features to characterize wrinkle morphology as a whole.
In complex backgrounds, the GAM suppresses redundant responses from periodic textures, significantly enhancing detection sensitivity for fiber splits and similar targets. Concurrently, it strengthens correlations among multiscale defect features, providing enhanced defect recognition capabilities for industrial quality inspection.
By inserting the GAM between the backbone network (which extracts initial features from input images) and the detection head (responsible for final detection results), the mechanism injects global contextual information post-feature extraction and pre-detection. This allows the model to better capture critical features across the entire image, improving its robustness in handling complex scenes and long-range contextual dependencies.
2.2.3. Deformable Large Kernel Attention
Deformable Large Kernel Attention (DLKA) is an attention mechanism designed for visual tasks to improve the flexibility and performance of a model when handling objects of varying sizes and shapes. Building on large kernel attention (LKA), it introduces deformable kernels to strengthen adaptability to local features.
Figure 7 illustrates the schematic diagram of the DLKA module.
Unlike static convolutional filters, DLKA employs deformable kernels whose sampling positions are adjusted dynamically during attention computation. This geometric adaptability enables precise feature extraction for objects of varying shapes and scales and increases sensitivity to elongated or irregular defects. By replacing conventional small kernels with large kernels that expand the receptive field, DLKA efficiently captures long-range dependencies and global contextual patterns, addressing the localized feature extraction of traditional convolutions and making it robust to complex defect distributions.
The mechanism integrates multi-scale feature maps through attention maps derived from the deformable kernel outputs, amplifying defect-relevant regions while suppressing background noise and thus combining local detail with global semantics. Because the kernel deformation is learned, it adapts to defect geometries without manual parameter tuning, improving generalization across diverse defect types. The larger field of view of the large kernels preserves spatial resolution while modeling defect-environment interactions, which is critical for identifying subtle anomalies in complex visual patterns, and multi-scale fusion preserves discriminative features across resolutions, yielding consistent performance across defect sizes.
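The core of a deformable kernel is sampling the input at each kernel tap plus a learned fractional offset, using bilinear interpolation as in deformable convolution. The single-channel (depthwise) sketch below takes the offsets as an input array; in DLKA they would be predicted from the features by an additional convolution, which is omitted here. With all offsets at zero it reduces to an ordinary zero-padded convolution.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample a 2D array at fractional (y, x); zero outside the image."""
    H, W = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < H and 0 <= xx < W:
                val += img[yy, xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return val

def deformable_dwconv(img, kernel, offsets, dilation=1):
    """img: (H, W); kernel: (k, k); offsets: (H, W, k, k, 2) holding (dy, dx)
    for every output pixel and kernel tap."""
    k = kernel.shape[0]
    r = (k // 2) * dilation
    H, W = img.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    dy, dx = offsets[i, j, a, b]
                    # Regular grid position shifted by the learned offset.
                    y = i - r + a * dilation + dy
                    x = j - r + b * dilation + dx
                    out[i, j] += kernel[a, b] * bilinear(img, y, x)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
center = np.zeros((3, 3))
center[1, 1] = 1.0
half_right = np.zeros((5, 5, 3, 3, 2))
half_right[..., 1] = 0.5                 # every tap looks half a pixel to the right
print(deformable_dwconv(img, center, half_right)[0, 0])  # 0.5
```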
DLKA suits carbon fiber prepreg surface defect detection because its deformable large kernel attention aligns closely with the defect characteristics. Deformable kernels adapt their shapes to capture the irregular agglomerations of fiber tangles and the elongated, directional patterns of fiber splits; the large kernels expand the receptive field, capturing the global deformation of wrinkling defects; weighted feature fusion highlights color-contrast regions (e.g., resin-rich and resin-poor areas) while suppressing interference from the periodic background texture; the multiscale fusion architecture balances fine detail in small defects with a global understanding of larger defects; and dynamic attention further suppresses background texture responses, improving recognition of subtle defects. These capabilities allow DLKA to handle the diverse morphologies and scales of surface defects in carbon fiber prepregs under complex backgrounds with greater flexibility and robustness than traditional models.
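The receptive-field benefit of the large kernel can be checked with a little arithmetic. The sketch assumes the decomposition used by LKA in the Visual Attention Network: a 5×5 depthwise convolution, then a 7×7 depthwise convolution with dilation 3, then a 1×1 convolution whose output gates the input element-wise (in DLKA the two depthwise convolutions are made deformable). Whether the paper's DLKA uses exactly these sizes is not stated here.

```python
def effective_kernel(k, dilation=1):
    """Span covered by a k x k kernel with the given dilation."""
    return (k - 1) * dilation + 1

def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolutions applied in sequence."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

layers = [(5, 1), (7, 3)]                   # assumed LKA depthwise stages
rf = stacked_receptive_field(layers)
decomposed = sum(k * k for k, _ in layers)  # depthwise weights per channel
dense = rf * rf                             # one dense kernel with the same span
print(rf, decomposed, dense)  # 23 74 529
```

The decomposition therefore covers a 23×23 window with about one seventh of the weights a dense kernel of that span would need, which is what makes large kernels affordable inside an attention branch.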
To reduce missed detections (false negatives) and thereby improve model accuracy, the GAM and DLKA rely on the following mechanisms, respectively:
Global Information Capture: The GAM captures global information by applying its channel MLP and pooling-free spatial convolutions across the entire feature map, enabling the model to understand the image content comprehensively. This helps avoid missing targets when local evidence alone is insufficient, reducing false negatives.
Dual Attention Mechanism: The GAM combines channel and spatial attention and, through a 3D permutation followed by an MLP, emphasizes the interactions among the channel, height, and width dimensions. This helps the model locate and identify targets of different scales and positions more accurately, lowering the probability of false negatives. For example, when several objects of different sizes appear in a complex background, the dual attention mechanism helps the model focus on the features of each object.
Information Retention and Enhancement: Because its pooling operations are removed, the GAM better retains the information in the feature maps and enriches the feature representation. Its attention weighting also ensures that important information survives feature propagation, so subsequent layers make judgments on more complete information and false negatives caused by information loss are reduced.
Deformable Large Kernel: The deformable large kernel of DLKA can adaptively adjust the size and shape of the receptive field, which can better adapt to targets of different shapes and sizes. For some irregular or non-standard shaped targets, the deformable large kernel can more flexibly cover the target area and extract more comprehensive features, thereby reducing false negatives.
Guidance of the Attention Mechanism: As an attention mechanism, DLKA guides the model to focus on the more important regions in the image and suppresses the interference of irrelevant information. When detecting targets, it can help the model concentrate its attention on the regions where targets are likely to exist, improving the detection accuracy of targets and reducing false negatives. For instance, in an image containing multiple objects, DLKA enables the model to focus on the regions of the target objects instead of being distracted by the background or other irrelevant objects.
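Because these mechanisms target false negatives (missed defects) as distinct from false positives (spurious detections), the generic sketch below shows how both are usually counted for a single class: predictions are matched greedily to ground-truth boxes in descending score order at an IoU threshold (0.5 here, an assumed value); unmatched predictions are false positives and unmatched ground truths are false negatives. It is an illustration of the definitions, not the evaluation code used in this work.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_outcomes(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes. Returns (TP, FP, FN)."""
    matched = set()
    tp = fp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_iou = None, iou_thr
        for g, gt in enumerate(gts):
            if g not in matched:
                v = box_iou(box, gt)
                if v >= best_iou:
                    best, best_iou = g, v
        if best is None:
            fp += 1                   # spurious detection
        else:
            matched.add(best)
            tp += 1
    fn = len(gts) - len(matched)      # defects no prediction found
    return tp, fp, fn

gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60))]
tp, fp, fn = count_outcomes(preds, gts)
print(tp, fp, fn, tp / (tp + fn))  # 1 1 1 0.5
```

Recall, tp / (tp + fn), rises only when false negatives fall, which is the quantity the GAM and DLKA mechanisms above are meant to improve.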
The GAM and DLKA mechanisms were each integrated separately into YOLOv8s for surface defect detection on carbon fiber unidirectional tape prepregs, and each contributes distinct innovations:
Innovation of the GAM
Differentiation between Background Texture and Defects: The surface of carbon fiber prepregs has a vertical periodic texture background, and the various defects differ markedly in their characteristics. Placed between the backbone network and the detection head, the GAM organizes and integrates global texture information. When dealing with fiber tangle defects, it uses its global information capture ability to grasp the image as a whole and quickly locate fiber tangles against the complex periodic background. Because fiber tangles differ from the background texture in morphology and structure, the GAM highlights their distinctive agglomerated form, avoids background interference, and improves detection accuracy.
Optimization of Edge Defect Detection: Considering that the resin-rich defects in the dataset often appear in the edge areas of the samples, the GAM can focus on the global feature changes in the edge areas when processing images. Even if the difference between the resin-rich area and the background is small, the GAM can effectively detect resin-rich defects by integrating the global edge information.
Complementary Collaboration with the Original Model: The addition of the GAM forms a complementary relationship with the original feature extraction and detection mechanism of YOLOv8s. The original model focuses on local feature extraction and conventional target detection, while the GAM focuses on global information integration. When detecting surface defects of carbon fiber prepregs, the two work together. For example, when detecting wrinkle defects, the original model extracts local texture change features, and the GAM judges the overall structural changes of the fabric from a global perspective, jointly improving the detection performance of wrinkle defects.
Innovation of the DLKA Mechanism
Breakthrough in Concealed Defect Detection: Fiber split defects appear as thin slits oriented in the same direction as the background texture. DLKA is good at capturing subtle local features and directional information: its deformable large kernel adapts the size and shape of the receptive field to the characteristics of fiber splits, flexibly covering the split region. It can therefore extract the subtle features of fiber splits accurately, alleviating the problem of missed detections.
Capture of Variable Defect Features: Given that wrinkle defects vary in size and shape, the deformable large kernel of the DLKA can dynamically adjust the receptive field according to the actual shape and size of the wrinkles, comprehensively extracting the features of the wrinkles. Whether it is a small wrinkle or a complex large wrinkle, the DLKA can accurately capture its local features, such as the texture change at the fold and the undulating edge, improving the detection ability of wrinkle defects.
Enhanced Detection of Local Details: Adding DLKA gives the model a strong ability to detect local detail. During detection, DLKA focuses on local regions of the image and analyzes the subtle features of defects in depth. For example, when detecting fiber tangles, it can extract the details of fiber entanglement, which helps judge the severity of a tangle more accurately and enriches the dimensions along which the model characterizes defects.
Handling of Complex Defect Scenarios: For various complex defect scenarios that may occur on the surface of carbon fiber prepregs, such as the co-existence of multiple defects or the interweaving of defects and background interference, the DLKA can accurately identify defect features in a complex local environment with its flexible receptive field adjustment ability. For example, in the complex area where fiber splits may exist at the edge of a resin-rich defect, DLKA can adjust the receptive field according to different defect features respectively, achieving effective detection of multiple defects in complex scenarios and expanding the application scope of the model.