1. Introduction
Composites reinforced with continuous fibrous fabrics exhibit a complex multi-scale structure [
1], which plays a pivotal role in determining the mechanical properties of the final composite components. However, the relationship between the macroscale behavior of composites and their internal architecture remains poorly understood, mainly due to the intricate meso- and micro-structures of the reinforcing fabrics. A prevalent method to investigate this link involves destructive dissection of the sample, followed by two-dimensional imaging techniques to analyze its internal structure [
2]. This approach necessitates the slicing of the sample at predetermined intervals, yielding only partial information regarding the material’s internal features and resulting in a fragmented comprehension of the overall structure. To address this deficiency, three-dimensional imaging techniques have been employed recently, with Micro-Computed Tomography (Micro-CT) emerging as a notable example [
3,
4,
5]. Micro-CT is a non-destructive imaging technique that captures two-dimensional X-ray projections of a specimen from multiple angles. These projections are subsequently reconstructed into a three-dimensional volumetric representation using advanced algorithms, such as Filtered Back Projection (FBP), Algebraic Reconstruction Techniques (ART), and modern deep learning-based approaches.
In fiber-reinforced composites, accurate segmentation of Micro-CT images is a critical step in transforming these volumetric data into digital twin models, which are indispensable for numerical simulations. The precision of segmentation directly influences the reliability of the predictions regarding the mechanical behavior and performance of composite materials. Thus, achieving high segmentation accuracy is essential for establishing a robust connection between imaging data and material property analysis.
Over the years, various segmentation algorithms have been applied to analyze 3D image sequences and distinguish fiber tows. These include edge detection segmentation [
6], threshold segmentation [
7], region growing segmentation [
8], watershed segmentation [
9], and active contour model segmentation [
10]. However, these traditional methods often require substantial manual intervention and are prone to segmentation errors, limiting their practical applicability in complex applications. This underscores the pressing need for more advanced and automated segmentation techniques to address these challenges and enable more precise analysis.
In recent years, deep learning techniques have been widely applied to the segmentation and reconstruction of 3D images [
11]. Convolutional neural networks (CNNs) [
12], in particular, have gained significant attention due to their ability to process structured grid-like data and autonomously extract hierarchical features. This capability makes them exceptionally well-suited for complex image analysis tasks, markedly improving the efficiency and accuracy of feature identification, including applications such as the segmentation of pathological tissues [
13]. Since 2015, these advanced image segmentation methods have been progressively adopted in the field of composites.
Figure 1 provides a chronological overview of deep learning models utilized in the analysis of composite materials, organized by their development timeline rather than the specific timing of their application.
Long et al. [
14] introduced the Fully Convolutional Network (FCN) in 2015, establishing a foundational framework for image segmentation methods. Subsequently, Jia et al. [
15] utilized threshold segmentation algorithms from OpenCV for pixel classification in Micro-CT images with regular microstructures. These regular microstructures refer to the symmetric arrangement of warp and weft yarns in 2.5D woven fabrics, where fiber tows are periodically interlaced with uniform spacing to form a repetitive, organized pattern. The classification outcomes were used as ground-truth labels to train a proposed multi-decoder FCN for segmenting XCT images of complex internal microstructures in 2.5D woven fabrics. Their results demonstrated strong alignment between predictions and experimental findings.
Noise introduces ambiguity at object boundaries and in uniform regions, making it harder to separate features from the background. This significantly reduces segmentation accuracy. To solve the problem of noise interference, Ronneberger et al. [
16] improved the FCN architecture in 2015 and proposed the U-Net convolutional network. U-Net reduces the impact of noise using its symmetric encoder-decoder structure. This design allows the model to capture both low-level and high-level features. In addition, the skip connections in U-Net directly transfer fine spatial details from the encoder to the decoder. This helps the model recover from noise and achieve more accurate segmentation results. Sinchuk et al. [
17] applied both traditional image segmentation algorithms and the U-Net deep learning algorithm to extract and reconstruct the warp and weft yarns in woven composites. The U-Net approach achieved the highest segmentation accuracy. Building on this, Sinchuk et al. [
18] proposed two instance segmentation methodologies for carbon fiber composites: an interpolation-based approach leveraging interlaminar matrix layers and a U-Net 3D method [
19] to generate the optimal inputs for watershed transformation. Compared to geometrical methods, deep learning approaches proved more robust with non-ideal inputs, such as fiber bundles with large contact areas or small orientation variations. The maximum segmentation error using deep learning was 0.81%, outperforming the 1.12% error of geometrical methods. Furthermore, the adaptability of deep learning models to diverse training datasets and input parameters enables superior performance optimization compared to traditional methods.
Figure 1.
The Evolution of Deep Learning-Based Segmentation Models in Composite Materials Since 2015. Semantic segmentation assigns a categorical label to each pixel, treating all objects within the same class as a single entity, such as fiber tows and the matrix. In contrast, instance segmentation not only categorizes pixels but also distinguishes individual instances within the same class, exemplified by the distinct identification of each fiber bundle. The images shown in the figure are sourced sequentially from references [
15,
17,
18,
20,
21].
In 2018, Chen et al. [
22] introduced the DeepLab V3+ model, incorporating innovations like atrous convolutions, multi-scale atrous spatial pyramid pooling [
23], and global context information fusion. These advancements optimized contextual information utilization and segmentation accuracy, achieving superior performance in semantic segmentation tasks. Ali et al. [
21] further modified the DeepLab V3+ network by using ResNet18 as the backbone, achieving a segmentation accuracy of 91% on 2D plain weave glass fabrics, significantly outperforming k-NN’s 71% accuracy.
While these models are built on Convolutional Neural Networks (CNNs), which are central to deep learning in image processing, they exhibit limitations in capturing global semantic interactions and long-range dependencies. CNNs are adept at extracting local spatial features but rely on multiple convolutional layers to expand the receptive field, which often proves insufficient for global context modeling. This limitation is particularly pronounced in fiber segmentation, where accurately distinguishing fiber tows in complex woven structures demands comprehension of global continuity and spatial relationships. Capturing such dependencies is essential for resolving intricate interlaced patterns where fiber tows span multiple regions.
Transformers, known for their self-attention mechanisms, excel at modeling global context and capturing long-range dependencies, effectively bridging the gap left by CNNs in handling both global and local features simultaneously. The Vision Transformer (ViT) [
24] was the first to introduce this architecture to the visual domain, leveraging global self-attention for impressive results. However, as image resolution increases, ViT’s computational cost escalates, and it struggles to capture fine-grained local details. Swin Transformer [
25] addresses this by using local window attention, reducing complexity and improving local feature capture but sacrificing some global context modeling capabilities.
TransUNet [
26] integrates the global attention mechanisms of Transformers with the precise localization capabilities of U-Net, effectively balancing global context modeling and local feature extraction. This hybrid architecture is particularly effective for complex tasks, such as medical image segmentation [
27], making it a promising approach for composite fabric image segmentation.
Table 1 summarizes the advantages and limitations of the aforementioned algorithms in the context of composite fabric image segmentation.
The objective of this study is to introduce a TransUNet-based framework for segmenting 3D image sequences of fiber-reinforced composites in CT scans, leveraging advancements in automated deep learning architectures. To enhance the model’s representational capacity, a multi-scale feature fusion module, integrated with a boundary enhancement module, was developed to expand the receptive field and capture more intricate features. Additionally, a novel Boundary-guided Learning Module (BLM) was designed to autonomously learn boundary features, further improving segmentation accuracy. Finally, the BatchFormerV2 module was incorporated between CNN and Transformer architectures to enable cross-learning among batch samples, thereby enhancing the model’s generalization capacity.
This paper is organized as follows:
Section 2 delivers a thorough exploration of the comprehensive data processing pipeline, including data acquisition, preprocessing, manual segmentation, and data augmentation.
Section 3 provides an in-depth description of the TransUNet-based automation framework, detailing the multi-scale feature fusion module, BLM, BatchFormerV2 module, loss function, performance metrics, and the approach for the automated segmentation of connected fiber bundles.
Section 4 delineates the methodologies for model training and evaluation, accompanied by a discussion of the results. Finally,
Section 5 provides a summary of the principal contributions of this research.
3. Methodology
Boundary segmentation errors are common in the image segmentation of fiber-reinforced composites. These errors occur particularly when fiber bundles are in close proximity. Deep learning methods outperform traditional algorithms in robustness and accuracy. However, achieving precise boundary segmentation and strong generalization remains challenging. To address these issues, this paper proposes the Multi-scale Boundary-guided Learning TransUNet (MBL-TransUNet) model.
The proposed model is built on the TransUNet framework. It retains the ResNet50 backbone network to utilize its strong feature extraction capabilities. The model incorporates multi-scale feature fusion modules, a BLM, and the BatchFormerV2 module. The overall network architecture is illustrated in
Figure 4. Arrows in the figure represent the direction of forward propagation. They illustrate the flow of feature maps. For clarity, the backpropagation directions are intentionally excluded.
Figure 3 illustrates the proposed framework, which consists of six key stages: image preprocessing, manual segmentation, boundary-guided segmentation, data augmentation, training, and evaluation. The ground-truth labels, obtained through manual segmentation, are used during training for pixel-wise classification, ensuring differentiation between fibers and the background. Boundary labels are derived from the ground-truth labels using the Sobel operator and provide additional supervision. They assist the model in capturing the shapes and edges of fiber bundles. These two types of labels work together, with backpropagation optimizing the model for both tasks. In the evaluation stage, the model generates segmentation predictions, which are then evaluated to assess its performance.
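As a concrete illustration of the boundary-label step, the following minimal sketch derives a binary boundary map from a binary ground-truth mask with the Sobel operator; the function name and the OpenCV-based implementation are illustrative choices rather than the pipeline's documented code.

```python
import numpy as np
import cv2

def boundary_label_from_mask(mask: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Derive a binary boundary label from a binary ground-truth mask via the Sobel operator.

    `mask` is a 2D array with values {0, 1}; the returned array marks boundary pixels with 1.
    """
    mask = mask.astype(np.float32)
    # Horizontal and vertical gradients of the label image.
    gx = cv2.Sobel(mask, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(mask, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    # Any non-zero gradient of a piecewise-constant mask indicates a class transition.
    return (magnitude > threshold).astype(np.uint8)
```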
The proposed model was implemented using the PyTorch deep learning framework with Python 3.8. The computational environment consisted of a 12th Gen Intel(R) Core(TM) i5-12400F CPU, an NVIDIA GeForce RTX 4060 Ti GPU, and the Ubuntu 18.04.1 (64-bit) operating system.
3.1. Multi-Scale Feature Fusion for Boundary Enhancement Module (MFBEM)
The Multi-Scale Feature Fusion for Boundary Enhancement Module (MFBEM) is designed to improve feature extraction and enhance boundary detection accuracy. As shown in
Figure 5, this module comprises two main components: Multi-Scale Feature Fusion (MF) and Boundary Enhancement (BE). To effectively extract multi-scale features, convolutional kernels of varying sizes are applied to the input features. Specifically, Conv 1 × 1 captures localized details with minimal spatial complexity, making it ideal for refining pixel-level features. SCConv 3 × 3 leverages Spatial and Channel Reconstruction Convolution (SCConv) [
32], reducing spatial and channel redundancy while preserving rich feature representations. Conv 5 × 5 processes features at a larger receptive field, providing contextual information [
22].
The outputs from these convolutions are concatenated along the channel dimension to fuse features across multiple scales. This fused feature map is then processed through a Conv 3 × 3 layer to integrate and refine features, enhancing feature consistency. The refined feature map is subsequently passed into the boundary enhancement module, which includes SCConv 3 × 3 for enhancing feature representation, Conv 1 × 1 for compressing the channel dimension and highlighting key features, and morphological erosion [
33], a post-processing operation for noise suppression, boundary smoothing, and improved boundary detection.
Finally, the input feature map, the processed Conv 3 × 3 feature map, and the output from the boundary enhancement module are fused using an element-wise sum operation. The input feature map preserves global information, the convolutional feature map captures local details, and the boundary enhancement module improves boundary localization. These complementary feature representations are integrated to enhance segmentation performance, particularly in complex image regions with fine structural details and ambiguous boundaries.
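A minimal PyTorch sketch of this module is given below. It follows the data flow described above, but SCConv [32] is stood in for by a plain 3 × 3 convolution and morphological erosion is approximated with min-pooling, so the block should be read as an illustration rather than the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFBEM(nn.Module):
    """Multi-scale fusion + boundary enhancement; SCConv and erosion are approximated (see text)."""

    def __init__(self, channels: int):
        super().__init__()
        # Multi-scale feature fusion (MF) branch.
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.scconv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # stand-in for SCConv [32]
        self.conv5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1)
        # Boundary enhancement (BE) branch.
        self.be_scconv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # stand-in for SCConv [32]
        self.be_compress = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Kernels of three sizes, concatenated along the channel dimension, then refined by a 3x3 conv.
        fused = self.fuse(torch.cat([self.conv1x1(x), self.scconv3x3(x), self.conv5x5(x)], dim=1))
        # Boundary enhancement: refinement, channel compression, then an erosion-like min-pooling step.
        be = self.be_compress(self.be_scconv(fused))
        be = -F.max_pool2d(-be, kernel_size=3, stride=1, padding=1)
        # Element-wise sum of the input (global context), fused map (local detail), and boundary features.
        return x + fused + be
```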
3.2. Boundary-Guided Learning Module (BLM)
The encoder in TransUNet combines multiple convolutional layers with Transformers for feature extraction. However, it relies on frequent upsampling to restore the original image resolution. This process causes significant spatial information loss [
34].
This problem is especially pronounced in textiles with repetitive, periodic structures. These structures have two types of interfaces: the heterogeneous interface between fiber bundles and the matrix, and the homogeneous interface between adjacent fiber bundles. The latter poses a particular challenge due to the high similarity in material properties and grayscale intensity, resulting in ambiguous boundaries. The loss of spatial information exacerbates this issue, causing blurred edges and higher misclassification rates in these regions.
To address these challenges, a BLM was developed to predict boundary information. The predicted boundary maps are compared with the ground truth boundary labels to compute a boundary-specific loss. The loss ensures the model captures boundary-related features effectively through backpropagation. The structure of the BLM module is shown in
Figure 6.
The BLM extracts three feature maps, corresponding to the outputs of the first three layers of the ResNet50 backbone network. Each feature map is processed through 3 × 3 convolution operations. The first two feature maps are upsampled using linear interpolation with scaling factors of 4 and 2, respectively. This ensures consistent spatial resolution before concatenation with the third feature map. Then, a 1 × 1 convolution layer is applied to adjust the channel dimension of the feature maps at different resolutions. This operation helps unify the representation of features across different scales.
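The sketch below illustrates this fusion in PyTorch. The per-stage channel widths, the intermediate channel count, and the single-channel boundary head are assumptions for illustration; only the 3 × 3 convolutions, the ×4 and ×2 upsampling, the concatenation, and the trailing 1 × 1 convolution follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BLM(nn.Module):
    """Boundary-guided learning head over three backbone feature maps (channel widths are illustrative)."""

    def __init__(self, in_channels=(256, 512, 1024), mid_channels: int = 64):
        super().__init__()
        # One 3x3 convolution per backbone feature map.
        self.convs = nn.ModuleList(
            [nn.Conv2d(c, mid_channels, kernel_size=3, padding=1) for c in in_channels]
        )
        # 1x1 convolution unifies the concatenated multi-resolution features
        # and produces a single-channel boundary prediction.
        self.head = nn.Conv2d(3 * mid_channels, 1, kernel_size=1)

    def forward(self, f1, f2, f3):
        g1, g2, g3 = (conv(f) for conv, f in zip(self.convs, (f1, f2, f3)))
        # Upsample the first two maps (x4 and x2) to the resolution of the third before concatenation.
        g1 = F.interpolate(g1, scale_factor=4, mode="bilinear", align_corners=False)
        g2 = F.interpolate(g2, scale_factor=2, mode="bilinear", align_corners=False)
        return self.head(torch.cat([g1, g2, g3], dim=1))
```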
3.3. BatchFormerV2
To improve the model’s generalization for fiber-reinforced composites with varying fiber volume fractions, the BatchFormerV2 module [
35] was introduced. Unlike conventional attention mechanisms that operate along spatial or channel dimensions, BatchFormerV2 functions along the batch dimension. It uses an attention mechanism to learn relationships between samples within a mini-batch. Specifically, for each spatial position (e.g., pixels or patches in the feature map), BatchFormerV2 treats features at the same spatial positions across all samples as a sequence. It uses Transformer-based attention to extract common features from training data with a fiber volume fraction of 60%, avoiding overfitting to individual samples while capturing greater variability in the data distribution. This enhances the model’s ability to generalize to unseen distributions, such as data with a fiber volume fraction of 54%.
The implementation of BatchFormerV2 is as follows: for each spatial position i, the features at the i-th position from all samples in the mini-batch are organized as a sequence of length B, where B is the batch size, giving a B × C feature matrix for that position. Here, C is the number of channels in the feature map, and i ranges over the N = H × W patches determined by the height H and width W of the input. BatchFormerV2 processes these patch features along the batch dimension, producing refined outputs of the same shape.
The mini-batch is split into two branches during training. Both branches share the same network parameters. One branch incorporates BatchFormerV2 to process the sequences . The other branch bypasses BatchFormerV2 to maintain computational efficiency. BatchFormerV2 is deactivated during testing. This prevents inconsistencies from batch dependencies.
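The following sketch shows the core batch-dimension attention in PyTorch, using a standard Transformer encoder layer as a stand-in for the attention block of [35]; the head count and feed-forward width are illustrative.

```python
import torch
import torch.nn as nn

class BatchFormerV2Block(nn.Module):
    """Attention across the batch dimension, applied independently at every spatial position."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, dim_feedforward=2 * channels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W). Deactivated at test time so predictions do not depend on batch composition.
        if not self.training:
            return x
        b, c, h, w = x.shape
        # (B, N, C): with batch_first=False the encoder attends along dim 0, i.e., across the
        # B samples, independently for each of the N = H * W spatial positions.
        seq = x.flatten(2).permute(0, 2, 1)
        out = self.encoder(seq)
        return out.permute(0, 2, 1).reshape(b, c, h, w)
```

During training, the same mini-batch would be forwarded twice with shared network weights, once through this block and once bypassing it, matching the two-branch scheme described above.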
3.4. Loss Function
The network was trained using a custom loss function with two components: one designed to capture the overall layout of the fabric and the other to improve boundary learning. This dual-purpose approach aims to allow the model to segment fabric regions accurately and improve precision in distinguishing boundary features.
For the entire fabric image, the loss function combines the Binary Cross-Entropy (BCE) loss [
36] and the Dice loss [
37]. The formulation is as follows:
$$L_{\text{seg}} = w_{\text{BCE}}\, L_{\text{BCE}} + w_{\text{Dice}}\, L_{\text{Dice}}$$

$$L_{\text{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[ g_i \log p_i + (1 - g_i)\log(1 - p_i) \right], \qquad L_{\text{Dice}} = 1 - \frac{2\sum_{i=1}^{N} p_i\, g_i + \varepsilon}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i + \varepsilon}$$

The parameters and variables are defined as follows:
w_BCE and w_Dice are the weights of the BCE and Dice terms, set to 0.2 and 0.8, respectively;
ε is a smoothing parameter to avoid division by zero;
g denotes the ground truth annotation;
p denotes the predicted annotation; and
N is the sample count.
The BCE loss function is widely used in semantic segmentation tasks. However, the dataset in this study exhibits significant class imbalance, with foreground pixels representing fiber tows being substantially outnumbered by background pixels. This imbalance can cause the model to favor the background during training, reducing its effectiveness in segmenting fiber bundles. To address this issue, the Dice loss was incorporated alongside the BCE loss. The Dice loss enhances alignment between predicted and actual regions, especially for small foreground areas. To handle the gradient instability associated with the Dice loss, the weight for the BCE loss was set to 0.2, and the weight for the Dice loss was set to 0.8. These weights were determined through grid search to balance segmentation accuracy and optimization stability.
For effective boundary learning, it is crucial to ensure that the learned boundary features are distinct from both the background and fiber bundles. To address the severe class imbalance of boundary pixels, a weighted Binary Cross-Entropy (W-BCE) loss function was employed. The W-BCE loss is formulated as follows:
$$L_{\text{W-BCE}} = -\frac{1}{N}\sum_{i=1}^{N} w_i \left[ g_i \log p_i + (1 - g_i)\log(1 - p_i) \right]$$

The parameters and variables are defined as follows:
w_i is the weight assigned to pixel i, which emphasizes the importance of certain pixels (e.g., boundary pixels). After conducting a grid search, the weight for boundary pixels is set to 0.8, while that for background pixels is set to 0.2. The region loss above and this boundary loss together form the complete composite loss used for training.
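A compact sketch of these two loss terms is given below, assuming probability maps as inputs; the smoothing value and the plain sum used to combine the region and boundary terms are assumptions, since only the 0.2/0.8 weights are specified above.

```python
import torch
import torch.nn.functional as F

def region_loss(pred, target, w_bce=0.2, w_dice=0.8, eps=1.0):
    """Weighted BCE + Dice loss over the full fabric mask (pred and target are probability maps)."""
    bce = F.binary_cross_entropy(pred, target)
    intersection = (pred * target).sum()
    dice = 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return w_bce * bce + w_dice * dice

def boundary_loss(pred, target, w_boundary=0.8, w_background=0.2):
    """Weighted BCE on the predicted boundary map: boundary pixels weighted 0.8, background 0.2."""
    weights = torch.where(target > 0.5,
                          torch.full_like(target, w_boundary),
                          torch.full_like(target, w_background))
    return F.binary_cross_entropy(pred, target, weight=weights)

def total_loss(seg_pred, seg_gt, boundary_pred, boundary_gt):
    # Assumed combination of the two supervision signals (a plain sum).
    return region_loss(seg_pred, seg_gt) + boundary_loss(boundary_pred, boundary_gt)
```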
3.5. Performance Metrics
This study employs three metrics to evaluate the performance of the model and the enhanced algorithm: Dice coefficient, Mean Intersection over Union (MIoU), and Hausdorff Distance (HD). These metrics are used consistently across all experimental analyses. The Dice coefficient and MIoU specifically measure the overlap between predicted and ground truth pixels [
38]. Their mathematical definitions are as follows:

$$\text{Dice} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad \text{MIoU} = \frac{1}{2}\left(\frac{TP}{TP + FP + FN} + \frac{TN}{TN + FP + FN}\right)$$

where True Positive (TP) refers to the number of pixels correctly classified as fiber bundles, True Negative (TN) denotes the number of pixels correctly classified as background, False Negative (FN) represents the number of fiber bundle pixels misclassified as background, and False Positive (FP) is the number of background pixels misclassified as fiber bundles.
The bidirectional Hausdorff Distance H(A, B) quantifies the similarity between two sets. It measures the maximum distance from a point in one set to its nearest point in the other. This effectively captures their morphological differences:

$$H(A, B) = \max\bigl(h(A, B),\, h(B, A)\bigr), \qquad h(A, B) = \max_{a \in A}\min_{b \in B}\lVert a - b \rVert$$

The symbols h(A, B) and h(B, A) represent the unidirectional Hausdorff Distances from set A to set B and from set B to set A, respectively. The notation ‖·‖ denotes the norm used to measure the distance between the point sets A and B.
In segmentation tasks, the 95th percentile Hausdorff Distance (HD95) is commonly used instead of the maximum distance to reduce sensitivity to outliers. HD95 calculates the largest distance from a point in one set to its nearest point in the other. It excludes the most extreme 5% of distances. This metric offers a more robust evaluation of boundary differences, making it particularly suitable for segmentation tasks where noise or artifacts may introduce outliers.
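A possible NumPy/SciPy implementation of HD95 over binary masks is sketched below; it measures distances between the boundary pixels of the two masks, which is one common convention rather than necessarily the exact computation used here.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    """95th-percentile symmetric Hausdorff distance between two binary masks, in pixels."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Boundary (surface) pixels of each mask.
    pred_border = pred & ~binary_erosion(pred)
    gt_border = gt & ~binary_erosion(gt)
    # Euclidean distance of every pixel to the nearest boundary pixel of the other mask.
    dist_to_gt = distance_transform_edt(~gt_border)
    dist_to_pred = distance_transform_edt(~pred_border)
    d_ab = dist_to_gt[pred_border]   # h(A, B): prediction boundary -> ground-truth boundary
    d_ba = dist_to_pred[gt_border]   # h(B, A): ground-truth boundary -> prediction boundary
    # The 95th percentile replaces the maximum to suppress outliers.
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))
```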
3.6. Segmentation of Connected Yarns
Connected component analysis was performed on the predicted annotations using a four-connected region criterion. A pixel was considered connected to its neighbors if they shared one of the four cardinal directions (up, down, left, or right). This method ensured that only directly adjacent pixels were grouped into the same region. A two-pass algorithm was then employed to identify and label the connected fiber bundles.
In the initial pass, some connected regions might be mistakenly assigned multiple labels. To resolve these inconsistencies, a secondary scanning algorithm was applied to rescan the regions, ensuring that each region was correctly and consistently labeled. As shown in
Figure 7, if the number of detected labels was smaller than the actual number of fiber bundles, it indicated improper separation due to inaccurate pixel grouping. Detected images with connected fiber bundles were cropped, and the cropped regions were adjusted into squares by matching their width and height, simplifying downstream processing.
To enhance feature separation, a Euclidean distance transform was applied to the resized images. This was followed by the watershed algorithm [
39], which effectively isolated the connected yarns. Finally, the separated yarns were resized to their original dimensions and restored to their initial positions.
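The sketch below illustrates the labeling and separation steps with SciPy and scikit-image. It operates directly on the full mask rather than on cropped, square-resized patches, and the expected bundle count and peak spacing are illustrative parameters.

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def separate_connected_yarns(mask, expected_count, min_peak_distance=10):
    """Label fiber bundles with 4-connectivity and split touching ones via distance transform + watershed."""
    mask = mask.astype(bool)
    # Connected-component labeling with a 4-connected structuring element.
    four_connected = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
    labels, num = ndimage.label(mask, structure=four_connected)
    if num >= expected_count:
        return labels  # every bundle is already isolated
    # Euclidean distance transform highlights the centers of merged bundles.
    distance = ndimage.distance_transform_edt(mask)
    # Local maxima of the distance map seed the watershed markers.
    peaks = peak_local_max(distance, labels=labels, min_distance=min_peak_distance)
    markers = np.zeros_like(labels)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Watershed on the inverted distance map separates the touching yarns.
    return watershed(-distance, markers, mask=mask)
```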
5. Conclusions
Fiber-reinforced composite materials possess complex multi-scale structures. Micro-CT technology, renowned for its high-resolution three-dimensional imaging capabilities, effectively captures the internal configurations of composite materials, providing crucial data for digital twin modeling. In this study, a preform consisting of eight plies of glass non-crimp fabric was scanned using Micro-CT. Due to the intricate weaving structure of the fiber fabric and indistinct interfaces, conventional image processing methods encounter difficulties in accurately segmenting the fiber bundle interfaces. While traditional image processing software performs excellently in segmenting clear boundaries, its accuracy decreases when confronted with complex woven regions. To address this issue, this study proposes the MBL-TransUNet model, which leverages deep learning techniques to enhance segmentation precision in woven fiber areas, providing a more accurate solution for material property prediction and digital twin modeling of composite materials. The model integrates three key modules—BatchFormerV2, BLM, and MFBEM—and outperforms traditional network architectures, such as FCN, UNet, UNet++, and DeepLabV3+, demonstrating significant advantages in segmentation accuracy for this task.
The ablation experiments emphasize the contribution of each individual module. BatchFormerV2 enhances synergy among modules and improves generalization capabilities. At a fiber volume fraction of 54%, it increases the Dice coefficient from 90.12% to 91.27% and MIoU from 80.73% to 83.96%, and reduces HD95 from 4.0059 to 3.0882. The BLM significantly improves boundary segmentation accuracy, decreasing HD95 from 4.0059 to 2.7530 at 54%, and further reducing it to 1.0000 at 60%. The MFBEM enhances multi-scale feature fusion, increasing MIoU from 80.73% to 80.91% at 54%, and from 93.44% to 94.43% at 60%.
The combined effect of all three modules delivers the best segmentation performance. At a fiber volume fraction of 54%, the Dice coefficient, MIoU, and HD95 reached 91.86%, 84.96%, and 2.7530, respectively, reflecting improvements of 1.74% in Dice and 4.23% in MIoU and a reduction of 1.25 in HD95 compared with the baseline model. At 60%, the three metrics reached 97.87%, 95.82%, and 1.0000, with Dice and MIoU improving by 1.26% and 2.38%, respectively, and HD95 reduced to 1.0000.
It is noteworthy that performance degraded significantly when only BLM and MFBEM were integrated without BatchFormerV2. At a fiber volume fraction of 54%, Dice and MIoU decreased to 80.53% and 67.41%, respectively, while HD95 increased to 7.4729. Similar drops were observed at 60%. BLM focuses on boundary features, while MFBEM emphasizes multi-scale feature fusion. Integrating these two modules alone led to redundancy and inconsistent feature representations. BatchFormerV2 addressed these issues by enabling cross-batch learning, which enhanced global feature modeling and ensured better synergy between the modules.
The proposed model has demonstrated exceptional performance in segmenting eight plies of glass non-crimp fabric. Future research could further explore the model’s generalization capabilities, particularly its segmentation performance in zero-shot semantic segmentation tasks. Additionally, optimizing the network to effectively handle more complex fabric structures, such as varying weave patterns and fiber types, presents another promising direction for future research.