1. Introduction
Phase fraction calculation is a crucial step in materials science, serving as an indispensable tool for bridging the gap between a material’s composition, processing, microstructure, and properties [1]. This holds especially true for steel, whose properties are intricately linked with its complex microstructure, which encompasses phases such as ferrite, bainite, pearlite, and martensite; each phase distinctly impacts the material’s mechanical, thermal, and electrical attributes [2]. Accurate quantification of these phase fractions enables not only the prediction but also the manipulation of steel properties to meet specified performance requirements [3,4]. This becomes particularly important when steel is alloyed with various elements to augment its features; phase fraction calculation is therefore crucial in devising optimal alloy compositions that yield the desired microstructures and properties [5]. Calculating the distribution and volume fractions of these phases is essential for forecasting key mechanical attributes such as strength, hardness, and toughness [6], making the precise annotation and segmentation underlying phase fraction calculation a practical imperative in steel material development.
Historically, phase annotation has been performed manually by experts who identify different phases in microstructural imagery, distinguishing them by aspects such as grain boundaries, color, contrast, and texture. While manual annotation yields accurate outcomes, it is labor-intensive and prone to human error [7]. The emergence of machine learning and deep learning has positioned convolutional neural networks (CNNs) [8] as a promising substitute for this task, mainly applied to relatively simple microstructural images acquired via optical microscopy (OM) or scanning electron microscopy (SEM) [9,10]. However, as the tensile strength of a steel (its peak resistance against pulling forces) increases, its microstructure becomes increasingly difficult to segment, as shown in Figure 1.
As illustrated in Figure 1, an increase in tensile strength introduces greater ambiguity in both the phase structure and pattern, complicating the image annotation required for phase fraction calculation. Alloy steels, moreover, exhibit further ambiguity in phase boundaries, complicating accurate phase identification. These challenges resemble those encountered in medical imaging, where high-resolution, high-precision segmentation models such as UNet are typically used to locate and segment the elements of interest. However, such models require careful adaptation to mitigate overfitting and to capture the intricate structures inherent in steel microstructures.
The microstructures formed through the transformation-induced plasticity (TRIP) effect further complicate the segmentation task. The deformation-induced transformation of retained austenite into martensite elevates work-hardening rates, improving the strength–ductility balance [11]. Accurate identification of these microstructures for phase fraction calculation is crucial for evaluating the quality of Advanced High-Strength Steels (AHSS). Conventional methodologies such as SEM analysis and Electron Backscatter Diffraction (EBSD) are commonly used for steel structure analysis, yet they are time-consuming, labor-intensive, and costly for extensive analysis [12,13], highlighting the need for alternative, more efficient microstructure characterization methods. In this paper, we introduce a segmentation model for phase fraction calculation using a Deep Neural Network (DNN) on images of AHSS, an advanced automotive steel material with tensile strengths surpassing 1 gigapascal (GPa).
EBSD maps often contain inconsistencies that result in imprecise labeling and require manual correction. Manual supervision, while necessary, is time-consuming and prone to discrepancies arising from variations in expert interpretation, fatigue, or subjective bias. To address these challenges, we used our own superpixel [14] labeling software tool alongside EBSD maps to generate high-resolution, consistent label images, as illustrated in Figure 2. The superpixel labeling tool groups pixels into larger coherent superpixels based on similarity in color, texture, and other cues, providing a more efficient way to obtain accurately labeled images. Microscopic AHSS images present unique segmentation challenges owing to the intricate nature of steel microstructures and the similar appearance of different components under various magnifications [15]. In contrast to traditional image segmentation tasks, where object boundaries are well defined, the microstructures in alloy steel exhibit subtle distinctions and a variety of complex patterns. As can be seen in Figure 2c, the label image features three distinct labels corresponding to different phases of the steel microstructure: orange for ferrite, purple for bainite, and yellow for martensite. While the ferrite and martensite phases exhibit definitive textures, bainite presents hardly any easily discernible pattern. This lack of definitive textural structure and edge delineation further complicates phase identification.
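The grouping idea behind superpixel labeling can be illustrated with a toy region-growing sketch. This is a simplified, hypothetical stand-in for the actual tool (which also weighs color and texture cues): neighboring pixels whose intensities differ by at most a tolerance are merged into one region.

```python
from collections import deque

def label_regions(grid, tol=0):
    """Group neighbouring pixels whose values differ by at most `tol`
    into connected regions -- a toy stand-in for superpixel grouping."""
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue  # pixel already belongs to a region
            # breadth-first flood fill from the seed pixel
            queue = deque([(sy, sx)])
            labels[sy][sx] = next_label
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(grid[ny][nx] - grid[y][x]) <= tol):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```

On a small intensity grid, `label_regions([[1, 1, 2], [1, 2, 2]])` partitions the pixels into two coherent regions, each of which an annotator can then label with a single phase instead of painting individual pixels.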
Deep learning techniques, particularly CNNs, have shown promising results in various image segmentation tasks. The UNet architecture, a type of CNN, has been widely adopted for medical image segmentation owing to its ability to capture both local and global context through its symmetric encoder–decoder structure [16].
In steel microstructure segmentation, previous studies have explored various methodologies for achieving accurate classification and segmentation. Traditional methods, such as the data mining techniques proposed in [17], laid a solid foundation by extracting morphological features and employing SVMs for feature classification. While effective for simpler structures, these methods often struggle with complex-phase steel microstructures because of the subtle differences between phases. Our approach extends these methods by leveraging a deep learning architecture that better captures the intricacies of complex-phase steel.
Pauly et al. [18] explored segmentation of contrasted and etched steel datasets acquired via SEM and LOM imaging, achieving modest accuracies. Our method builds upon this by implementing a UNet-based architecture that not only handles high-contrast variations but also adapts to the intricate textures found in high-strength alloys. Durmaz et al. [19] presented a multidisciplinary approach that combined specimen preparation with imaging, utilizing UNet to achieve an impressive 90% accuracy in lath–bainite segmentation. Inspired by their success, our work extends the application of UNet to capturing the textural diversity of complex-phase steel microstructures, achieving similarly high accuracy. The microstructure of complex-phase steels typically consists of bainite, ferrite, and martensite or retained austenite [20], and segmenting these phases is comparatively difficult owing to the ambiguity between their structures. Azimi et al. [21] performed microstructure segmentation on SEM images containing bainite, ferrite, martensite, and pearlite using a CNN and a max-voting scheme. Despite achieving high overall accuracy, their model struggled with certain phase segmentations, indicating a potential imbalance in the dataset. Our approach addresses this limitation by emphasizing not just texture but also the distinct shapes and structures within the microstructures. Through meticulous data augmentation and labeling, particularly using EBSD, our model demonstrates a nuanced understanding of these complex features.
Recent studies have begun leveraging UNet for the segmentation of dual-phase steel structures with clear class distinctions, which simplifies the segmentation task [11,22]. In contrast, the complex-phase steels we investigate present a greater challenge owing to the ambiguity of their microstructures. Our research addresses this by introducing advanced data augmentation techniques and precise EBSD-aided labeling that allow for more accurate phase distinction. The use of EBSD in annotation significantly refines the learning process, as it provides detailed pixel-wise structural orientation data, enhancing segmentation accuracy beyond what conventional SEM or LOM data can achieve [19]. The work presented in [23] bears resemblance to our research objectives, laying down a foundational framework in the domain of steel microstructure segmentation. While their study also tackles the segmentation of complex-phase steel, it relies on the MetalDAM [23] dataset, which lacks EBSD label images. Instead, they employ label images produced using binary masks as pre-annotations, which were subsequently modified by industry experts. This approach introduces subjectivity and potential inaccuracies into the annotation process. Additionally, their model does not appear to scale across different magnifications or types of steel. Despite these limitations, their work has been a valuable source of inspiration for our own research. We further contribute by comparing the performance of our approach with results on both the MetalDAM and UHCS [24] datasets.
The work of Breumier et al. [25] also bears considerable resemblance to our objective of segmenting complex-phase steels. Their research performs segmentation using band contrast (BC) and kernel average misorientation (KAM) maps as input to a deep learning model and achieves an accuracy of 92%. By contrast, our work shows that a similar result can be achieved without BC or KAM: our model directly uses the SEM input image, with the label image generated using EBSD and our superpixel labeling tool as ground truth for the learning task. Moreover, [25] used low-carbon steels, whereas our research examines alloy steels. This distinction in steel type accounts for differences in microstructure and properties; for instance, the clearer demarcation between phases in low-carbon steels, compared with alloy steels, may make the phases easier to capture during learning. Distinct from earlier research, our proposed methodology focuses on phase fraction calculation through complex-phase steel microstructure segmentation by adapting pre-existing DNN models such as UNet. We demonstrate that the deep learning model’s performance can be significantly improved by carefully selecting and tuning the parameters rather than relying on pre-configurations. We also highlight the importance of data augmentation and precise labeling in segmentation tasks. To the best of our knowledge, we are the first to use only SEM-EBSD paired image maps as input for complex-phase segmentation and to conduct a scalability experiment testing the model’s performance across different magnifications and steel-type images, thereby demonstrating its robustness and potential for practical application.
3. Results
This section describes the results obtained using the experimental setup, categorized into three main groups: inferencing using the same steel type and magnification as the trained model, inferencing using the same steel type but different magnification levels (scalability 1), and inferencing using different steel types and different magnification levels compared to the trained model (scalability 2).
3.1. Comparison of Stock UNet and Augmentations
We present a tabular analysis of the performance of several UNet variants on the task of steel image segmentation. We investigate three distinct configurations: the Stock UNet with Random Augmentation (RA), the Stock UNet with our Custom Augmentation (CA), and the Modified UNet (UNet with modified parameters) with Custom Augmentation. Random augmentation comprises operations such as random flips, rotations, scaling, and brightness adjustments applied during training.
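A key constraint in any such augmentation pipeline is that geometric transforms must be applied identically to the SEM image and its label mask, or the pixel-wise correspondence used for training is destroyed. The following pure-Python sketch (illustrative only; an actual pipeline would operate on image tensors) shows paired random flips and rotations:

```python
import random

def rot90(grid):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def hflip(grid):
    """Flip a 2D list left-to-right."""
    return [row[::-1] for row in grid]

def augment_pair(image, label, seed=None):
    """Apply the same random flip/rotation to an image and its label mask,
    preserving their pixel-wise alignment."""
    rng = random.Random(seed)
    ops = []
    if rng.random() < 0.5:
        ops.append(hflip)
    for _ in range(rng.randrange(4)):  # 0-3 quarter-turns
        ops.append(rot90)
    for op in ops:
        image, label = op(image), op(label)
    return image, label
```

Seeding makes the transform reproducible, which is useful for debugging a training run; in practice a fresh random transform is drawn for each sample each epoch.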
Table 1 demonstrates the superiority of the Modified UNet with Custom Augmentation (Mod UNet + CA) across all steel types and magnifications, signifying the advantages of tailored model configurations and augmentation techniques. Notably, the Mod UNet + CA exhibits exceptional performance, with MPA values surpassing 84% and Dice Scores above 0.83 across all test scenarios. In stark contrast, the Stock UNet with Random Augmentation (RA) delivers suboptimal results, particularly at higher magnifications (×5000 E-type) where the complexity of the microstructures challenges the model’s recognition capabilities. This underscores the inadequacy of generic augmentations in capturing the nuanced features of complex-phase steel images. The Stock UNet with Custom Augmentation (CA) shows a moderate enhancement in performance over the RA variant. However, it still falls short compared to the Mod UNet + CA, highlighting the necessity of model customization to effectively handle domain-specific challenges.
3.2. Inference on Same Steel and Same Magnification
A test image held out from the training set was used to evaluate the trained model. As shown in Table 2, the model attained an overall pixel accuracy of 91.74% and a Dice Score of 0.9158, indicating a strong overlap between the predicted segmentation and the actual microstructures.
A class-wise accuracy breakdown provides further insight into the model’s performance across the different phases. The model is robust in identifying ferrite, with an accuracy of 96.1%, indicating its proficiency in segmenting this prevalent phase. Martensite and bainite, while more challenging due to their intricate structures, still achieved accuracies of 83.7% and 81.3%, respectively. These figures are especially significant given the complexity of these microstructures and the inherent difficulty of their segmentation. The test SEM image, the corresponding label image, and the image predicted by our model are shown in Figure 6.
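For reference, the reported metrics can be computed from flattened prediction and label arrays as follows (an illustrative sketch, not our evaluation code):

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class matches the label."""
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

def class_accuracy(pred, truth, cls):
    """Per-class accuracy (recall): correctly predicted pixels of `cls`
    divided by the number of true pixels of `cls`."""
    total = sum(t == cls for t in truth)
    correct = sum(p == t == cls for p, t in zip(pred, truth))
    return correct / total if total else 0.0

def dice_score(pred, truth, cls):
    """Dice coefficient for one class: 2|P intersect T| / (|P| + |T|)."""
    inter = sum(p == t == cls for p, t in zip(pred, truth))
    size = sum(p == cls for p in pred) + sum(t == cls for t in truth)
    return 2 * inter / size if size else 1.0
```

Pixel accuracy rewards overall agreement, while the per-class and Dice figures expose weaknesses on minority phases such as bainite that a single aggregate number can hide.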
The results, particularly the high Dice Score, affirm the model’s capability to discern between the intricate textures and boundaries of the steel microstructures accurately. This is crucial for practical applications where material properties are inferred from microstructural composition.
3.3. Inference on Same Steel with Different Magnification—Scalability 1
Images of the same steel type (E-type) but at different magnification levels were used for inference. As depicted in Table 3 and illustrated in Figure 7, the model achieved Pixel Accuracies of 84.11% and 89.29% for ×3000 and ×5000 magnifications, respectively. The enhanced accuracy at ×5000 magnification underscores the model’s adeptness at handling higher-resolution images, which often present more detailed microstructural information.
The Dice Scores of 0.8389 and 0.8892 further validate the model’s efficacy in segmenting the microstructures accurately at different magnifications. Notably, the Accuracy Per Class and Phase Fraction metrics provide a deeper perspective on the model’s performance across various classes and phases. The Error Margin in Phase Fraction reveals a marginal discrepancy between the predicted and actual phase compositions, suggesting areas where the model’s precision can be further improved.
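Phase fractions and their error margins follow directly from pixel counts in the segmentation masks. A minimal sketch, using a hypothetical label-to-phase mapping that matches the three phases in Figure 2:

```python
from collections import Counter

# Illustrative label map; the actual class indices are an assumption.
PHASES = {0: "ferrite", 1: "bainite", 2: "martensite"}

def phase_fractions(mask):
    """Area fraction of each phase in a flattened segmentation mask."""
    counts = Counter(mask)
    n = len(mask)
    return {name: counts.get(c, 0) / n for c, name in PHASES.items()}

def fraction_error(pred_mask, true_mask):
    """Absolute error between predicted and reference phase fractions."""
    p, t = phase_fractions(pred_mask), phase_fractions(true_mask)
    return {name: abs(p[name] - t[name]) for name in p}
```

Because the fraction is a ratio of pixel counts, a small number of systematically misclassified boundary pixels translates directly into the Error Margin in Phase Fraction reported in the tables.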
3.4. Inference on Different Steel and Different Magnification—Scalability 2
Scalability experiment 2 involved inferencing on A-type, D3-type, and H2-type steel images at ×5000 magnification. Table 4 and Figure 8 detail these results. The model exhibited competent performance across these different steel types and magnifications, with accuracies of 77.78%, 68.32%, and 73.92% for A-type, D3-type, and H2-type, respectively. The Dice Scores further corroborate the model’s ability to generalize well to steel types not encountered during training.
In particular, the variation in Accuracy Per Class and Phase Fraction across different steel types is noteworthy. These metrics illuminate the model’s adaptability to different microstructural compositions and complexities. The Error Margin in Phase Fraction, though present, remains within acceptable limits, reaffirming the model’s robustness in diverse scenarios.
3.5. Comparison on Other Datasets
To further validate the effectiveness of our proposed method and benchmark its results, we compared its performance with established models, namely PixelNet [44], UNet, and UNet++, on two datasets: UHCS and MetalDAM. Table 5 presents the mean pixel accuracy (MPA) of each segmentation model on the two datasets. Our proposed method outperforms the other models on both, achieving 95.79% on UHCS and 94.21% on MetalDAM.
5. Conclusions
This study has presented a comprehensive approach to the segmentation of microstructures in steel images for phase fraction calculation, which is critical for understanding material properties. We have demonstrated the potential of adapting the UNet architecture, a model originally designed for medical image segmentation, to the specific challenges of steel image segmentation. Our approach has shown that model performance can be significantly improved by carefully selecting and tuning the model parameters rather than using off-the-shelf configurations. We followed a systematic grid-search approach to find optimal hyper-parameter values, testing a range of values for each parameter and observing its impact on model performance. We have also highlighted the importance of data augmentation in addressing the challenges posed by the intricate and complex nature of steel microstructures. By applying appropriate augmentations, we enhanced the model’s ability to capture both the texture and structure of the different phases in the microstructures.
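The grid search described above amounts to an exhaustive sweep over the hyper-parameter grid. In this illustrative sketch, `evaluate` would train the model with each configuration and return a validation score, and the parameter names shown are examples rather than our actual search space:

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Evaluate every combination of hyper-parameter values and return
    the best-scoring configuration along with its score."""
    names = sorted(param_grid)
    best_score, best_cfg = float("-inf"), None
    for values in product(*(param_grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)  # e.g. validation Dice after a short training run
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```

The cost grows multiplicatively with the number of values per parameter, which is why each candidate is usually scored with a shortened training run before the winning configuration is trained to convergence.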
Our work has further underscored the importance of precise labeling in segmentation tasks. We have used complex-phase EBSD images for segmentation, a method that, to our knowledge, has not previously been used in this context for alloy steels. Our work also underscores the importance of configuring UNet parameters such as the activation function, loss function, input size, and kernel size, together with other techniques that adapt the model to the problem at hand. Our use of a combined loss function tailored to the problem addressed the difficulty of capturing complex microstructures in images.
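A common form for such a combined loss (shown here purely as an illustrative sketch, not necessarily the exact combination we used) is a weighted sum of pixel-wise cross-entropy and a soft Dice term, so that the loss rewards both per-pixel correctness and region overlap:

```python
import math

def combined_loss(probs, truth, alpha=0.5, eps=1e-6):
    """Weighted sum of binary cross-entropy and soft Dice loss.
    `probs` are predicted foreground probabilities in [0, 1], `truth`
    the 0/1 reference labels; `alpha` balances the two terms."""
    n = len(truth)
    # pixel-wise binary cross-entropy, eps guards against log(0)
    ce = -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
              for p, t in zip(probs, truth)) / n
    # soft Dice loss: 1 - 2|P.T| / (|P| + |T|)
    inter = sum(p * t for p, t in zip(probs, truth))
    dice = 1 - (2 * inter + eps) / (sum(probs) + sum(truth) + eps)
    return alpha * ce + (1 - alpha) * dice
```

The Dice term counteracts class imbalance (a minority phase such as bainite contributes little to plain cross-entropy), while the cross-entropy term keeps gradients well behaved early in training.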
Finally, we are the first to conduct such scalability experiments, testing the model at different magnifications and on different steel-type images. The results show that our approach is robust: the model performs well even on unseen data, demonstrating that it has learned the structural and textural information effectively.