1. Introduction
In the steel industry, challenges include producing alloys with tailored properties and minimizing waste. For a given chemical composition, the mechanical properties are determined by the micro-/nanostructure of the steel, i.e., by the presence of different steel phases and by their size and shape. This implies the need to analyze the structure and, possibly, to correlate the preparation process, the structural data and the properties. The correlation can be studied using machine learning (ML) models, sometimes also referred to in the broader context by the term “artificial intelligence” (AI); see, e.g., the recent review paper [
1]. The aforementioned challenges and the (potential) use of ML put restrictions on the imaging techniques and push for high-quality and detailed information about the micro-/nanostructure.
The traditional and well-established tool for acquiring microstructure data is the light optical microscope (LOM) [
2,
3,
4,
5].
The physical limitations, in particular the resolution, of this imaging method can be partially circumvented by using state-of-the-art machines or advanced “super-resolution microscopy” techniques; see, e.g., Ref. [
6] for a review of some of them. Unfortunately, these techniques—utilized very often in biology—may not always be applicable to a given objective. Quite often, LOM is used together with other methods (see, e.g., Refs. [
7,
8,
9,
10]) to acquire more complete information about the system studied. Sometimes, its limited capabilities complicate the task at hand [
11,
12]. Of course, one can improve the phase contrast by using special etching and carefully adapting other steps in the sample preparation before the actual imaging of the sample in question.
However, can something else be done to “push towards” higher-quality images when using LOM as the single input? We believe so, and we introduce a new application primarily aimed at the precise characterization of advanced multiphase steels. We employ a multimodal approach, well established in biology, by imaging the same field of view with different techniques. The core of the multimodal approach is to use several probes—light and electrons—and/or detection techniques to acquire more complete information about the investigated system and then to train an ML model on these data to transform a LOM image. Here, we use this approach to successfully push the limits of LOM in the case of steel surfaces.
We are well aware that it is fundamentally impossible to create high-resolution images containing more information than was stored in the original data without “hallucinating” the details. However, some portion of the higher-resolution information may only be hidden from human perception by image blurriness, deformations and noise. We propose that the rest of the missing information can be completed from knowledge of the general properties of the investigated materials and their high-resolution images. We validate this hypothesis in experiments transforming LOM images into SEM-like images using deep learning techniques trained on an extensive image dataset of steels. The training data consist of LOM–SEM pairs of images of the same field of view. It is implicitly assumed that the ML model is able to generalize the necessary general properties; this is a natural consequence of the fact that ML finds statistical patterns that generalize beyond the training dataset. Thus, the abovementioned software-based transformation of the LOM images represents more than a simple “style transfer”.
We tested different models, training each of them to distill the important information from low-resolution images and to combine it with general knowledge of the investigated materials—generalized by the model during its training—in order to generate high-resolution images. We started with the so-called U-Net neural network architecture [
13] but eventually switched to a model based on generative adversarial networks (GANs) [
14]. This allowed us to achieve high precision and consistency in the generation process. Please see
Section 2.5 for more details. Extra care was taken to prevent the model(s) from creating any details not present in the original low-quality data. This makes the resulting pseudo-SEM images suitable for further processing such as segmentation or phase classification. To the best of our knowledge, there are no publicly available GAN-based models for converting LOM images to SEM images. Thus, we expect the present findings to be of high potential value to both experimentalists and the steel industry.
3. Results and Discussion
Quantitative evaluation of the level of image improvement is quite complicated. We used a standard evaluation metric called root mean squared error (RMSE), originally designed for the evaluation of regression models. Specifically, we measured the square root of the mean of the squared differences between the pixels of the SEM images and those of the predicted images.
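The RMSE described above can be computed directly from the pixel arrays; a minimal sketch (the function name and array handling are our own illustration):

```python
import numpy as np

def rmse(prediction: np.ndarray, target: np.ndarray) -> float:
    """Root mean squared error between two equally shaped images."""
    diff = prediction.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```

The cast to `float64` avoids overflow and wrap-around when the inputs are 8-bit integer images.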
Using 8-bit depth optimizes the size of the batch that fits in the GPU RAM, and we are convinced that 16-bit depth is unnecessarily large.
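The memory argument is easy to quantify; a back-of-the-envelope sketch, in which the tile and batch sizes are illustrative assumptions rather than the actual training configuration:

```python
# Approximate GPU-RAM footprint of one input batch of grayscale tiles;
# tile and batch sizes are illustrative assumptions.
tile_px = 1024 * 1024                  # pixels per tile
batch_size = 16                        # assumed number of tiles per batch
mem_8bit = tile_px * batch_size * 1    # bytes per batch at 8-bit depth
mem_16bit = tile_px * batch_size * 2   # bytes per batch at 16-bit depth
print(mem_8bit // 2**20, "MiB vs", mem_16bit // 2**20, "MiB")  # 16 MiB vs 32 MiB
```

Doubling the bit depth doubles the input footprint (and, similarly, the activation memory), halving the feasible batch size.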
In order to demonstrate the benefits of the GAN architecture, we compared a vanilla U-Net model with the GAN model for the CBS dataset. The vanilla U-Net model has exactly the same architecture as the standalone generator from the GAN model. We obtained RMSE = 0.2109 for the vanilla U-Net and RMSE = 0.2059 for the GAN model after 10,000 epochs of training. Both RMSE values correspond to the standardized data (see Equation (
2)).
Preliminary visual examination of the CBS predictions, with only a selection displayed in the images below, shows that the GAN model significantly outperforms the vanilla U-Net model on the CBS data. A similar procedure was repeated for the ETD data, but only in the case of the GAN model.
Pearlite is composed of alternating layers of ferrite and cementite that form a lamellar structure. The lamellar structure is very fine and invisible (or hardly visible) in our light optical microscope; see
Figure 6. The lamellae in the LOM micrographs blur into one another, and the result is only a dark, smeared area. Reliable identification of the pearlite phase in the LOM micrographs is impossible. The U-Net prediction slightly improves the visibility of the pearlite's internal structure, but the lamellae are still invisible. The GAN predictions are markedly more realistic, and they enable us to identify the pearlite phase. Let us note that the pearlite structure is visible in the LOM images reported in Ref. [
11]. However, the details of the data acquisition are not explicitly mentioned there, and we conclude that a coarser pearlite structure can be imaged using LOM.
Obviously, the LOM micrographs are hardly suitable for visualizing the complex microstructure of TRIP steel, which consists of a ferrite–bainite matrix and secondary phases arising from the matrix (as a consequence of selective etching), such as martensite, retained austenite and martensite–austenite constituents. The secondary phases are very fine and hence partly blurred in the LOM. The U-Net and GAN predictions are able to depict the secondary phases and better define the matrix properties. The GAN pictures present the structure more realistically than the U-Net ones, e.g., see the region marked by the second arrow from the top in
Figure 7.
The insufficiency of the simple U-Net model is clearly seen in
Figure 8 and
Figure 9, where the U-Net manages to approximate the boundaries among the phases, but the visual contrast among the phases is significantly suppressed. The GAN model, on the other hand, retains the visual distinction of the secondary phases and, at the same time, represents the phase boundaries better, which can be observed in the finer features.
The displayed region of the USIBOR sample, see
Figure 10, consists mostly of the martensitic phase. We see that this pure martensite is the most difficult to describe for the models presented here.
In order to demonstrate a real-life application of LOM micrograph enhancement, the original LOM image was transformed into a CBS-like image using a GAN. The input RGB LOM image was automatically converted to an 8-bit grayscale format and then upscaled from its original size to a resolution approximately matching that of an SEM image, using bilinear interpolation. The upscaled image was then divided into 1024 × 1024 px tiles, which were suitable as input for model prediction. After the prediction of all tiles, they were stitched together using the
OpenCV library to match the field of view of the original LOM image but with the pixel resolution of an SEM image. The original RGB LOM image and the CBS-like prediction can be compared in
Figure 11, which illustrates the possible output of our model. We note that this LOM field of view has no corresponding SEM data measured.
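The tile-and-stitch step described above can be sketched as follows. The 1024 px tile size is taken from the text; the padding strategy and function names are our own illustration (the paper stitches with the OpenCV library, whereas plain NumPy slicing is used here), and the grayscale conversion and bilinear upscaling are assumed to have been applied beforehand:

```python
import numpy as np

TILE = 1024  # tile edge length used for model prediction, from the text

def predict_full_image(image: np.ndarray, predict_tile) -> np.ndarray:
    """Split a grayscale image into TILE x TILE patches, run the model on
    each patch and stitch the predictions back together."""
    h, w = image.shape
    # pad so that both dimensions are multiples of TILE
    ph, pw = (-h) % TILE, (-w) % TILE
    padded = np.pad(image, ((0, ph), (0, pw)), mode="reflect")
    out = np.zeros_like(padded, dtype=np.float32)
    for y in range(0, padded.shape[0], TILE):
        for x in range(0, padded.shape[1], TILE):
            out[y:y + TILE, x:x + TILE] = predict_tile(padded[y:y + TILE, x:x + TILE])
    return out[:h, :w]  # crop back to the original field of view
```

Reflect padding avoids introducing artificial dark borders at the image edges; overlapping tiles with blended seams would be a further refinement not shown here.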
The preparation of the dataset for training of the ML model revealed that, e.g., a high value of MS-SSIM does not necessarily guarantee well-aligned images. We decided not to base the discussion of the transformed LOM images solely on quantitative results of metrics measuring their “distance” from the target SEM (either CBS or ETD) images. We describe the transformation in terms of a steel microstructure analysis by the naked eye of an expert, one of the coauthors. Nevertheless, the metrics are still a useful tool in the postprocessing of the images, e.g., they clearly indicate that the GAN model performs better than the U-Net one.
Let us comment on the metrics and their use in more detail. As already described in
Section 2.6, the data from the pictures were standardized to a fixed interval before training using the simple linear transformation in Equation (
2). Consider a metric proportional to a power of the absolute value of the difference of pixel values, i.e., of the following type:

$M_p(X, Y) = C \sum_{i} |X_i - Y_i|^p,$ (3)

where the power is understood element-wise and
C is a constant of proportionality which may be related to the pixel count. Such a general case covers both the (R)MSE ($p = 2$) and MAE ($p = 1$) metrics.
Consider a linear transformation $x \mapsto a x + b$, specified by means of two scalar coefficients
a and
b, of the independent variables (such as the standardization mapping and its inverse in Equation (
2)). Then it follows that

$M_p(aX + b,\; aY + b) = C \sum_{i} |a|^p \, |X_i - Y_i|^p = |a|^p \, M_p(X, Y).$
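The stated invariance (the same linear map applied to both images rescales such a metric by an overall factor of $|a|^p$) is easy to verify numerically; the coefficients and image sizes below are arbitrary:

```python
import numpy as np

def m_p(x: np.ndarray, y: np.ndarray, p: float) -> float:
    """Metric of the family in Equation (3) with C = 1/N (mean of |x - y|**p)."""
    return float(np.mean(np.abs(x - y) ** p))

rng = np.random.default_rng(0)
x, y = rng.random((64, 64)), rng.random((64, 64))
a, b = 2.5, -0.7  # arbitrary linear transformation x -> a*x + b
for p in (1.0, 2.0):  # MAE and MSE, respectively
    # transforming both images rescales the metric by |a|**p
    assert np.isclose(m_p(a * x + b, a * y + b, p), abs(a) ** p * m_p(x, y, p))
```

Since $|a|^p$ is a positive constant, the ordering of models by such a metric is the same for as-is and standardized data.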
The above considerations show that the same linear transformation applied to both images affects some of the metrics only by an overall multiplicative factor. This means that it does not alter the performance order of the different models when evaluated by metrics of the type in Equation (
3). The above statement is not valid for several of the other metrics tested. Some of them are not implemented for noninteger input data (namely, MS-SSIM), and some produce a different ordering of the models for as-is and standardized values.
Let us consider the images displayed in
Figure 6 through
Figure 10 as a small sample of the test dataset, with each image containing some 262,000 pixels. We use several reasonable metrics as implemented in the Python libraries scikit-image [
29] and
sewar [
30]. We calculate the values of the metrics for all the SEM-based predictions and for the as-is LOM image. The results are presented in
Appendix A in the case of images displayed in this paper, with
Figure 9 excluded since the TRIP2 steel is already represented.
Using these data only, we find that the (R)MSE and universal quality index (UQI) metrics prefer the U-Net model except on the TRIP2 steel (
512-10_T5-15-TRIP2). See
Table A1 and
Table A2 for the detailed values. The differences in RMSE values are less than 10%. Quite surprisingly, the MAE has the lowest values in the case of as-is LOM images, which we disregard, followed by the GAN-CBS model. We attribute the better performance of GAN-CBS models over U-Net-CBS to the fact that the MAE was used in the generator training; see Equation (
1).
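The role of the MAE term in the generator objective can be illustrated with a pix2pix-style loss; note that the weighting factor, the function name and the exact form of the adversarial term are our own assumptions for illustration, not necessarily the form of Equation (1):

```python
import numpy as np

LAMBDA = 100.0  # weight of the MAE term; illustrative value, not from the paper

def generator_loss(disc_fake_prob: np.ndarray, fake: np.ndarray,
                   target: np.ndarray, lam: float = LAMBDA) -> float:
    """Pix2pix-style generator objective: an adversarial term plus a
    lambda-weighted MAE between the prediction and the SEM target.
    disc_fake_prob: discriminator output in (0, 1) for the generated image."""
    adversarial = -np.mean(np.log(disc_fake_prob + 1e-12))  # fool the discriminator
    mae = np.mean(np.abs(fake - target))
    return float(adversarial + lam * mae)
```

With the MAE entering the generator's own objective, a model minimizing it is naturally favored by MAE-based evaluation, consistent with the observation above.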
Thus far, we have discussed the metrics of the type described in Equation (
3) except for the UQI. Let us now consider metrics that take into account features beyond the differences in individual pixels. Such metrics include, as their names suggest, SSIM and MS-SSIM; these two prefer the GAN-CBS model except in the case of the TRIP1 steel, and the differences in the SSIM values do not exceed 6%.
Thus, the differences between the U-Net-CBS and GAN-CBS models as measured by the above-discussed metrics do not seem to be very large. The two SSIM-based metrics that take into account “the larger picture”, and not only the differences in individual pixels, prefer the GAN-based model. In other words, these two metrics indicate that the predictions of the GAN-based models are somewhat better.
The above-described conclusion based on the metric values was visually corroborated when examining the images in
Figure 6 through
Figure 10 in detail. This is illustrated in the three figures discussed next. It means that the GAN performs better in the style-transfer part of the processing.
A zoom of a pearlite-heavy region is displayed in
Figure 12. It shows that the boundaries between the pearlite and the ferritic matrix are sharper in the case of both the U-Net and GAN models than in the original LOM image. Both the LOM image and the U-Net prediction miss information about the pearlite's inner structure, i.e., the cementite laths are invisible. In the GAN prediction, on the other hand, an indication of this inner structure is present, although its orientation is mostly incorrect. Nevertheless, this indication is enough to provide a hint that pearlite is observed. This suggests that, apart from more training data for pearlite, a separate model may be needed.
Figure 13 shows that the ETD is described by the GAN model accurately.
Furthermore,
Figure 14 clearly shows the superiority of the GAN model over U-Net; the former presents improvement in the contrast and visibility of the secondary phases. The secondary phases become easier to separate from the matrix, and their shape is more precise and closer to SEM micrographs.
The USIBOR material is clearly the hardest to describe; see
Figure 10. This is due to two factors. First, this mostly martensitic steel has the richest microstructure. Second, the dataset is not fully balanced, as indicated in
Table 3; the USIBOR represents the smallest part of the training dataset. We decided not to artificially increase the augmentation of this particular material in order to avoid performance degradation on the other, more frequently occurring materials.
Now, let us close this section by discussing the robustness of the presented ML model. We intentionally used the “simplest” etching that is widely available. Furthermore, the range of settings used in imaging with LOM and SEM was rather narrow, though the use of the autofocus tool in the case of the LOM images ensured a certain degree of variability in the imaging conditions. On the one hand, this means that the model is not very robust against such changes. On the other hand, it implies that employing the typical standard sample preparation procedure—widely available at low cost—should provide the best results. We believe that this represents a fair trade-off between the demands on laboratory costs and operator skills (including, but not limited to, knowledge of which etching to use on which material to achieve the best contrast among the phases) and the applicability of the model.
4. Conclusions and Outlook
We presented a software-based transformation of LOM images trained on pairs of LOM and corresponding high-resolution SEM images acquired after a standard sample preparation technique (polishing and chemical etching with Nital). The resulting output of the neural network exceeds a simple “style transfer” by making some features—previously obscured in the as-acquired LOM images—more pronounced in the predicted output, i.e., it can be regarded as a super-resolution transformation (pixel upscaling of the original LOM images). The quality of the style transfer was measured with three relevant metrics (MAE, NRMSE and SSIM), comparing the predictions to the corresponding testing SEM data, which showed that the vanilla U-Net performs worse. Furthermore, the data were analyzed by the naked eye of experts, and the findings clearly indicate improvements such as deblurring and denoising of the phase boundaries.
Thus, we are confident that the reported GAN-based transformation can improve any subsequent processing of the resulting transformed images provided the sample preparation procedure and imaging settings are reasonably close to those described in this paper. This, of course, includes semantic segmentation. As a result, we expect improvements in techniques such as machine learning-based prediction of material properties utilizing datasets combining knowledge of both the microstructure (analysis of surface micrographs) and mechanical properties of the samples. Because we kept the steel processing to a common standard, most notably etching with Nital, we believe the presented model could be successfully applied to LOM data measured by a wide range of metallographic laboratories.
A possible continuation of this work includes exploring different sample preparation techniques, improving the transformation model itself or training the model on a larger dataset once more data are acquired. A natural extension of the work presented here is to proceed with the semantic segmentation—our original motivation—and to compare the results obtained from as-acquired LOM images with those obtained from the predicted transformed images.