4.1. Classification Results
Table 8 summarizes the classification results for all feature sets used. Together with the confusion matrices shown earlier, this clearly shows that features calculated from the image texture can be used to successfully distinguish complex bainitic microstructures. The best result, a classification accuracy of 91.80%, was obtained by combining Haralick parameters and a multi-scale LBP, followed by a feature selection. Picture IDs make it possible to trace back from the classification result via the textural parameters to the original microstructure images, which allows assessing which microstructure images were correctly or wrongly classified.
Figure 7 shows examples of correctly classified images for all six microstructure classes.
The main application of the suggested classification approach based on reference samples would be to train a model that could then be used for pre-classification or labeling in other classification tasks. For the task of classifying complex steel microstructures, in particular bainite, the assignment of the ground truth can be quite subjective because bainitic phase constituents are often small or inhomogeneous, and there is often no consensus among human experts in labeling and classifying bainitic structures. This makes it difficult to extract ground truth parameters from these regions in an objective and statistically sound way. By incorporating these reference microstructures, the ground-truth assignment becomes much more objective.
Considering the number of analyzed images and the variety of structures even within one microstructure class, which causes a broad distribution of values, it is difficult to draw definite conclusions about how the microstructures correlate with the calculated textural parameters and the classification result. In fact, this is one reason why machine learning methods are needed to classify these microstructures. Still, some indications can be found when comparing textural parameters and microstructures.
Figure 8 shows four Haralick parameters, i.e., the mean values of contrast, correlation and energy as well as the amplitude of correlation. Black dots represent the values of the individual images, red dots indicate the mean value of all images. Contrast is a measure of the local variations in an image [21]. Correlation is a measure of how correlated a pixel is to its neighbor over the whole image, i.e., the joint probability of occurrence of the specified pixel pairs [29]. Energy measures the uniformity of the gray level distribution in an image [30]. A few entries in the GLCM with high probability lead to a high energy value. Homogeneity measures the closeness of the distribution of elements in the GLCM to the GLCM diagonal [29], i.e., only entries close to the diagonal have a high impact on the value of the homogeneity.
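To make these definitions concrete, the following minimal sketch computes a gray-level co-occurrence matrix (GLCM) for one pixel offset and derives the four parameters from it. It is an illustration only: the function names `glcm` and `haralick` are hypothetical, the GLCM is assumed to be symmetric and normalized, energy is taken as the sum of squared GLCM entries, and homogeneity uses one common definition with a squared gray-level difference. The implementation used in this work may differ in these details.

```python
def glcm(image, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count every pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, j, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for i, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    denom = (var_i * var_j) ** 0.5
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # A constant image has zero variance; 1.0 is used by convention here.
        "correlation": cov / denom if denom > 0 else 1.0,
        "energy": sum(v * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Two toy images with two gray levels: a uniform one and a checkerboard.
flat = [[1] * 4 for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
flat_features = haralick(glcm(flat, 1, 0, levels=2))
checker_features = haralick(glcm(checker, 1, 0, levels=2))
```

The uniform image concentrates the whole GLCM in a single entry, giving zero contrast and maximal energy, whereas the checkerboard spreads it over two off-diagonal entries, which raises contrast and lowers energy and homogeneity.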
Looking at the mean value of contrast, the values for pearlite and martensite are significantly higher than for the bainitic structures, so that even when considering the scatter of the data, the bainitic classes could almost be separated from pearlite and martensite by setting a threshold. Low contrast values mean fewer local variations in the image. In the bainitic structures, the dark bainitic ferrite always acts as a “background”. This dark background, which has only few local variations, makes up a considerable part of the image, which explains the lower overall contrast values for the bainitic structures. The higher correlation values for bainitic structures can be explained in a similar way: because of the “background”, more “dark pixel pairs” occur, leading to higher correlation values. Pearlite tends to have higher contrast and correlation values than martensite. This could be explained by the ferrite-cementite transitions that occur in the pearlitic microstructure because of the topography contrast caused by using an Everhart-Thornley detector in the SEM. Regarding the bainitic structures, few tendencies can be seen because the scatter can overshadow the trend. However, for the amplitude of correlation, granular and lower bainite tend to have lower values than upper and degenerated upper bainite. In general, amplitude values are higher for structures with certain preferential directions and lower for randomly distributed structures. While upper and degenerated upper bainite have a pronounced lath structure of the carbon-rich second phase, this second phase is more randomly distributed in granular and lower bainite, causing lower amplitude values for correlation. Lower bainite still has a preferential direction (60° arrangement of intra-lath precipitates); however, it is less pronounced than in upper and degenerated upper bainite.
Taking this into account, pearlite should show higher amplitude values than martensite. However, their values partly overlap. The reason is that not all pearlite images show straight, continuous cementite laths; some are already slightly degenerated, with a less pronounced preferential orientation.
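The link between preferential direction and amplitude can be illustrated with a small sketch. It assumes that the mean and the amplitude of a Haralick parameter are taken over four directions (0°, 45°, 90° and 135°) at a pixel distance of 1, and that the amplitude is the difference between the largest and smallest directional value; both are assumptions made for illustration, and the helper names are hypothetical. The correlation is computed directly from the pixel pairs, which is equivalent to the correlation of a symmetric GLCM for the same offset.

```python
import random


def pair_correlation(image, dx, dy):
    """Correlation of the gray values of pixel pairs separated by (dx, dy)."""
    rows, cols = len(image), len(image[0])
    pairs = []
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                pairs += [(a, b), (b, a)]  # symmetric counting
    mean = sum(a for a, _ in pairs) / len(pairs)
    var = sum((a - mean) ** 2 for a, _ in pairs) / len(pairs)
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return cov / var if var > 0 else 1.0


# Offsets (dx, dy) for 0, 45, 90 and 135 degrees at distance 1 (assumed).
DIRECTIONS = [(1, 0), (1, -1), (0, 1), (-1, -1)]


def mean_and_amplitude(image):
    values = [pair_correlation(image, dx, dy) for dx, dy in DIRECTIONS]
    return sum(values) / len(values), max(values) - min(values)


# Horizontal bands of width 2 mimic a structure with one preferential direction.
bands = [[(r // 2) % 2 for _ in range(8)] for r in range(8)]
# Seeded binary noise mimics a structure without preferential direction.
rng = random.Random(0)
noise = [[rng.randint(0, 1) for _ in range(32)] for _ in range(32)]

bands_mean, bands_amplitude = mean_and_amplitude(bands)
noise_mean, noise_amplitude = mean_and_amplitude(noise)
```

For the banded image the correlation is 1 along the bands and much lower across them, so the amplitude is large, while the noise image yields similar, near-zero correlations in all directions and therefore a small amplitude.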
Local binary patterns (LBP) are good at capturing small and fine details of images [31], e.g., edges, corners, spots, etc. One disadvantage is that they can struggle to distinguish textures that have the same small structures but differ in their large structures. The LBPs used in this work are rotation-invariant uniform LBPs. By using uniform LBPs, the length of the histogram, i.e., the feature vector, can be reduced and the performance of classifiers using these LBP features can be improved [22,32].
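As an illustration of how a rotation-invariant uniform LBP shortens the histogram, the sketch below codes each pixel for N = 8 neighbors. It is simplified and not the implementation used in this work: the eight neighbors are taken from the direct 3 × 3 neighborhood instead of being interpolated on a circle of radius R, the image is assumed to be at least 3 × 3 pixels, and the function names are hypothetical. A pattern with at most two 0/1 transitions around the circle is "uniform" and is coded by its number of 1-bits (0 to N); all other patterns share the single code N + 1, so the histogram has only N + 2 bins.

```python
# Eight neighbors in circular order as (row offset, column offset).
NEIGHBORS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_riu2_code(image, r, c):
    """Rotation-invariant uniform LBP code of the pixel at (r, c)."""
    center = image[r][c]
    bits = [1 if image[r + dr][c + dc] >= center else 0 for dr, dc in NEIGHBORS]
    # Number of 0/1 changes around the circle (bits[-1] closes the circle).
    transitions = sum(bits[k] != bits[k - 1] for k in range(len(bits)))
    return sum(bits) if transitions <= 2 else len(bits) + 1


def lbp_histogram(image):
    """Normalized histogram with N + 2 bins over all interior pixels."""
    hist = [0] * (len(NEIGHBORS) + 2)
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_riu2_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist]


bright_spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # all neighbors darker
edge = [[0, 5, 9], [0, 5, 9], [0, 5, 9]]          # vertical edge
alternating = [[9, 0, 9], [0, 5, 0], [9, 0, 9]]   # non-uniform pattern

spot_code = lbp_riu2_code(bright_spot, 1, 1)
edge_code = lbp_riu2_code(edge, 1, 1)
alternating_code = lbp_riu2_code(alternating, 1, 1)
```

With this convention, a pixel brighter than all its neighbors falls into bin 0, edges and corners fall into the intermediate bins, and all non-uniform patterns are collected in the last bin.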
In an attempt to correlate the LBP features, the microstructures and the classification results, Figure 9 shows the LBP 1/8 for the six microstructure images from Figure 6, which are representative of the six classes and show clear differences. Bainitic structures show higher values for bin 0, which represents bright spots [22], than pearlite and martensite. This correctly captures the arrangement of the bright carbon-rich second phase in a dark bainitic ferrite background in the bainitic structures, compared to the more uniform gray level distribution of pearlite and martensite. Bins 1–7, which correspond to different edges or corners of varying positive and negative curvature [22], partly show differences, which is plausible, as the images also exhibit different kinds of edges. However, several bins show only marginal differences, especially for the bainitic structures, which already indicates that this LBP may not provide enough discrimination for the classification of bainitic structures.
Figure 10 shows the averaged histograms of the local binary pattern with R = 1 and N = 8 of all images for all six microstructure classes, which reached an accuracy of just 74.20%. For easier visualization, the histograms are averaged by calculating the mean value of each bin over all analyzed images of a class. This gives some indication of how the LBPs, the microstructures and the classification results correlate. Pearlite and martensite show quite different histograms, while the histograms of the four bainitic structures are similar. For better illustration and a closer look at the differences, some histogram bins (bins 1, 5, 6, and 9) are shown separately in Figure 11a. Error bars for the standard deviation are added to indicate the scatter across all images. Pearlite and martensite can be distinguished quite well, even when considering the scatter. In contrast, the values for the bainitic classes are comparable most of the time, which suggests that differentiating bainitic structures with these features will be difficult. Looking at the confusion matrix for this LBP in Table 4, there are strong variations between the recalls and precisions of the individual classes. Indeed, F1 scores are high for pearlite and martensite and lower for the bainitic structures, which fits the differences indicated by the histograms. Also for the R = 2.4 and N = 8 as well as the R = 4.2 and N = 16 LBP, the results for pearlite and martensite are good and clearly better than the results for the bainitic structures. In the representative SEM images for all classes, pearlite and martensite are the “densest” structures compared to the bainitic structures, which have more “background” (the dark bainitic ferrite). The representative area needed to capture the relevant features of a microstructure is therefore smaller for pearlite and martensite and larger for the bainitic structures. This is why these three single-scale LBPs achieve better results for pearlite and martensite.
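For reference, per-class precision, recall and F1 score, as well as the overall accuracy, follow directly from a confusion matrix. The sketch below assumes that rows hold the true classes and columns the predicted classes; the three-class matrix is made up for illustration and does not reproduce Table 4, and the function names are hypothetical.

```python
def per_class_scores(confusion):
    """Return {class index: (precision, recall, F1)} for a square matrix."""
    n = len(confusion)
    scores = {}
    for k in range(n):
        tp = confusion[k][k]
        predicted = sum(confusion[i][k] for i in range(n))  # column sum
        actual = sum(confusion[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        both = precision + recall
        f1 = 2 * precision * recall / both if both else 0.0
        scores[k] = (precision, recall, f1)
    return scores


def accuracy(confusion):
    """Fraction of all images on the diagonal of the confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(map(sum, confusion))


# Made-up example: class 0 is well separated, classes 1 and 2 are confused.
example = [
    [9, 1, 0],
    [0, 6, 4],
    [1, 3, 6],
]
scores = per_class_scores(example)
overall = accuracy(example)
```

In this made-up example the overall accuracy of 0.7 hides the fact that class 0 reaches an F1 score of 0.9 while the two mutually confused classes reach only 0.6, which is the kind of per-class variation discussed above.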
The R = 4.2 and N = 16 LBP gave the best result among the single-scale LBPs with 83.60% accuracy. Figure 12 shows the averaged histograms for all six microstructure classes. Again, for easier visualization, the histograms are averaged by calculating the mean value of each bin over all analyzed images of a class. In addition, some histogram bins (bins 1, 7, 9, and 15) are shown separately in Figure 11b for a closer look at the differences. Although there is always overlap because of the scatter, a tendency toward clearer differences between the bainitic classes compared to the LBP 1/8 can be recognized, which explains the better classification result with this LBP.
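The averaging of histograms and the error bars described above can be sketched as a bin-wise mean and standard deviation over all images of one class. Whether the population or the sample standard deviation was used is not stated; the sketch assumes the population form, and the function name and the two toy histograms are hypothetical.

```python
from statistics import mean, pstdev


def average_histogram(histograms):
    """Bin-wise mean and population standard deviation over several images."""
    bins = list(zip(*histograms))  # regroup values by bin index
    return [mean(b) for b in bins], [pstdev(b) for b in bins]


# Two made-up two-bin histograms standing in for the images of one class.
class_histograms = [
    [0.2, 0.8],
    [0.4, 0.6],
]
bin_means, bin_stds = average_histogram(class_histograms)
```

Comparing the bin means of two classes against their standard deviations gives a first impression of whether a bin can separate them, which is how the selected bins are read here.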
Looking at all four single-scale LBPs, the overall accuracies are mediocre to good (maximum accuracy of 83.60% for LBP 4.2/16), and there are usually strong variations between the recalls and precisions of the individual classes. This clearly shows that by considering only a single scale for the LBP, not all relevant features of the six different microstructures can be captured. By combining the four scales from the single-scale LBPs into one multi-scale LBP, the overall accuracy improves to a very good 88.70%. Accuracies for the separate classes, assessed by the F1 score, are high for all classes.
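One straightforward way to build a multi-scale LBP feature vector is to concatenate the histograms of the individual scales. The sketch below assumes this construction and rotation-invariant uniform histograms with N + 2 bins per scale; only two of the scales are shown, the histogram values are placeholders, and the function name is hypothetical.

```python
def multiscale_features(histograms_by_scale):
    """Concatenate per-scale LBP histograms, ordered by (radius, points)."""
    features = []
    for (radius, points), hist in sorted(histograms_by_scale.items()):
        if len(hist) != points + 2:
            raise ValueError(f"scale {radius}/{points}: expected {points + 2} bins")
        features.extend(hist)
    return features


# Placeholder histograms for the scales 1/8 and 4.2/16.
example = {
    (1, 8): [0.1] * 10,
    (4.2, 16): [1 / 18] * 18,
}
vector = multiscale_features(example)
```

The resulting vector keeps the fine-scale and the coarse-scale description side by side, so a classifier can weight whichever scale separates a given pair of classes best.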
As LBP features capture small and fine details of an image and Haralick parameters capture image features on a somewhat larger scale, it seems reasonable to combine Haralick and LBP features. Furthermore, the Haralick classification gives a higher accuracy for granular bainite than the multi-scale LBP classification but a lower accuracy for the other microstructure classes, so the two could complement one another. Indeed, combining those features improves the overall classification accuracy to 90.60%. Because the combination of these parameters gives a large set of features (64), correlated features are removed for better generalization. Only 31 features were kept, and with these the classification result could again be slightly improved to 91.80%.
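The exact selection procedure is not detailed here. A simple and common way to remove correlated features is a greedy filter on the pairwise Pearson correlation, sketched below under that assumption; the threshold of 0.95, the function names and the toy data are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def drop_correlated(columns, threshold=0.95):
    """Indices of feature columns to keep; earlier columns take priority."""
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(idx)
    return kept


# Three toy features measured on four images; the second duplicates the first.
features = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 1, 3, 2],
]
kept_indices = drop_correlated(features)
```

Here the second feature is a scaled copy of the first and is dropped, while the third is only weakly correlated with the first (r = -0.4) and is kept.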
Figure 13 gives some examples of misclassified microstructure images. With the Picture IDs, the textural parameters and the original microstructure images can be traced back from the classification results. This makes it possible to check which feature in the microstructure image could have caused a misclassification. Figure 13a shows a martensitic microstructure that was classified as pearlite. Compared with a correctly classified martensitic image (Figure 13b), the structure of the misclassified martensite is more ordered and less chaotic and therefore looks similar to a pearlitic microstructure, as shown in Figure 13c. Figure 13d shows a lower bainite image that was misclassified as upper bainite. The reason is probably that this lower bainite also exhibits some cementite precipitation on the lath boundaries in addition to the intra-lath cementite precipitation. Figure 13e shows upper bainite that was wrongly classified as degenerated upper bainite. This could be explained by the fact that not all precipitates on the lath boundaries are straight and slender and that there are also some cementite aggregates, giving it a somewhat degenerated structure. These examples show the difficulty of bainite classification: one bainite subclass can also exhibit features that are more associated with other subclasses, which makes the ground truth assignment challenging even though reference samples were used in this work. In principle, such images could be left out of the classification, which would improve the classification result. However, the goal is to cover the full range of bainitic microstructures, and because multi-phase steels or industrial samples will also exhibit mixed structures, it was decided to keep these images in the data set.