#### *3.3. Data Augmentation*

Because of differences in sample preparation, corrosion depth, imaging equipment, illumination, and other factors, the collected sorbite metallographic images varied widely. It was not possible to cover every condition and obtain a fully representative image set. Data augmentation was therefore needed to broaden the distribution of data sources and features, increase the size and quality of the training data sets, and mitigate the problem of limited data. Data augmentation also helps when the data come from multiple distributions. Typical augmentation operations [20] include flipping, cropping, rotation, zooming, color transformation, and noise injection. In this project, considering the characteristics of sorbite, simple flipping, rotation, cropping, zooming, and color transformation would augment the data but might also corrupt the label information of the sample to some extent, which would harm model learning. For this paper, an augmentation method based on noise injection was therefore designed. During training, a bounded random perturbation was added to the initial sample image in such a way that the original pixel labels were not altered, so that the model could focus on learning the difference between the sorbite pixel values and the background pixel values rather than only the pixel values of the sorbite. In this way, the model was able to ignore irrelevant factors during training to a certain extent and make more accurate judgments. The specific perturbation expression is as follows, where **Xij** represents the pixel value in the ith row and jth column of the sorbite sample, **Xthres** is the pixel-value threshold that the perturbed value is not allowed to cross, and **R** = **rand**(−**a**, **a**):

$$\mathbf{X}_{ij} = \begin{cases} \mathbf{X}_{ij} + \mathbf{R}, & 0 < \mathbf{X}_{ij} + \mathbf{R} < \mathbf{X}_{thres} \text{ and } \mathbf{X}_{ij} < \mathbf{X}_{thres} \\ \mathbf{X}_{ij} + \mathbf{R}, & \mathbf{X}_{thres} < \mathbf{X}_{ij} + \mathbf{R} < 255 \text{ and } \mathbf{X}_{ij} > \mathbf{X}_{thres} \end{cases} \tag{6}$$

In practice, **a** was set to 10. Samples before and after noise injection are shown in Figure 6.

**Figure 6.** Comparison of the effects before and after sample enhancement. (**a**) Original image; (**b**) Image after noise injection.
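As a rough illustration of Equation (6), the sketch below perturbs a grayscale image stored as nested lists. It assumes integer pixel values in the range 0 to 255, an integer noise term R drawn uniformly from [−a, a], and that pixels meeting neither condition keep their original value (the equation does not state this case explicitly). The function name `inject_noise` and the threshold value used in the usage line are hypothetical.

```python
import random


def inject_noise(image, x_thres, a=10, seed=None):
    """Add bounded random noise to each pixel without crossing x_thres.

    image   : list of rows, each a list of integer gray values (0..255)
    x_thres : threshold separating the two pixel classes (assumed known)
    a       : noise amplitude; R is drawn from the integers in [-a, a]
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for x in row:
            y = x + rng.randint(-a, a)
            # First case of Eq. (6): pixel stays below the threshold.
            stays_below = x < x_thres and 0 < y < x_thres
            # Second case of Eq. (6): pixel stays above the threshold.
            stays_above = x > x_thres and x_thres < y < 255
            # Assumption: otherwise the pixel is left unchanged.
            new_row.append(y if (stays_below or stays_above) else x)
        noisy.append(new_row)
    return noisy


# Hypothetical usage: a 2 x 2 image and an assumed threshold of 128.
noisy = inject_noise([[12, 200], [127, 129]], x_thres=128, a=10, seed=0)
```

Because a perturbed value is accepted only when it stays on the same side of the threshold, the pixel-level label implied by the threshold is preserved. For full-size images, a vectorized array implementation would be the more practical choice.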

#### *3.4. Training and Evaluation Indexes of the Model*

In this paper, preprocessed sorbite images were used as the data sets, and each data set was randomly divided into a training set and a test set at a 9:1 ratio. A model was trained on each training set in turn and then tested separately on the test sets of all data sets. All models were implemented in Python with the PyTorch framework and trained on an NVIDIA GeForce RTX 3090 graphics card. The learning rate was 0.0001, the batch size was 4, and training ran for 100 epochs.
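The random 9:1 split can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the fixed seed, and the `CONFIG` dictionary are assumptions, and the actual data-loading code used in this work is not described in the text.

```python
import random


def split_dataset(samples, train_fraction=0.9, seed=0):
    """Randomly split samples into a training list and a test list."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


# Hyperparameters as reported in the text; "epochs" is our reading of
# "number of iteration rounds".
CONFIG = {"learning_rate": 1e-4, "batch_size": 4, "epochs": 100}

# Hypothetical usage with 100 placeholder sample identifiers.
train_set, test_set = split_dataset(range(100))
```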

In this paper, the accuracy of semantic segmentation was evaluated by pixel accuracy (PA), intersection over union (IoU), and mean absolute error (MAE). For a binary task, four outcomes are possible. TP (true positive) means that a sample with a positive true label is predicted as positive. FN (false negative) means that a sample with a positive true label is predicted as negative. FP (false positive) means that a sample with a negative true label is predicted as positive. TN (true negative) means that a sample with a negative true label is predicted as negative.

Assuming that there are a total of **n** classes in the test data set, with **pii** denoting the number of pixels of class **i** that are predicted as class **i** and **pij** denoting the number of pixels of class **i** that are predicted as class **j**, **PA** is defined as the ratio of the number of correctly classified pixels to the total number of pixels, and the formula is as follows:

$$\mathbf{PA} = \frac{\sum_{i=1}^{n} p_{ii}}{\sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij}} \tag{7}$$

That is, pixel accuracy (**PA**) is the proportion of correctly predicted pixels among all pixels. For the binary case, the calculation formula is as follows:

$$\text{PA} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FP} + \text{FN} + \text{TN}} \tag{8}$$

Mean pixel accuracy (mPA) is the average of the pixel accuracies computed separately for each class.
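Equations (7) and (8) and the mPA can be illustrated with a small confusion matrix. The sketch below assumes `p[i][j]` holds the number of pixels of true class `i` predicted as class `j`, and takes the per-class pixel accuracy to be `p[i][i]` divided by the number of pixels of class `i`, which is the usual convention but is not spelled out in the text. The function names and the example counts are hypothetical.

```python
def pixel_accuracy(p):
    """Eq. (7): correctly classified pixels over all pixels."""
    correct = sum(p[i][i] for i in range(len(p)))
    total = sum(sum(row) for row in p)
    return correct / total


def mean_pixel_accuracy(p):
    """Average of the per-class pixel accuracies (assumed definition)."""
    per_class = [p[i][i] / sum(p[i]) for i in range(len(p))]
    return sum(per_class) / len(per_class)


# Hypothetical 2-class confusion matrix (rows: true class, columns: predicted).
p = [[50, 10],
     [5, 35]]
pa = pixel_accuracy(p)        # (50 + 35) / 100
mpa = mean_pixel_accuracy(p)  # (50/60 + 35/40) / 2
```

For two classes, `pixel_accuracy` reduces to Equation (8), with TP and TN on the diagonal and FN and FP off the diagonal.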

IoU is the most commonly used evaluation metric for semantic segmentation. It is the ratio of the intersection of the true and predicted label regions to their union. Because both false positives and false negatives enlarge the union, IoU penalizes over-segmentation and under-segmentation alike. The calculation formula of IoU is as follows:

$$\text{IoU} = \frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}} \tag{9}$$

The mean intersection over union (MIoU) is the average of the IoU values computed separately for each class.
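A minimal sketch of Equation (9) and the MIoU is given below, again using a confusion matrix in which `p[i][j]` is the number of pixels of true class `i` predicted as class `j`. The function names and example counts are hypothetical.

```python
def iou(tp, fp, fn):
    """Eq. (9): intersection over union from binary counts."""
    return tp / (tp + fp + fn)


def mean_iou(p):
    """Average IoU over classes, treating each class in turn as positive."""
    n = len(p)
    scores = []
    for i in range(n):
        tp = p[i][i]
        fn = sum(p[i]) - tp                       # class i predicted as another class
        fp = sum(p[j][i] for j in range(n)) - tp  # other classes predicted as class i
        scores.append(iou(tp, fp, fn))
    return sum(scores) / n


# Hypothetical 2-class confusion matrix (rows: true class, columns: predicted).
p = [[50, 10],
     [5, 35]]
miou = mean_iou(p)  # (50/65 + 35/50) / 2
```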

Mean absolute error (MAE) measures the difference between an estimate and the quantity being estimated. Here it was used to characterize the difference between the sorbite proportion predicted by the model and the true sorbite proportion. With **f(xi)** the predicted value and **yi** the true value for the ith of **n** test samples, the MAE is the average distance by which the predicted value deviates from the true value, that is,

$$\mathbf{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| f(x_i) - y_i \right| \tag{10}$$
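Equation (10) translates directly into code. In the sketch below the function name and the example proportions are hypothetical and serve only to show the arithmetic.

```python
def mean_absolute_error(predicted, actual):
    """Eq. (10): mean of |f(x_i) - y_i| over all samples."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(f - y) for f, y in zip(predicted, actual)) / len(predicted)


# Hypothetical sorbite proportions for two images.
mae = mean_absolute_error([0.90, 0.85], [0.92, 0.80])  # (0.02 + 0.05) / 2
```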

#### **4. Results and Discussion**

Several groups have studied how to improve the accuracy of sorbite content determination. Wuhan Iron & Steel Group [21] used SEM and OM to observe the microstructure in the same region under different magnifications, and the analysis indicated that the so-called "pearlite" structure was an etching morphology of the sorbite structure with different orientations on the cross-section of the metallographic sample surface. Zhejiang University of Technology [22] found that the etching condition has a significant influence on the display and identification of the sorbite structure: some of the bright white areas under the optical microscope may be locally flat areas of sorbite, while some of the "pearlite" may be local depressions that produce a "magnified lamellar structure", so it is unreliable to determine the sorbite content from light and dark regions alone. Liuzhou Steel Group [23] used digital image storage technology for the accurate detection of sorbite content. All these studies relied on traditional grayscale images, pearlite lamellar spacing, and similar features. The problems of fluctuation in the detection process and the need for manual intervention are still not well resolved. Therefore, the automatic detection of sorbite content based on artificial intelligence is a focus of current research. Luo [1] established a library of high-carbon wire rod sorbite material for neural network learning, and successfully realized the intelligent identification of sorbite by using artificial intelligence and deep learning technology.

In this paper, the DeepLabv3+ and U-Net++ semantic segmentation models were each trained and validated with ResNet-34 as the backbone network, and the prediction for each pixel of the output sample image was judged as correct or incorrect. The evaluation results are shown in Table 2. Compared with U-Net++, the semantic segmentation model based on DeepLabv3+ improved the mPA by 0.7% and reduced the MAE by 10.7%, and thus obtained more accurate prediction results. This semantic segmentation model was also more refined than the classification model in Luo's article [1]. The visualization is shown in Figure 7, where the white regions were predicted correctly by the model and the red regions were predicted incorrectly. The segmentation results show that the model had a good segmentation effect, and an MIoU of 74.89% indicates that the predicted results of the model were more accurate than the manually annotated boundary.

Subsequently, the DeepLabv3+ semantic segmentation model was used to test unlabeled sorbite metallographic images. For each image tested, the image and the physical length represented by a single pixel were input, and the model output the sorbite content of the image. The outputs were compared with the manually calibrated results, and some of the test results are shown in Table 3. Almost all of the deviations were less than 5%, which indicates good agreement with the manual calibration.
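One plausible way to turn a predicted segmentation mask into a sorbite content value is sketched below. It assumes the content is the area fraction of pixels labeled as sorbite (label 1) and that the per-pixel length is used only to convert the pixel count into a physical area; the text does not specify either detail, so both are assumptions, as are the function name and the example values.

```python
def sorbite_content(mask, pixel_length_um=None):
    """Return (area fraction, physical sorbite area) for a binary mask.

    mask            : list of rows with 1 for sorbite and 0 for background
    pixel_length_um : assumed edge length of one pixel in micrometres
    """
    total = sum(len(row) for row in mask)
    sorbite = sum(1 for row in mask for v in row if v == 1)
    fraction = sorbite / total
    area = None if pixel_length_um is None else sorbite * pixel_length_um ** 2
    return fraction, area


def deviation(predicted, manual):
    """Absolute deviation between model output and manual calibration."""
    return abs(predicted - manual)


# Hypothetical 2 x 4 mask with 6 sorbite pixels and 0.5 um pixels.
fraction, area = sorbite_content([[1, 1, 0, 1], [1, 0, 1, 1]], pixel_length_um=0.5)
```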

**Table 2.** Comparison of the test results of different frames.


**Figure 7.** Schematic diagram of the sorbite sample segmentation effect.



In addition, Luo [1] did not consider the imbalance between positive and negative samples or between foreground and background in sorbite samples, whereas this paper used loss functions and their combination to address the imbalance, and also tested and analyzed the prediction results obtained with different loss functions. The three sets of results were all based on DeepLabv3+ with a ResNet-34 backbone and are shown in Table 4. The model trained with the focal loss alone achieved an mPA of 94.18% and an MAE of 4.46%, which demonstrates a good segmentation effect. The results with the Dice loss alone were poor. Combining the two improved the predictions, with the mPA and MAE reaching 94.28% and 4.17%, respectively. This supports the choice of the combined focal loss + Dice loss function in this paper, given the imbalance of positive and negative distributions and the foreground and background imbalance in sorbite samples analyzed in Section 3.2. In this paper, detecting the sorbite content of a single image took only 10 s, compared with about 10 min for the manual cut-off method, which is roughly 60 times faster (a reduction in detection time of about 98%). On the premise of ensuring detection accuracy, the detection efficiency was significantly improved and the labor intensity was reduced.
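To make the combination of the two losses concrete, the sketch below evaluates a binary focal loss and a Dice loss on flat lists of predicted probabilities and 0/1 targets, and adds them. It is a simplified illustration rather than the training code of this work: the focusing parameter `gamma`, the balance factor `alpha`, the smoothing constant, and the equal weighting of the two terms are all assumed values, since the exact settings from Section 3.2 are not repeated here.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels (gamma and alpha are assumed)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)       # avoid log(0)
        p_t = p if t == 1 else 1.0 - p        # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + s) / (sum of both + s)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def combined_loss(probs, targets, w_focal=1.0, w_dice=1.0):
    """Weighted sum of focal and Dice losses (weights are assumed)."""
    return w_focal * focal_loss(probs, targets) + w_dice * dice_loss(probs, targets)


# Hypothetical predictions for four pixels.
loss = combined_loss([0.9, 0.2, 0.8, 0.7], [1, 0, 1, 1])
```

The focal term down-weights pixels that are already classified confidently, while the Dice term depends on region overlap rather than pixel counts, which is why the pair is often used when foreground and background are imbalanced.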



Luo [1] standardized the sample preparation process but did not consider the diversity of data sources and feature distributions caused by different sample preparation levels, corrosion depths, shooting equipment, illumination, and other factors. In this paper, data augmentation was performed to broaden the distribution of data sources and features, and its actual effect was verified by comparing the training and test results obtained before and after the perturbation was added to the sample images. The data set was divided into three sub-datasets according to sample batch, and each sub-dataset was then divided into a training set and a test set at a ratio of 9:1. The training set and test set of the first sub-dataset are denoted Training set 1 and Test set 1, respectively, and likewise for the others. A model was trained on each training set and then tested on all three test sets. The predictions of the models trained on the original data and on the perturbed data were analyzed statistically, using the MAE as the evaluation index, i.e., the absolute error between the sorbite content output by the model for a sample and the real sorbite content. The results are presented in Tables 5 and 6. The models trained on the raw data predicted best on the test set from their own sub-dataset. After the perturbation was added, however, the differences among the prediction results on the three test sets were reduced, indicating that the perturbation improved the generalization ability and robustness of the model.
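The cross-batch protocol described above (train on one training set, test on every test set) can be sketched as a small helper. The names `cross_batch_mae`, `train_fn`, and `eval_fn` are hypothetical placeholders for the actual training and evaluation routines, which are not shown in the text.

```python
def cross_batch_mae(train_sets, test_sets, train_fn, eval_fn):
    """Return a matrix whose entry [i][j] is the MAE of the model trained
    on train_sets[i] and evaluated on test_sets[j]."""
    results = []
    for train_set in train_sets:
        model = train_fn(train_set)
        results.append([eval_fn(model, test_set) for test_set in test_sets])
    return results


def spread(row):
    """Gap between the worst and best MAE of one model across the test sets;
    a smaller gap suggests better generalization across batches."""
    return max(row) - min(row)
```

Comparing the spread of each row for models trained with and without perturbation is one simple way to quantify the reduction in differences between test sets that Tables 5 and 6 report.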


**Table 5.** Prediction results of the original model.

**Table 6.** Model prediction results after adding perturbation.


The original sample images collected in this project were all 2048 × 1536 pixels, i.e., large and of high resolution. In general, large images contain more information and reflect more features of the object to be predicted. However, large images require more computing power, and if the characteristic size of the measured object is much smaller than the image size, training on large images does not yield better results. In this project, considering the characteristic size of sorbite, we used a sliding window to divide each original image into sixteen 512 × 384 images, and then trained and tested the model on these tiles. This operation not only reduces the computing requirements and improves the efficiency of model training and prediction, but also makes it possible to evaluate how uniformly the sorbite content is distributed over the whole image from the result for each tile. Specifically, after the model predicted the sorbite content of the 16 local areas cut from each image, the variance was calculated, and the standard deviation was used to evaluate the sample uniformity. The formula can be expressed as:

$$\textbf{Uniformity} = \sqrt{\frac{1}{4 \times 4} \sum_{i=1}^{4 \times 4} (\mathbf{R}_i - \mu)^2} \tag{11}$$

where **Ri** represents the sorbite content of the **i**th local region and **μ** is the mean sorbite content of the 16 regions. Uniformity is the evaluation index of sample uniformity: the lower the value, the better the uniformity. Figure 8 shows that this uniformity evaluation can characterize the uniformity of the microstructure of the sample, from which the macroscopic performance of the original material can be inferred.
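The tiling and Equation (11) can be sketched together as follows. The sketch assumes non-overlapping windows (a 4 × 4 grid of 512 × 384 tiles exactly covers a 2048 × 1536 image) and uses the population standard deviation, which matches the division by 4 × 4 in Equation (11). The function names and the example content values are hypothetical.

```python
import statistics


def tile_boxes(width=2048, height=1536, cols=4, rows=4):
    """Return (left, top, right, bottom) pixel boxes of a cols x rows grid."""
    tile_w, tile_h = width // cols, height // rows
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


def uniformity(contents):
    """Eq. (11): population standard deviation of the per-tile contents."""
    return statistics.pstdev(contents)


boxes = tile_boxes()  # 16 boxes, each 512 x 384 pixels
# Hypothetical per-tile sorbite contents: half at 0.9 and half at 0.8.
u = uniformity([0.9] * 8 + [0.8] * 8)  # mean 0.85, every deviation 0.05
```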

**Figure 8.** Effect of sample uniformity analysis.
