**4. Results**

#### *4.1. Preprocessing Data*

As a result of the experiments, the fatigue testing number of cycles to failure (*Nf*) was obtained. According to the experimental plan, specimens of 304 and 316L steels were burnished with 16 combinations of regime parameters. The low and high levels of the factors were coded as −1 and +1, and the levels of the two- and three-factor interactions were calculated as the product of the main factor levels.

The specimens of both steel batches were tested successively at each combination (replication r = 4), resulting in a total of 16 × 4 = 64 specimens (32 specimens of each kind of steel). The descriptive statistics of the fatigue data are shown in Table 4. Considering that, unlike the AISI 316L specimens, the AISI 304 specimens were notched, the fatigue life results were as expected: the 316L specimens' fatigue life lay predominantly in the 10<sup>6</sup> range, while the fatigue life of the 304 specimens lay predominantly in the 10<sup>5</sup> range.


**Table 4.** Results from the fatigue tests (descriptive statistics).

The fatigue life, even at constant stress amplitude, showed stochastic behavior. The main sources of uncertainty in the materials' fatigue resistance were the randomness of the microdefect distribution, variations in the loading conditions, and specimen preparation. To model fatigue life, Normal and LogNormal distributions are commonly used.

Of particular interest in this study was the increase in fatigue life due to the burnishing operation with different combinations of regime parameters. To identify the characteristics of the fatigue life gain, additional experiments with non-burnished specimens were carried out at the same loading conditions. The mean result for 304 steel was 2 × 10<sup>4</sup> cycles (four specimens tested). For 316L steel, the mean result was 14 × 10<sup>4</sup> cycles (three specimens tested). Based on these results, the value *logCycles*, representing the gain in fatigue life due to burnishing, was formed: the cycles to failure of each burnished specimen were divided by the base cycles of the non-burnished specimens, and the ratio was converted to a logarithmic scale using the decibel rule to give it more physical meaning (Equation (2))

$$\log \text{Cycles} = 20 \cdot \log \left( \frac{\text{cycles to failure of burnished specimen}}{\text{cycles to failure of base specimen}} \right) \tag{2}$$
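The decibel-rule conversion of Equation (2) can be sketched in a few lines of Python; the cycle counts below are hypothetical, chosen only to illustrate the scale of the resulting values.

```python
import math

def log_cycles(n_burnished: float, n_base: float) -> float:
    """Fatigue-life gain in decibels, as in Equation (2)."""
    return 20.0 * math.log10(n_burnished / n_base)

# Hypothetical illustration: a burnished specimen lasting 2 x 10^5 cycles
# against a non-burnished base life of 2 x 10^4 cycles is a tenfold gain,
# i.e., 20 dB.
print(log_cycles(2e5, 2e4))  # 20.0
```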

Descriptive statistics of data converted to log scale are shown in Table 4. These data are considered as primary data for the statistical analysis.

To ensure comparability of the data for the different materials, the logCycles values were scaled using a robust scaler from the Python library "SciKit Learn" [43]. This type of scaler uses the first and third quartile values (the 0.25 and 0.75 quantiles) and is therefore more robust to outliers (Equation (3)). This kind of scaling can also be applied to additional data (if a replication of the experiment is made). The histograms of the scaled data are shown in Figure 5.

$$y_{scaled} = \frac{y_i - q_1(0.25)}{q_3(0.75) - q_1(0.25)}\tag{3}$$
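A minimal NumPy sketch of the quantile-based scaling of Equation (3); the function name `robust_scale` and the sample values are illustrative, not taken from the study's data. (Note that scikit-learn's `RobustScaler` centers on the median by default, so the formula is implemented directly here.)

```python
import numpy as np

def robust_scale(y: np.ndarray) -> np.ndarray:
    """Scale by the interquartile range, as in Equation (3):
    q1(0.25) maps to 0 and q3(0.75) maps to 1."""
    q1, q3 = np.quantile(y, [0.25, 0.75])
    return (y - q1) / (q3 - q1)

# Hypothetical data: here q1 = 2 (maps to 0) and q3 = 4 (maps to 1).
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(robust_scale(y))
```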

**Figure 5.** Scaled data histogram.

Most of the data lay in the unit range (−1; +1); the whole range of the data was (−1.5; +1.5). As shown in Table 5, for the particular materials included in this research, the means and the *q*1(0.25) and *q*3(0.75) quantiles were quite similar, so the scaling essentially centered the data and brought the standard deviation to about 1.


**Table 5.** Results from the fatigue tests, converted to log scale (descriptive statistics).

#### *4.2. Effects and T-Test*

The main effect of a given factor is the mean difference in the level of response as the input moves from the low to the high level [44]. Combined two- and three-factor effects (interactions) are calculated by multiplying the main factor levels: for example, a two-factor interaction level is positive when the inputs move in the same direction and negative when they move in opposite directions. To visualize these effects, linear regression plots are presented in Figures 6–8. Each kind of steel is treated separately for comparison.
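The effect calculation described above can be sketched as follows; the coded levels and responses are hypothetical, for illustration only.

```python
import numpy as np

def main_effect(levels: np.ndarray, response: np.ndarray) -> float:
    """Main effect: mean response at the high level (+1)
    minus mean response at the low level (-1)."""
    return response[levels == 1].mean() - response[levels == -1].mean()

# Hypothetical coded 2^2 design and responses.
a = np.array([-1, 1, -1, 1])
b = np.array([-1, -1, 1, 1])
y = np.array([1.0, 2.0, 4.0, 5.0])

print(main_effect(a, y))   # effect of factor a

# An interaction column is the element-wise product of the main-factor
# columns; its effect is computed in exactly the same way.
ab = a * b
print(main_effect(ab, y))  # effect of the a*b interaction
```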

In previous research [36], only the results of fatigue testing of AISI 304 steel were analyzed. It was found that the main factors A and D, and the interactions AC and AD, had the greatest impact. The new data from the 316L steel fatigue testing showed similar behavior, according to the main factors. In contrast, interaction AD was very strong, but the slope was different, comparing steel types (positive for 316L and negative for 304), as shown in Figure 7. From the three-factor interactions, ACD seems to be important (Figure 8).

**Figure 6.** Regression plots representing the main effects.

**Figure 7.** Regression plots representing two-factor interactions.

**Figure 8.** Regression plots representing three-factor interactions.

Since both steels were burnished within the same experimental plan, and the fatigue testing procedure was the same, similar effects should be expected, so a new effects calculation was made considering the whole dataset. The main effects and interactions are given in descending order of their absolute values in Table 6. A Pareto chart (see Figure 9) shows that the first seven factors from the table should be considered significant (D, A, ACD, AD, AC, BCD, BD), because together they cover about 80% of the total effect.
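The Pareto-style screening described above (rank by absolute effect, keep the leading terms that cover about 80% of the total) can be sketched like this; the effect values below are hypothetical, not those of Table 6.

```python
def pareto_significant(effects: dict, threshold: float = 0.8) -> list:
    """Rank effects by absolute value and keep those that together
    cover `threshold` of the total absolute effect (Pareto rule)."""
    ranked = sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
    total = sum(abs(v) for _, v in ranked)
    kept, cum = [], 0.0
    for name, value in ranked:
        kept.append(name)
        cum += abs(value)
        if cum / total >= threshold:
            break
    return kept

# Hypothetical effect values for illustration.
effects = {"D": 0.9, "A": -0.8, "ACD": 0.5, "AD": 0.4, "B": 0.1}
print(pareto_significant(effects))  # ['D', 'A', 'ACD']
```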


**Table 6.** Main effects, two- and three-factor interactions and *p* values.

**Figure 9.** Pareto plot factors (regime parameters) and interactions.

The obtained results were confirmed by a *t*-test. This test checks the null hypothesis that the mean values of two groups of samples are identical. The *p*-values calculated from the *t*-test are given in the last column of Table 6. All of them were greater than 5% (*p* > 0.05), which means that the null hypothesis cannot be rejected at the 95% level of confidence for any of the factors and interactions. If *p* = 0.1 (90% confidence) is taken as the significance level, the null hypothesis can be rejected for factors D and A, and these factors are considered significant for the regression model. The values of these factors confirm the general finding for burnishing processes that a higher force (A) and a lower feed rate (D) guarantee a higher degree of plastic deformation, thus benefiting the fatigue life [28], but they give no information about the influence of the degree of imbrication of the regular relief, connected with factors B and C.
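Such a per-factor *t*-test can be reproduced, e.g., with SciPy; the two groups below are hypothetical scaled responses split by the coded level of one factor, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical scaled responses split by the coded level of one factor.
low = np.array([-0.6, -0.2, 0.1, -0.4])   # runs at level -1
high = np.array([0.3, 0.7, 0.2, 0.5])     # runs at level +1

# Two-sample t-test of the null hypothesis that the group means are equal.
t_stat, p_value = stats.ttest_ind(low, high)
print(t_stat, p_value)  # reject the null at level alpha only if p < alpha
```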

More detailed statistical inference from such noisy data can be obtained using the Bayesian approach. Bayesian models are called probability models, since the result is a distribution, rather than point estimates, for the unknown parameters. According to Bayes' rule (Equation (4)), the posterior probability of the parameters of interest *θ*, given the observed data *y*, can be estimated using our prior knowledge about these parameters, *P*(*θ*). The term *P*(*y*|*θ*), called the likelihood, is a probabilistic model for the data. The denominator *P*(*y*) is the marginal probability of the data, called the evidence. Since it is just a normalizing constant, it can be omitted, and it can be stated that the posterior probability is proportional to the likelihood times the prior probability.

$$P(\theta|y) = \frac{P(y|\theta) \cdot P(\theta)}{P(y)},\tag{4}$$

where *P*(*θ*|*y*)—posterior; *P*(*θ*)—prior; *P*(*y*|*θ*)—likelihood; *P*(*y*) = ∫*P*(*y*|*θ*)·*P*(*θ*)*dθ*—evidence.

$$P(\theta|y) \propto P(y|\theta) \cdot P(\theta)$$

For continuous random variables, *P*(*θ*|*y*) is a probability density function (PDF) of a certain distribution. So, Bayesian modeling requires, first, setting an appropriate likelihood distribution, which describes how the data could have been generated, and second, choosing prior probability distributions for all the unknown parameters *θ*. Prior distributions can be constructed as non-informative, i.e., diffuse, or even improper, whose PDF does not integrate to 1. If too informative (strong) prior distributions are used, there is a risk of ignoring the experimental data. Of course, in the presence of enough data, the influence of the prior choice fades, and the likelihood dominates the posterior distribution. The Bayesian probabilistic model can be updated when new data become available simply by using the posterior distribution as the prior for the new data.

For some simple cases, a closed-form solution for the posterior density is given in the literature [37]. For all other probabilistic models, Bayesian inference can still be made by simulation. To ensure random, non-correlated samples covering the whole distribution, Markov chain Monte Carlo (MCMC) sampling algorithms are used. Nowadays, numerous statistical packages and libraries for "R" and Python are available; some of the most popular are WinBUGS, JAGS, Stan, and PyMC.
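As a self-contained illustration of MCMC sampling (not the model used in this study), a random-walk Metropolis sampler for the posterior mean of a Normal likelihood with known *σ* and a flat prior can be written in a few lines; the data are hypothetical.

```python
import random
import math

# Minimal random-walk Metropolis sketch: posterior of a Normal mean with
# known sigma and a flat prior. Illustrative only; real analyses would
# use a dedicated library such as PyMC or Stan.
random.seed(0)
data = [1.2, 0.8, 1.0, 1.4, 0.6]
sigma = 0.5

def log_likelihood(mu):
    return sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)

samples, mu = [], 0.0
for _ in range(20000):
    proposal = mu + random.gauss(0.0, 0.3)
    # Accept with probability min(1, posterior ratio); the flat prior cancels.
    if math.log(random.random()) < log_likelihood(proposal) - log_likelihood(mu):
        mu = proposal
    samples.append(mu)

# Discard burn-in, then summarize the posterior by its mean.
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
print(posterior_mean)  # close to the sample mean of the data, 1.0
```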

#### *4.3. Regression Model*

4.3.1. Ordinary Least Squares Regression (OLS)

Ordinary least squares linear regression (Equation (5)) assumes a Gaussian distribution of the noise.

$$y = X\beta + \epsilon,\tag{5}$$

where *y* ∈ R<sup>*n*×1</sup>—column vector of the response (regressand); *X* ∈ R<sup>*n*×(*k*+1)</sup> = [{1}, {*x*1}, {*x*2} . . . {*xk*}]—predictors (design matrix); *β* ∈ R<sup>(*k*+1)×1</sup>—column vector of regression coefficients; *ε* ∼ *N*(0, *σ*<sup>2</sup>*I*)—Gaussian noise.

As the effects and interactions have been calculated, a linear regression model can be formed as Equation (6), where the first term is the mean of the dependent variable *y*, and each remaining coefficient is the *i*-th effect or interaction value divided by 2, since an effect shows the change in the response as the predictor *xi* moves 2 steps, from −1 to +1.

$$\hat{y} = \bar{y} + \frac{1}{2} \sum_{i=1}^{k} \text{effect}_i \cdot x_i \tag{6}$$
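The link between effects and regression coefficients in Equation (6) can be verified numerically: for a coded ±1 design, the OLS slope of each factor equals half its main effect. The 2<sup>2</sup> design below is hypothetical.

```python
import numpy as np

# Hypothetical coded 2^2 design and responses.
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
y = np.array([2.0, 6.0, 3.0, 7.0])

# OLS fit with an intercept column.
X = np.column_stack([np.ones(4), A, B])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Main effect of A: mean at +1 minus mean at -1.
effect_A = y[A == 1].mean() - y[A == -1].mean()
print(beta[1], effect_A / 2)  # the OLS slope equals effect/2
```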

The probabilistic model of linear regression can be expressed as Equation (7): the data *yi* come from a Normal distribution with mean equal to the OLS estimate *Xβ* and variance *σ*<sup>2</sup>.

$$y\_i \sim \mathcal{N}\left(X\beta, \sigma^2\right),\tag{7}$$

where *i* = 1, 2, . . . , *n* indexes the data points.

From a Bayesian perspective, each regression coefficient can be treated as a random variable, coming from a Normal distribution with unknown mean and variance Equation (8).

$$\beta_i \sim N\left(\mu_i, s_i^2\right) \tag{8}$$

Under a non-informative Jeffreys prior, this problem has a closed-form solution [37]. The resulting posterior parameters are dominated by the experimental data. The marginal distribution for the mean *μi*, with *s*<sup>2</sup> integrated out, is a non-centered Student-t distribution (Equation (9)).

$$\mu_i \mid y \sim t_{n-k-1}\left(m_i, S_i^2\right) \tag{9}$$

The location parameter *mi* is equal to the least squares estimate of the regression coefficient, the scale parameter *S*<sup>2</sup>*i* is equal to the standard error, and the degrees of freedom are equal to the degrees of freedom of the regression model. From this distribution, credible intervals (CI) and/or highest-density intervals (HDI) with a certain probability can be formed.
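A credible interval from the Student-t posterior of Equation (9) can be formed, e.g., with SciPy; the location, scale, and degrees of freedom below are hypothetical, not the fitted values from Table 7.

```python
from scipy import stats

# Hypothetical posterior for one coefficient: Student-t with the OLS
# estimate as location and the standard error as scale (Equation (9)).
m_i, se_i, dof = 0.15, 0.10, 8

# Central 95% credible interval from the t posterior.
lo, hi = stats.t.interval(0.95, dof, loc=m_i, scale=se_i)
print(lo, hi)  # if the interval contains 0, the coefficient is not
               # significant at alpha = 5%
```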

Results from the OLS regression, including the first seven factors from the Pareto plot, are given in Table 7. To perform this regression, the Python library Statsmodels was used [45].


**Table 7.** Ordinary least square (OLS) regression results.

Looking at the 95% HDI, the conclusion is again that there was no significant factor at the significance level α = 5%.
