**Appendix A**

*Gaussian Process for Regression*

A Gaussian process (GP) defines a distribution *p*(*f*) over the values of a function *f*(*x*) at any finite set of points *x*<sub>1</sub>, ..., *x*<sub>n</sub>, such that the collection of random variables *f*(*x*<sub>1</sub>), ..., *f*(*x*<sub>n</sub>) has a joint Gaussian distribution *p*(*f*(*x*<sub>1</sub>), ..., *f*(*x*<sub>n</sub>)) [20]. The GP is expressed as

$$f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')) \tag{A1}$$

where

$$m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})], \quad k(\mathbf{x}, \mathbf{x}') = \mathbb{E}[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))] \tag{A2}$$

are the mean and covariance functions, respectively. The covariance function (kernel) defines the entries of the covariance matrix *K* of the joint Gaussian distribution:

$$\mathbf{K} = \begin{bmatrix} k(\mathbf{x}_1, \mathbf{x}_1) & \dots & k(\mathbf{x}_1, \mathbf{x}_n) \\ \vdots & \ddots & \vdots \\ k(\mathbf{x}_n, \mathbf{x}_1) & \dots & k(\mathbf{x}_n, \mathbf{x}_n) \end{bmatrix} \tag{A3}$$

Because the joint Gaussian distribution is fully specified by the mean function *m* and the covariance matrix *K*, random functions *f* can be generated from it. The properties of *f* are determined by the mean function *m*(*x*) and the kernel *k*(*x*, *x*′). In practice, the calculation is simplified, without restricting the flexibility of the GP, by using a zero mean function *m*(*x*) = 0 [18,20]. Before any knowledge about observed inputs and outputs is incorporated, the GP defines a prior distribution over functions *f*; conditioning on the training data transforms it into a useful posterior distribution.
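As an illustration, the following minimal NumPy sketch draws random functions from such a zero-mean GP prior. The squared-exponential (RBF) kernel and its hyperparameter values are assumptions chosen for illustration only, not the settings fitted in this study.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    # k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 l^2)), a common stationary kernel
    sq_dists = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

# Finite set of input points x_1, ..., x_n (one-dimensional for simplicity).
X = np.linspace(0.0, 5.0, 100)[:, None]

# Covariance matrix K of Eq. (A3); a small jitter keeps it positive definite.
K = rbf_kernel(X, X) + 1e-9 * np.eye(len(X))

# With m(x) = 0, each draw is one random function f evaluated at the points X.
rng = np.random.default_rng(0)
prior_samples = rng.multivariate_normal(np.zeros(len(X)), K, size=3)
```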

In the fatigue life prediction of the structural materials analyzed herein, the function *f*(*x*) is not directly accessible; only a noisy version *y* = *f*(*x*) + *ε* can be observed, with *ε* ∼ N(0, *σ*<sub>y</sub><sup>2</sup>), where *σ*<sub>y</sub><sup>2</sup> is the noise variance. The diagonal of the covariance matrix *K* is therefore augmented as

$$\mathbf{K}_y = \mathbf{K} + \sigma_y^2 \mathbf{I}, \tag{A4}$$

where **I** is an identity matrix of size *n* (the number of training points). The joint density of the observed outputs *y* and the predicted function outputs *f*<sub>∗</sub>

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N} \left( 0, \begin{bmatrix} \mathbf{K}_y & \mathbf{K}_* \\ \mathbf{K}_*^T & \mathbf{K}_{**} \end{bmatrix} \right) \tag{A5}$$

involves the covariance matrices computed from the training points *X* and the test points *X*<sub>∗</sub>, where *X* is a *d* × *n* matrix of the training inputs {*x*<sub>i</sub>}<sub>i=1</sub><sup>n</sup> and *X*<sub>∗</sub> is the *d* × *n*<sub>∗</sub> matrix of the test inputs (*n*<sub>∗</sub> is the number of test points). In particular, Equation (A5) includes the covariances *K*<sub>∗</sub> = *k*(*X*, *X*<sub>∗</sub>), *K*<sub>y</sub> = *k*(*X*, *X*) + *σ*<sub>y</sub><sup>2</sup>*I*, and *K*<sub>∗∗</sub> = *k*(*X*<sub>∗</sub>, *X*<sub>∗</sub>). The posterior predictive density is obtained [18] by conditioning the joint Gaussian prior on the observations, as follows:

$$p(f_*|X_*, X, y) = \mathcal{N}(f_*|\mu_*, \Sigma_*) \tag{A6}$$

where the sought median regression curve is represented by the vector of mean values *μ*<sub>∗</sub> obtained for the test points *X*<sub>∗</sub>, with the final expression

$$\mu_* = \mathbf{K}_*^T \mathbf{K}_y^{-1} y \tag{A7}$$

and the variances of the function values at the test points *X*<sub>∗</sub> lie on the diagonal of the covariance matrix **Σ**<sub>∗</sub>, given by

$$\boldsymbol{\Sigma}_* = \mathbf{K}_{**} - \mathbf{K}_*^T \mathbf{K}_y^{-1} \mathbf{K}_* \tag{A8}$$
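The following self-contained sketch assembles the covariance blocks of Equations (A4) and (A5) and evaluates the posterior mean and covariance of Equations (A7) and (A8). The RBF kernel, the synthetic data, and the noise level are illustrative assumptions; the inputs are stored row-wise (*n* × *d*), rather than as the *d* × *n* matrix of the text, and a Cholesky factorization replaces the explicit inverse of *K*<sub>y</sub> for numerical stability.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    sq_dists = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 5.0, size=(20, 1))            # training inputs, one row per point
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)    # noisy observations y = f(x) + eps
X_star = np.linspace(0.0, 5.0, 100)[:, None]       # test inputs
sigma_y2 = 0.01                                    # assumed noise variance

K_y = rbf_kernel(X, X) + sigma_y2 * np.eye(len(X))  # Eq. (A4)
K_s = rbf_kernel(X, X_star)                         # K_* = k(X, X_*)
K_ss = rbf_kernel(X_star, X_star)                   # K_** = k(X_*, X_*)

# Cholesky factorization of K_y avoids forming K_y^{-1} explicitly.
L = cho_factor(K_y, lower=True)
mu_star = K_s.T @ cho_solve(L, y)                   # posterior mean, Eq. (A7)
Sigma_star = K_ss - K_s.T @ cho_solve(L, K_s)       # posterior covariance, Eq. (A8)
var_star = np.diag(Sigma_star)                      # pointwise predictive variances
```

Here `mu_star` traces the median regression curve of Equation (A7), and `var_star` collects the pointwise predictive variances from the diagonal of Equation (A8).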

The predicted outputs for the test points are specified by the above two equations, where the training data (*X*, *y*) serve as the parameters for the regression. For a given set of training data, the covariance function *k* exclusively controls the predictive performance. The kernel parameters (called hyperparameters) are obtained by maximizing the marginal likelihood, i.e., the probability of the noisy observations *y* given the inputs, *p*(*y*|*X*, *θ*), where *θ* is the vector of hyperparameters. The log marginal likelihood [20] is given by

$$\log p(y|\mathbf{X}, \theta) = -\frac{1}{2} y^T \mathbf{K}_y^{-1} y - \frac{1}{2} \log |\mathbf{K}_y| - \frac{n}{2} \log 2\pi, \tag{A9}$$

where |*K*<sub>y</sub>| is the determinant of the *K*<sub>y</sub> matrix. The maximum of the marginal likelihood is found using a gradient-based optimizer; in this analysis, a quasi-Newton optimizer with a trust-region method was used. The common stationary kernels were analyzed, as described in the next section. The training process was the same for all covariance functions, without a cross-validation resampling procedure. The diagram of the data flow for the fatigue life prediction based on the proposed GP fatigue model is presented in Figure A1. The diagram shows the exemplary data flow for four selected input parameters (*x* = [*γ*<sub>ns</sub>, *ε*<sub>n</sub>, *τ*<sub>ns</sub>, *σ*<sub>n</sub>], described in Section 4.3.1). The outputs are the mean value of the log fatigue life for each test data point and its variance.
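As a sketch of this fitting step, the function below evaluates the negative of Equation (A9) and minimizes it with SciPy's L-BFGS-B quasi-Newton optimizer, which stands in here for the trust-region quasi-Newton method used in the analysis. The RBF kernel and the three-component hyperparameter vector *θ* = (*l*, *σ*<sub>f</sub><sup>2</sup>, *σ*<sub>y</sub><sup>2</sup>) are again illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def rbf_kernel(X1, X2, length_scale, signal_var):
    sq_dists = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def neg_log_marginal_likelihood(log_theta, X, y):
    """-log p(y|X, theta) from Eq. (A9); theta = (l, sigma_f^2, sigma_y^2)."""
    length_scale, signal_var, noise_var = np.exp(log_theta)
    n = len(y)
    K_y = (rbf_kernel(X, X, length_scale, signal_var)
           + (noise_var + 1e-9) * np.eye(n))        # Eq. (A4) plus jitter
    L, lower = cho_factor(K_y, lower=True)
    alpha = cho_solve((L, lower), y)                 # K_y^{-1} y
    # log|K_y| = 2 * sum(log(diag(L))) via the Cholesky factor.
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * n * np.log(2.0 * np.pi)

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 5.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

# Optimize in log space so all hyperparameters stay positive.
result = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y),
                  method="L-BFGS-B")
length_scale, signal_var, noise_var = np.exp(result.x)
```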

**Figure A1.** Diagram of data flow for the fatigue life prediction based on the proposed GP fatigue model.
