*2.4. Gaussian Process Modeling*

In this study, Gaussian process regression (GPR) was employed to establish quantitative correlations between various measured quantities of interest in the extracted dataset. GPR is a nonparametric machine learning method that fits a joint probability distribution to the available training data (usually a small dataset) in order to make probabilistic predictions for new inputs. This is accomplished by treating the correlations as a Gaussian process (GP), which is defined entirely by its mean and covariance. Let **t** = {*t*<sub>1</sub>, *t*<sub>2</sub>, . . . , *t*<sub>N</sub>} and **y** = {*y*<sub>1</sub>, *y*<sub>2</sub>, . . . , *y*<sub>N</sub>} denote the targets and predictions, respectively, where *N* denotes the number of training points. Then, the relationship between the targets **t** and predictions **y** can be written as

$$\mathbf{t} = \mathbf{y} + \boldsymbol{\varepsilon} \tag{9}$$

where **ε** is a column vector containing the residuals of the *N* observations. A GP governing the joint distribution over the predictions can be written as

$$y(\mathbf{x}) \sim \mathcal{N}(\mu(\mathbf{x}), \mathbf{K}(\mathbf{x}, \mathbf{x}')) \tag{10}$$

where *x* denotes a 1 × *D* input vector, and µ(*x*) and *K*(*x*, *x*′) represent the mean and the covariance of the GP, respectively. In Equation (10), N(·) denotes a multivariate Gaussian distribution.
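
As a brief illustration of Equation (10), the Python sketch below draws sample functions from a zero-mean GP prior on a one-dimensional grid. The squared-exponential kernel, the input grid, and the hyperparameter values here are illustrative placeholders rather than the settings used in this study.

```python
import numpy as np

def sq_exp_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

# Inputs at which the GP is evaluated
x = np.linspace(0.0, 1.0, 50)

# Covariance matrix K(x, x); a small jitter keeps it numerically positive definite
K = sq_exp_kernel(x, x) + 1e-8 * np.eye(x.size)

# Three sample functions drawn from the GP prior y(x) ~ N(0, K)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=np.zeros(x.size), cov=K, size=3)
```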

The covariance of the GP is generally computed using a kernel function *k*(*x*, *x*′). In the present study, the %Mn and the post-heat treatment temperature are treated as the two (i.e., *D* = 2) independent variables (i.e., inputs) for the model-building effort. The outputs include a number of microstructure statistics as well as the measured mechanical properties. The automatic relevance determination squared exponential (ARDSE) kernel [100–102] was selected for computing the covariance matrices. This ARDSE kernel is mathematically expressed as

$$k(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{1}{2} \left(\frac{(x_T - x'_T)^2}{l_T^2} + \frac{(x_c - x'_c)^2}{l_c^2}\right)\right) + \sigma_n^2 \delta_{\mathbf{x}\mathbf{x}'} \tag{11}$$

where σ<sub>f</sub>, *l*<sub>T</sub>, *l*<sub>c</sub>, and σ<sub>n</sub> are the hyperparameters that control the fidelity of the GP model, and the subscripts *T* and *c* refer to the two input variables (i.e., the post-heat treatment temperature and %Mn). The hyperparameters in the kernel provide more valuable information about the trends and relationships between the inputs and the outputs than conventional correlation techniques such as the Pearson correlation coefficient [103]; a brief implementation sketch of this kernel is provided after the list below. More specifically:


- σ<sub>f</sub> (the signal variance) sets the overall scale of the variation in the output captured by the GP model.
- *l*<sub>T</sub> and *l*<sub>c</sub> are the length scales associated with the two inputs. In the ARD formulation, a smaller length scale implies that the output is more sensitive to the corresponding input, so the optimized length scales indicate the relative relevance of the post-heat treatment temperature and the %Mn for each output.
- σ<sub>n</sub> (the noise variance) captures the scatter in the observations, including the contributions from the measurements and from the application of the analysis protocols (e.g., image segmentation). σ<sub>n</sub> is assumed to be the same for the entire input domain (also called homoscedasticity [104]).
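
As a point of reference, the short Python/NumPy sketch below implements the ARDSE kernel of Equation (11). The function name, the column ordering of the inputs (temperature first, %Mn second), and any hyperparameter values are illustrative assumptions and not the optimized values obtained in this work.

```python
import numpy as np

def ardse_kernel(X1, X2, sigma_f, l_T, l_c, sigma_n=0.0):
    """ARD squared-exponential (ARDSE) kernel of Equation (11).

    X1, X2   : arrays of shape (n1, 2) and (n2, 2); column 0 = temperature T, column 1 = %Mn c
    sigma_f  : signal standard deviation
    l_T, l_c : length scales for the two inputs
    sigma_n  : noise standard deviation (added on the diagonal only when X1 and X2 coincide)
    """
    # Length-scale-weighted squared distances for each input dimension
    dT = (X1[:, 0][:, None] - X2[:, 0][None, :]) ** 2 / l_T ** 2
    dc = (X1[:, 1][:, None] - X2[:, 1][None, :]) ** 2 / l_c ** 2
    K = sigma_f ** 2 * np.exp(-0.5 * (dT + dc))

    # Noise term sigma_n^2 * delta_{xx'}: contributes only when the two input sets coincide
    if X1.shape == X2.shape and np.array_equal(X1, X2):
        K += sigma_n ** 2 * np.eye(X1.shape[0])
    return K
```

Passing the training inputs for both arguments yields the training covariance matrix, while mixed training/test arguments yield the off-diagonal blocks of the partitioned covariance introduced next.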

The hyperparameters in Equation (11) are generally optimized to produce the most reliable predictions for test data points. For this, one must formulate a conditional distribution of the test targets, **t**<sup>∗</sup>, given the evidence of the training targets, **t**. Let the training and test data points be represented by matrices **X** and **X**<sup>∗</sup> of sizes *N* × *D* and *N*<sup>∗</sup> × *D*, respectively, where *N*<sup>∗</sup> denotes the number of test points. The overall covariance matrix can be partitioned as

$$\mathcal{C} = \begin{bmatrix} \mathbf{K}(\mathbf{X}, \mathbf{X}) & \mathbf{k}^{*}(\mathbf{X}, \mathbf{X}^{*}) \\ \mathbf{k}^{*\dagger}(\mathbf{X}, \mathbf{X}^{*}) & \mathbf{K}^{*}(\mathbf{X}^{*}, \mathbf{X}^{*}) \end{bmatrix} \tag{12}$$

where † represents the transpose. Each term of the covariance matrix in Equation (12) is computed using the kernel function from Equation (11). The predictive distributions for the test points, given the training points, can be expressed as [100,101].

$$\begin{aligned} \boldsymbol{\mu}^{*} &= \mathbf{k}^{*\dagger} \mathbf{K}^{-1} \mathbf{t} \\ \boldsymbol{\Sigma}^{*} &= \mathbf{K}^{*} - \mathbf{k}^{*\dagger} \mathbf{K}^{-1} \mathbf{k}^{*} \end{aligned} \tag{13}$$

where **µ**<sup>∗</sup> and **Σ**<sup>∗</sup> denote the prediction means and variances (i.e., uncertainty), respectively, for the test points. The central challenge in the computations described in Equation (13) comes from the need to invert the *N* × *N* covariance matrix of the training points, which requires *O*(*N*<sup>3</sup>) computations. Once **K**<sup>−1</sup> is obtained, predictions for the test points can be realized through standard matrix multiplication/addition operations, which require only *O*(*N*<sup>2</sup>) computations [100,101]. Note, also, that in the applications explored in this work, the number of data points is quite small. Therefore, the one-time computational cost of the inversion operation in Equation (13) does not represent a major challenge for the present study.
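
As a minimal sketch of the computations in Equation (13), the following Python/NumPy snippet reuses the hypothetical `ardse_kernel` function from the earlier sketch. It employs a Cholesky factorization in place of an explicit matrix inverse and omits the hyperparameter optimization step (e.g., maximization of the marginal likelihood).

```python
import numpy as np
from numpy.linalg import cholesky, solve

def gp_predict(X_train, t_train, X_test, sigma_f, l_T, l_c, sigma_n):
    """Predictive means and covariance of Equation (13) for a GP with the ARDSE kernel."""
    # Training covariance K(X, X); the noise term appears on its diagonal
    K = ardse_kernel(X_train, X_train, sigma_f, l_T, l_c, sigma_n)
    # Cross-covariance k*(X, X*) and (noise-free) test covariance K*(X*, X*)
    k_star = ardse_kernel(X_train, X_test, sigma_f, l_T, l_c)
    K_star = ardse_kernel(X_test, X_test, sigma_f, l_T, l_c)

    # Cholesky factorization K = L L^T replaces the explicit O(N^3) inverse of K
    L = cholesky(K)
    alpha = solve(L.T, solve(L, t_train))   # equivalent to K^{-1} t
    v = solve(L, k_star)                    # chosen so that v^T v = k*^T K^{-1} k*

    mu_star = k_star.T @ alpha              # predictive means, Eq. (13)
    Sigma_star = K_star - v.T @ v           # predictive covariance, Eq. (13)
    return mu_star, Sigma_star
```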
