*2.1. Preliminary Processing Step: Input Variable Selection*

A parsimonious set of input variables is required to develop a model [20]. For a common model structure, the expectation function of the response can be written as *η<sub>i</sub>* = *f*(**x**<sub>*i*</sub>, *θ*), where *η<sub>i</sub>* is the expected response at the *i*th measurement, *i* = 1, ..., *n*, **x**<sub>*i*</sub> is the input vector at the *i*th measurement, and *θ* = [*θ*<sub>1</sub> ... *θ<sub>q</sub>*]<sup>T</sup> is the vector of unknown model parameters. The element in the *i*th row and *j*th column of the Jacobian matrix **J** is *∂η<sub>i</sub>*/*∂θ<sub>j</sub>*, i.e., **J**<sub>*ij*</sub> = *∂η<sub>i</sub>*/*∂θ<sub>j</sub>*. The *j*th column of **J** corresponds to *θ<sub>j</sub>*, and this column vector reflects the variation in the response space as *θ<sub>j</sub>* varies over a specific set of experimental conditions. If columns *j* and *k* are orthogonal, their correlation coefficient (*r*) is zero, meaning that the information used to estimate *θ<sub>j</sub>* is independent of the information used to estimate *θ<sub>k</sub>*, and vice versa. The benefit of using orthogonal input variables is that it not only consolidates the causative effects of the inputs on the output, but also maximizes parameter accuracy and therefore the estimation accuracy of the predicted output.

According to the literature [11–14,16,17,23,24], the stiffness characteristic of an asphalt mixture, represented by its dynamic modulus, can be estimated from its component properties. In this study, the input variable vector (**x**) comprises the asphalt mixture's component properties. A summary of the selected input variables and their ranges in the dataset is presented in Table 2, with the *x<sub>i</sub>*'s and *y* being the input and output variables, respectively.


**Table 2.** Selected Input Variables (*x*) and Output Variable (*y*) for Model Development.

Cross-correlation analysis is performed on the 14 selected predictor variables, and the resulting pairwise correlation matrix is given in Table 3, along with a schematic heat map of the correlation matrix in Figure 1. Correlation coefficients with absolute values of 0.5 or higher are displayed in bold red text, and the corresponding cells in the correlation heat map are shown in dark blue or dark red, as shown in Figure 1. According to Table 3, 130 of the correlation coefficients have absolute values greater than 0.1, with 50 of them greater than 0.5, indicating that several of the input variables are highly correlated. The correlation heat map also clearly indicates that a high level of correlation (dark blue and dark red cells) exists among the input variables. When correlated input variables are detected, it is useful to produce a smaller set of orthogonal pseudo-variables using the PCA method and to use them in model development, enabling accurate mapping of the inputs to the response variable [20].

**Figure 1.** Heat map visualization of the pairwise correlation matrix of the input variables. The cell in column *i* and row *j* shows the correlation coefficient *r<sub>ij</sub>*. Cells with colors other than gray indicate some level of correlation; the darker the color, the higher the correlation.


**Table 3.** Pairwise Correlation Matrix for the Selected Input Variables.
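As a rough illustration of this screening step, the following Python sketch computes the pairwise Pearson correlation matrix and flags the strongly correlated pairs using the same 0.5 threshold highlighted in Table 3 and Figure 1; the DataFrame name `inputs` and the helper name are hypothetical, not taken from the study.

```python
# Minimal sketch of the cross-correlation screening, assuming the 14 selected
# input variables are columns of a pandas DataFrame named `inputs`.
import pandas as pd

def screen_correlations(inputs: pd.DataFrame, threshold: float = 0.5):
    """Return the pairwise correlation matrix and the variable pairs whose
    absolute correlation meets or exceeds the threshold."""
    corr = inputs.corr()  # 14 x 14 Pearson correlation matrix (cf. Table 3)
    flagged = [
        (corr.columns[i], corr.columns[j], corr.iloc[i, j])
        for i in range(len(corr.columns))
        for j in range(i + 1, len(corr.columns))
        if abs(corr.iloc[i, j]) >= threshold  # cells shown dark blue/red in Figure 1
    ]
    return corr, flagged
```

A heat map such as Figure 1 can then be drawn directly from the returned correlation matrix with any plotting library.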

#### *2.2. Orthogonal Transformation Using PCA*

In multivariate statistics, PCA is an orthogonal transformation of a set of (possibly) correlated variables into a set of linearly uncorrelated ones; the uncorrelated (pseudo-) variables, called principal components (PCs), are linear combinations of the original input variables. The transformation is constructed so that the first principal component has the greatest possible variance (variation within the dataset), and each succeeding component in turn has the greatest possible variance subject to being orthogonal to the preceding components [25–28]. To help visualize the PCA transformation, a schematic dataset with three input variables is presented in Figure 2 (left). As shown in this figure, PCA transfers the data points from the original 3D input space (on the left) to a 2D principal component space (on the right).

**Figure 2.** Schematic of the PCA transformation. The original data space with three (input) variables, shown on the left, is transformed to a lower-dimensional component space, with *pc*<sub>1</sub> and *pc*<sub>2</sub> as the coordinate axes.

PCA can be performed either by eigenvalue decomposition of a data covariance (or correlation) matrix or by singular value decomposition. The process usually begins with mean centering and normalizing the data matrix (i.e., computing *Z*-scores) for each attribute as follows:

$$\mathbf{X} = \begin{bmatrix} x\_{11} & x\_{12} & \cdots & x\_{1p} \\ x\_{21} & x\_{22} & \cdots & x\_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x\_{n1} & x\_{n2} & \cdots & x\_{np} \end{bmatrix} \qquad \mathbf{Z} = \begin{bmatrix} \frac{x\_{11} - \overline{x\_{1}}}{s\_{1}} & \frac{x\_{12} - \overline{x\_{2}}}{s\_{2}} & \cdots & \frac{x\_{1p} - \overline{x\_{p}}}{s\_{p}} \\ \frac{x\_{21} - \overline{x\_{1}}}{s\_{1}} & \frac{x\_{22} - \overline{x\_{2}}}{s\_{2}} & \cdots & \frac{x\_{2p} - \overline{x\_{p}}}{s\_{p}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{x\_{n1} - \overline{x\_{1}}}{s\_{1}} & \frac{x\_{n2} - \overline{x\_{2}}}{s\_{2}} & \cdots & \frac{x\_{np} - \overline{x\_{p}}}{s\_{p}} \end{bmatrix} \tag{1}$$

where, for *k* = 1 to *n* and *j* = 1 to *p*, *x<sub>kj</sub>* is the *k*th measurement of the *j*th variable, *x̄<sub>j</sub>* is the sample mean of the *j*th variable, and *s<sub>j</sub>* is the sample standard deviation of the *j*th variable. As discussed in the previous section, highly correlated input variables inflate the standard error of estimate, negatively affecting the accuracy of the estimation. PCA not only reduces the dimensionality of the modeling problem, but also produces orthogonal pseudo-variables to be used in solving it. To perform PCA in this study, we used eigenvalue decomposition of the correlation matrix of the data. The eigenvalues of the data correlation matrix are calculated and sorted in descending order (representing their share of the total variation within the dataset), as presented in Table 4. According to the eigenvalues, the first five PCs represent 95.8% of the existing variation within the dataset.
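A minimal sketch of this step, assuming the raw measurements are held in an *n* × *p* NumPy array `X` (a hypothetical name), is given below; it standardizes the data as in Equation (1), eigendecomposes the correlation matrix, and reports how many leading PCs are needed to reach a chosen share of the total variance.

```python
# Hedged sketch of PCA via eigenvalue decomposition of the correlation matrix.
import numpy as np

def pca_by_eigendecomposition(X: np.ndarray):
    """Return eigenvalues (descending), matching eigenvectors, and Z-scored data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Equation (1)
    R = np.corrcoef(X, rowvar=False)                  # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # decomposition of the symmetric matrix
    order = np.argsort(eigvals)[::-1]                 # sort by contribution to total variation
    return eigvals[order], eigvecs[:, order], Z

# eigvals, eigvecs, Z = pca_by_eigendecomposition(X)
# explained = np.cumsum(eigvals) / eigvals.sum()      # cumulative share of total variance
# n_keep = int(np.searchsorted(explained, 0.95) + 1)  # five PCs for this dataset (95.8%)
# scores = Z @ eigvecs[:, :n_keep]                    # PC scores used in later modeling
```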

Recalling the fact that the PCs are linear combinations of the original input variables, the PCs can be defined as in Equation (2):

$$pc\_i = \sum\_{j=1}^{14} \alpha\_{ij} x\_j + \beta\_i \tag{2}$$

where *i* = 1 to 14, the *α<sub>ij</sub>* are the corresponding coefficients, the *β<sub>i</sub>* are constants, and the *x<sub>j</sub>* are the original input variables. Equation (2) can be stated in matrix notation as in Equation (3):

$$\mathbf{p} = \mathbf{M}\mathbf{x} + \mathbf{n} \tag{3}$$

where

$$\mathbf{p} = \begin{bmatrix} pc\_1 \\ pc\_2 \\ pc\_3 \\ pc\_4 \\ pc\_5 \end{bmatrix}, \qquad \mathbf{M}^{T} = \begin{bmatrix} 0.03 & 0.19 & -0.08 & -0.06 & -0.09 \\ 0.03 & 0.11 & -0.04 & -0.05 & 0.04 \\ 0.06 & 0.04 & 0.00 & 0.00 & 0.10 \\ 0.05 & 0.05 & 0.02 & 0.04 & 0.07 \\ 0.13 & 0.06 & 0.05 & 0.09 & -0.01 \\ 0.33 & 0.01 & 0.09 & 0.16 & -0.20 \\ 0.71 & -0.54 & 0.05 & 0.13 & -0.20 \\ -0.03 & 0.52 & -0.20 & -0.07 & -0.53 \\ -0.01 & -0.05 & -0.40 & 0.43 & 0.12 \\ 0.00 & 0.01 & 0.06 & -0.02 & 0.00 \\ -1.26 & 0.64 & 0.52 & 0.61 & 0.67 \\ -0.75 & 0.37 & 0.21 & 0.28 & 0.31 \\ -0.34 & 0.17 & 0.09 & 0.12 & -0.04 \\ -4.41 & 18.38 & 47.08 & 76.24 & -40.48 \end{bmatrix}, \qquad \mathbf{n} = \begin{bmatrix} -55.95 \\ -58.54 \\ -218.20 \\ -352.79 \\ 174.78 \end{bmatrix} \tag{4}$$
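As a hedged illustration of how the quantities in Equations (2)–(4) can be assembled, the sketch below folds the per-variable means and standard deviations into the retained loadings so that each PC is written directly as an affine function of the raw inputs; `X` and `eigvecs` follow the earlier sketch, and the eigenvector signs (and hence the coefficients) may differ from the published values.

```python
# Sketch: build the coefficient matrix M and constant vector n of Equation (3)
# from the PCA loadings, so that p = M @ x + n for a raw input vector x.
import numpy as np

def affine_pc_map(X: np.ndarray, eigvecs: np.ndarray, n_keep: int = 5):
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    V = eigvecs[:, :n_keep]          # p x n_keep loading matrix
    M = (V / std[:, None]).T         # alpha_ij of Equation (2): loadings scaled by 1/s_j
    n = -M @ mean                    # beta_i of Equation (2): absorbs the mean centering
    return M, n
```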

**Table 4.** Eigenvalues of the Normalized Input Variables Matrix Sorted in Descending Order (Based on Their Contribution to Total Variation).


Further modeling efforts will be performed using the first five PCs.

#### *2.3. Holdout Cross Validation*

In prediction problems, cross validation is used to estimate model accuracy. Cross validation is a model validation technique that helps prevent overfitting and assesses how well the results of a statistical analysis generalize to an independent dataset. In this study, a holdout cross validation technique is used in which the given dataset is randomly split into two subsets, *d*<sub>0</sub> and *d*<sub>1</sub>, the training set and the test set, respectively. The training set contains 80% of the data points and is used to train the model; the remaining 20% constitute the test set and are used to evaluate the trained model's performance.
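A minimal sketch of this 80/20 holdout split is shown below; the seed and variable names are illustrative choices, not values from the study.

```python
# Randomly assign 80% of the observations to the training set d0 and the
# remaining 20% to the test set d1.
import numpy as np

def holdout_split(pcs: np.ndarray, y: np.ndarray, test_fraction: float = 0.2, seed: int = 0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                  # random ordering of all data points
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (pcs[train_idx], y[train_idx]), (pcs[test_idx], y[test_idx])
```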

#### *2.4. Principal Component Regression (PCR)*

Linear regression attempts to model the relationship between response variables and explanatory variables by fitting a linear equation to observed data. In regression analysis, the least-squares method is used to find the best-fitting line for the observed data by minimizing the sum of the squares of the residuals (the differences between the measured responses and the values fitted by a linear function of the parameters).

All possible regression structures were considered for representing the relationship between the response variable, log|*E*∗|, and the predictor variables (*pc*<sub>1</sub>, *pc*<sub>2</sub>, *pc*<sub>3</sub>, *pc*<sub>4</sub>, and *pc*<sub>5</sub>). To estimate the values of the unknown coefficients in each candidate model, the least-squares criterion of minimizing the sum of squared residuals (SSE) is used. Finally, after eliminating redundant terms, a reduced third-order model containing cubic and interaction terms was selected as the best-fitting model. The developed model is called the "Principal Component Regression (PCR)" model.
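The sketch below illustrates one way to fit such a model with scikit-learn: a full third-order polynomial basis (cubic and interaction terms) over the five PC scores, fitted by ordinary least squares. It is a sketch only; the published PCR model uses a reduced term set obtained after eliminating redundant terms, which is not reproduced here.

```python
# Hedged sketch of a principal component regression with third-order terms.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

pcr_model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),  # cubic and interaction terms of pc1..pc5
    LinearRegression(),                                # least-squares fit minimizing SSE
)
# pcr_model.fit(pcs_train, y_train)       # pcs_train: n x 5 PC scores, y_train: log|E*|
# y_pred = pcr_model.predict(pcs_test)    # evaluate on the 20% holdout set
```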

#### *2.5. Principal Component Neural Network (PCNN)*

A predictive model called the "Principal Component Neural Network (PCNN)" is developed as briefly described in this section. ANNs are brain-inspired systems intended to replicate the way humans learn. Neural network structures consist of several layers, including an input layer, an output layer, and hidden layer(s), with nodes (neurons) in each layer [29–31]. A three-layer feed-forward neural network is developed for this study. It consists of an input layer of five neurons (five input variables), a hidden layer of 10 neurons, and an output layer of one neuron (one response variable). The number of hidden neurons was chosen by a trial-and-error procedure balancing computational time and the cost function. In this study, supervised learning is used, in which a training dataset including inputs and outputs is presented to the network. The network adjusts its weights so that the adjusted set of weights produces an input/output mapping with the smallest error; this iterative process is carried out until the sum of squared residuals (SSE) can no longer be reduced. After the learning or training phase, the performance of the trained network must be measured against independent (unseen) testing data [29,32]. Let the input of each processing node be *pc<sub>i</sub>*, the adjustable connection weights be *w<sub>ij</sub>*, the bias at the output layer be *b*<sub>0</sub>, and the network transfer (activation) function be *f*(·). The *j*th output of the first layer can be obtained using Equation (5)

$$\nu\_j = f\_1(pc\_i, w\_{ij}), \quad i = 1, \ldots, 5 \quad \text{and} \quad j = 1, \ldots, 10 \tag{5}$$

and the response will be

$$\hat{y} = f\_2\left(f\_1(pc\_i, w\_{ij})\right) \tag{6}$$

If we assume that

$$f\_2(\nu\_j, w\_{Hj}) = b\_0 + \sum\_j \nu\_j w\_{Hj} \tag{7}$$

and for each *j*,

$$f\_1(pc\_i, w\_{ij}) = b\_{Hj} + \sum\_i pc\_i w\_{ij} \tag{8}$$

then a feed-forward neural network can be formulated as follows:

$$\hat{y} = f\_2 \left\{ b\_0 + \sum\_{j=1}^{n} \left[ w\_{Hj} \cdot f\_1\left( b\_{Hj} + \sum\_{i=1}^{m} pc\_i w\_{ij} \right) \right] \right\} \tag{9}$$

where *pci* is pseudo input parameter *i*, *wij* is the weight of connection between input variable *i* (for *i* = 1 to 5) and neuron *j* of the hidden layer, *b*<sup>0</sup> is a bias at the output layer, *wHj* is the weight of connection between neuron *j* of the hidden layer and output layer neuron, *bHj* is a bias at neuron *j* of the hidden layer (for *j* = 1 to 10), and *f*1(*t*) and *f*2(*t*) are transfer functions of the hidden layer and output layer, respectively.
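A minimal NumPy sketch of the forward pass in Equation (9) for the 5–10–1 architecture is given below; the weight values would come from training (not shown), and the choice of tanh for *f*<sub>1</sub> and a linear *f*<sub>2</sub> is an assumption made for illustration.

```python
# Forward pass of a 5-10-1 feed-forward network, mirroring Equations (7)-(9).
import numpy as np

def forward(pc, W, b_H, w_H, b0, f1=np.tanh, f2=lambda t: t):
    """pc: (5,) PC inputs; W: (5, 10) input-to-hidden weights w_ij;
    b_H: (10,) hidden biases b_Hj; w_H: (10,) hidden-to-output weights w_Hj;
    b0: output-layer bias."""
    hidden = f1(b_H + pc @ W)      # Equation (8) evaluated for every hidden neuron j
    return f2(b0 + hidden @ w_H)   # Equations (7) and (9): predicted response y-hat
```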

It should be pointed out that iteration proceeds until the convergence criterion is met; thus, similar to the linear regression model, a separate validation set is not used. The Bayesian regularization algorithm is used to achieve efficient network training.

#### *2.6. Effective Variable Space*

It is widely known that an empirical predictive model should not be used outside the convex hull containing the data points. In this context, the *effective variable space* refers to the space within which the uncertainty of the developed models remains bounded by their previously calculated thresholds; outside this region, the extrapolated behavior of the models may not be predictable. To guard against extrapolation, Ghasemi et al. [22] concluded that the space containing the input data could be represented as a symmetrical convex space, and then demonstrated how this space can be used in the design procedure.

Following the approach in [22], a normal distribution is assumed for each input variable (*x<sub>i</sub>*), so that each pair of variables follows a bivariate normal joint distribution, which is usually represented in the form of a contour diagram. A contour curve on such a diagram connects the points on the density surface at the same height above the *x<sub>i</sub>x<sub>j</sub>* plane, i.e., points of constant density [33] (see Figure 3 for an example of such a distribution); the cross section is obtained by slicing the bivariate normal surface at a constant distance from the *x<sub>i</sub>x<sub>j</sub>* plane. Extending this idea, as indicated in Figure 3, the enclosing region in the *n*-dimensional hyperspace is taken as a hyper-ellipsoid of minimum volume (to avoid any gaps at the edges).

**Figure 3.** A schematic of a bivariate normal distribution (**left**) [33] and an elliptic cross-section of this distribution (**right**). Projecting the cross-sectional area onto the (*x*<sub>1</sub>, *x*<sub>2</sub>) plane results in an ellipse.

Khachiyan's work [34] formulates the problem of finding an approximate minimum volume enclosing ellipsoid (E) for *p* data points in *n* dimensions as an optimization problem. Ghasemi et al. [22] detailed the derivation of a procedure for solving this problem and obtaining the effective variable space. For brevity, the flowchart in Figure 4 summarizes this iterative method for finding the minimum volume enclosing ellipsoid. This algorithm was used to find two enclosing ellipsoids, one in the primary space (14-dimensional) and one in the pseudo space (5-dimensional) of the dataset. It should be pointed out that this space is independent of the predictive models and is used only to solve the optimal (and inverse) design problems.

**Figure 4.** Iterative method to solve the problem of finding minimum volume enclosing ellipsoid [22].
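As a rough illustration of the iterative scheme summarized in Figure 4, the sketch below implements Khachiyan's algorithm in NumPy; the tolerance and function name are illustrative, and the update rule follows the standard published form of the algorithm rather than the exact notation of [22].

```python
# Hedged sketch of Khachiyan's algorithm for an approximate minimum volume
# enclosing ellipsoid of N points in d dimensions.
import numpy as np

def min_volume_enclosing_ellipsoid(points: np.ndarray, tol: float = 1e-4):
    """points: N x d array. Returns (A, c) such that (x - c)^T A (x - c) <= 1
    for every data point (up to the chosen tolerance)."""
    N, d = points.shape
    Q = np.vstack([points.T, np.ones(N)])   # lift the points to d+1 dimensions
    u = np.full(N, 1.0 / N)                 # initial uniform weights on the points
    err = tol + 1.0
    while err > tol:
        X = Q @ np.diag(u) @ Q.T                              # (d+1) x (d+1)
        M = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)  # "leverage" of each point
        j = int(np.argmax(M))                                 # point sticking out the most
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step                                      # shift weight toward point j
        err = np.linalg.norm(new_u - u)
        u = new_u
    c = points.T @ u                                          # ellipsoid center
    A = np.linalg.inv(points.T @ np.diag(u) @ points - np.outer(c, c)) / d
    return A, c
```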

#### *2.7. Guideline for Implementation*

A summary of the methodologies used to develop the framework is presented in Figure 5. The procedure begins with collecting experimental data from the laboratory, followed by the pre-processing step of input variable evaluation. The flowchart continues with model development and the addition of a constraint on the *n*-dimensional input variable hyperspace to the modeling problem. The developed models can then be used to predict pavement performance, solve design-based optimization problems, etc. Many aspects of the proposed framework can be implemented using freely or commercially available software such as *MATLAB*®, *Python*, and *R* packages, and one may implement many parts of the framework in the language of their choice. For example, the algorithm to find the *n*-dimensional hyper-ellipsoid is straightforward to implement from the flowchart in Figure 4.
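As a usage sketch tying the pieces together, the check below (with hypothetical helper names) flags whether a candidate mixture, expressed in the same variable space used to fit the ellipsoid, lies inside the effective variable space before any prediction or optimization is attempted.

```python
# Guard against extrapolation: accept a point only if it lies inside the
# fitted minimum volume enclosing ellipsoid described by (A, c).
import numpy as np

def inside_effective_space(x: np.ndarray, A: np.ndarray, c: np.ndarray) -> bool:
    d = x - c
    return float(d @ A @ d) <= 1.0
```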

**Figure 5.** A summary of the methodologies used to develop the machine learning-based framework for predicting dynamic modulus (as an example of a performance-related property of asphalt mixtures).
