#### *Appendix A.5. MLP*

A neural network is a network of simple processing elements, also called neurons, arranged in layers. In a fully-connected multi-layer network, every neuron in one layer is connected to every neuron in the layers before and after it. The number of neurons in the input layer equals the number of input features *p*, and the number of neurons in the output layer equals the number of targets *q* [48]. MLPs have several theoretical advantages compared to other ML algorithms. By the universal approximation theorem, an MLP can approximate any continuous function if the activation functions of the network are chosen appropriately [49–51]. The MLP makes no prior assumptions about the data distribution and, in many cases, can be trained to generalize to data not yet seen [52]. However, finding the right architecture and training parameters is not straightforward; it is usually done by trial and error, guided by the literature and established guidelines.

A neural network output $\hat{y}$ corresponding to an input $\mathbf{x}$ can be represented as a composition of functions, where the output of layer $L-1$ acts as the input to the following layer $L$. For example, with the non-linear activation function $\sigma_L$, weight matrix $W_L$, and bias vector $b_L$ of the respective layer $L$, we obtain the following:

$$\hat{y}(\mathbf{x}) = t_L(\mathbf{x}) = \sigma_L\left(W_L^T \, t_{L-1}(\mathbf{x}) + b_L\right) \tag{A17}$$
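As a minimal illustration of Equation (A17), the following NumPy sketch evaluates a fully-connected network as a composition of layers. The layer sizes, the tanh activation, and the linear output layer are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np

def mlp_forward(x, weights, biases):
    """Evaluate Eq. (A17): t_L(x) = sigma_L(W_L^T t_{L-1}(x) + b_L)."""
    t = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W.T @ t + b  # affine transform of the previous layer's output
        # Hidden layers use a non-linear activation (tanh here, as an
        # assumption); the output layer is kept linear, as is common
        # for regression tasks.
        t = np.tanh(z) if l < len(weights) - 1 else z
    return t

# Example: p = 3 input features, one hidden layer of 8 neurons, q = 1 target.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 8)), rng.normal(size=(8, 1))]
biases = [np.zeros(8), np.zeros(1)]
y_hat = mlp_forward(rng.normal(size=3), weights, biases)
```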

With the neural network estimate $\hat{y}(\mathbf{x})$ and the respective target $y$ of an input $\mathbf{x}$, we can define a loss function $\mathcal{L}$. A very common loss function for MLPs in regression tasks is the mean-squared error:

$$\mathcal{L}(\mathcal{W}, b) = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}(\mathbf{x}_i) - y_i\right)^2 \tag{A18}$$

where $\mathcal{W}$ and $b$ are the collections of all weight matrices and bias terms, respectively. Optimal weights $\mathcal{W}^*$ and biases $b^*$ for each layer are identified by minimizing the loss function $\mathcal{L}$ via back-propagation [53]:

$$\mathcal{W}^*, b^* = \underset{\mathcal{W},\, b}{\operatorname{argmin}} \; \mathcal{L}(\mathcal{W}, b) \tag{A19}$$
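A minimal PyTorch sketch of Equations (A18) and (A19) is given below; the architecture, the Adam optimizer, the number of epochs, and the synthetic data are illustrative assumptions only.

```python
import torch

# A small fully-connected MLP; the architecture is an illustrative assumption.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 8),
    torch.nn.Tanh(),
    torch.nn.Linear(8, 1),
)
loss_fn = torch.nn.MSELoss()                      # Eq. (A18)
optimizer = torch.optim.Adam(model.parameters())  # gradient-based minimization

# Synthetic placeholders for (x_i, y_i); n = 64 samples, p = 3 features.
X = torch.randn(64, 3)
y = torch.randn(64, 1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # L(W, b)
    loss.backward()              # gradients via back-propagation
    optimizer.step()             # update W, b toward the argmin of Eq. (A19)
```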

#### *Appendix A.6. PINN*

In PINNs, the network is trained simultaneously on data and on governing differential equations. PINNs are regularized such that their function approximation $\hat{y}(\mathbf{x})$ obeys known laws of physics that apply to the observed data. This type of network is well suited for solving and inverting the equations that govern physical systems, and finds application in fluid and solid mechanics as well as in dynamical systems [21,35].

PINNs share similarities with common ANNs, but their loss function contains an additional term that encodes the physics of the problem at hand. More specifically, the loss $\mathcal{L}$ is composed of the data-driven loss $\mathcal{L}_{\text{data}}$ and the physics-informed loss $\mathcal{L}_{\text{physics}}$:

$$\mathcal{L} = \mathcal{L}_{\text{data}} + \mathcal{L}_{\text{physics}} \tag{A20}$$

While the data-driven loss is often a standard mean-squared error, the physics-informed loss quantifies the degree to which the function approximation satisfies a given system of governing differential equations. For further details, we refer the reader to [23,35,54] in general, and to the Python package of [21,22] in particular for a simple implementation of structural mechanics use cases.
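The following PyTorch sketch illustrates the composite loss of Equation (A20). The governing equation $u'(x) + u(x) = 0$, the network, the collocation points, and the data point are toy assumptions for illustration, not the structural mechanics use cases considered in this work.

```python
import torch

# Network approximating the solution u_hat(x); architecture is illustrative.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)

# Assumed toy governing equation: u'(x) + u(x) = 0.
def physics_loss(x):
    x = x.requires_grad_(True)
    u = net(x)
    du_dx, = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                                 create_graph=True)
    residual = du_dx + u  # how far u_hat is from solving the ODE
    return (residual ** 2).mean()

# Synthetic placeholder data, e.g. the initial condition u(0) = 1.
x_data = torch.tensor([[0.0]])
y_data = torch.tensor([[1.0]])
x_coll = torch.rand(32, 1)  # collocation points for the physics residual

loss_data = torch.nn.functional.mse_loss(net(x_data), y_data)
loss = loss_data + physics_loss(x_coll)  # composite loss, Eq. (A20)
loss.backward()
```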

#### **Appendix B. Hyperparameters**

**Table A1.** Best performing hyperparameters GBDTR.


**Table A2.** Best performing hyperparameters KNNR.


**Table A3.** Best performing hyperparameters GPR.


**Table A4.** Best performing hyperparameters SVR.



**Table A5.** Best performing hyperparameters MLP.

**Table A6.** Best performing hyperparameters PINN.


#### **Appendix C. Detailed Results**

**Table A7.** Detailed results for the plate elongation use case Simulation 1.



**Table A8.** Detailed results for the plate elongation use case Simulation 4.



**Table A10.** Detailed results for the plate elongation use case Simulation 9.

**Table A11.** Detailed results for the bending beam use case Simulation 1.



**Table A12.** Detailed results for the bending beam use case Simulation 4.

**Table A13.** Detailed results for the bending beam use case Simulation 6.



**Table A14.** Detailed results for the bending beam use case Simulation 9.

**Table A15.** Detailed results for the block compression use case Simulation 1.



**Table A16.** Detailed results for the block compression use case Simulation 2.

**Table A17.** Detailed results for the block compression use case Simulation 7.



**Table A18.** Detailed results for the block compression use case Simulation 12.

**Table A19.** Detailed results for the block compression use case Simulation 13.

