#### 3.2.4. Neural Networks

Another supervised ML technique is the neural network (NN), which borrows the concept of interconnected neurons from the human brain to build multilayer perceptrons (MLP) capable of modeling arbitrary feature–target relationships. The basic building blocks of such an MLP are neurons with activation functions, which allow a neuron to fire once a given threshold value is reached [46]. When training a NN, the connection weights and the parameters of those activation functions are optimized to minimize the training error; this optimization process is called backpropagation [31].
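To make the firing behavior concrete, the following minimal sketch (illustrative only, not the architecture used in this work) implements a single neuron with a sigmoid activation in plain Python; the bias acts as the negative threshold:

```python
import math

def sigmoid(z):
    # Smooth activation: the output approaches 1 (the neuron "fires")
    # as the weighted input exceeds the threshold encoded in the bias.
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # A single perceptron unit: weighted sum of inputs plus bias,
    # passed through the activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Weak stimulus: weighted sum far below the threshold -> output near 0.
low = neuron([0.1, 0.2], weights=[1.0, 1.0], bias=-5.0)
# Strong stimulus: weighted sum above the threshold -> output near 1.
high = neuron([4.0, 4.0], weights=[1.0, 1.0], bias=-5.0)
```

During backpropagation, it is exactly the `weights` and `bias` values of every such neuron that are adjusted to reduce the training error.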

#### 3.2.5. Gaussian Process Regression

Gaussian processes are generic supervised learning methods, which were developed to solve regression and classification problems [43]. While classical regression algorithms fit a polynomial of a given degree or special models like the ones mentioned above, Gaussian process regression (GPR) uses the input data more subtly [47]. Here, the Gaussian process theoretically generates an infinite number of candidate curves to approximate the training data points as accurately as possible. Each of these curves is assigned a probability via a Gaussian (normal) distribution, and the curve whose probability distribution best fits the training data is selected. In this way, the input data gain significantly more influence on the model, since in GPR fewer parameters are fixed in advance than in classical regression algorithms [47]. The behavior of a GPR model can be further defined via kernels. These can be used, for example, to influence how the model handles outliers and how finely the data are approximated.
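The curve-selection idea has a compact closed form (a standard textbook formulation, not taken from the cited sources): given training inputs $X$ with targets $\mathbf{y}$, a kernel $k$, and noise variance $\sigma_n^2$, the predictive mean and variance at a test point $x_*$ are

```latex
\mu(x_*) = \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y},
\qquad
\sigma^2(x_*) = k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_*,
```

where $K_{ij} = k(x_i, x_j)$ over the training inputs and $\mathbf{k}_* = \big(k(x_*, x_1), \ldots, k(x_*, x_n)\big)^{\top}$. The kernel $k$ and the noise term $\sigma_n^2$ are exactly the ingredients that control outlier handling and smoothness.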

In Figure 1, two different GPR models have been used to approximate a sinusoid. The input data points are sinusoidal but contain some outliers. Compared to the dark blue model, the model with the light blue approximation curve has an additional kernel extension for noise suppression. The light blue model is therefore less sensitive to outliers and produces a smoother approximation curve. This is also the main advantage of GPR over other regression models such as linear or polynomial regression: GPR models are more robust to outliers and noisy data and are also relatively stable on small datasets [47] like the one used for this contribution. That is why they were primarily selected for the use case described later.
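The contrast between the two curves in Figure 1 can be reproduced with a short scikit-learn sketch (the kernel choices and hyperparameter values below are illustrative assumptions, not the exact configuration behind the figure): a plain RBF kernel versus an RBF kernel extended with an additive `WhiteKernel` for noise suppression, fitted to a sinusoid with injected outliers.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 30).reshape(-1, 1)
y = np.sin(X).ravel()
y[::7] += rng.normal(0.0, 0.8, size=y[::7].shape)  # inject a few outliers

# Without a noise term, the GP is forced to pass close to every
# training point, including the outliers (cf. the dark blue curve).
gpr_exact = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

# With an additive WhiteKernel, part of the signal may be explained
# as noise, so the fit smooths over outliers (cf. the light blue curve).
gpr_noise = GaussianProcessRegressor(
    kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.5)
)

gpr_exact.fit(X, y)
gpr_noise.fit(X, y)

# The noise-aware model deviates from the corrupted targets, while the
# exact model reproduces the training data almost perfectly.
err_exact = np.abs(gpr_exact.predict(X) - y).max()
err_noise = np.abs(gpr_noise.predict(X) - y).max()
```

Here `err_exact` stays near zero while `err_noise` is visibly larger at the outliers, which is precisely the smoothing behavior the figure illustrates.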

**Figure 1.** Gaussian process regression models approximating a sinusoid.

#### 3.2.6. Python

The Python programming language was chosen for the present work, as it is the de facto standard language for ML and data science. This programming environment is particularly suitable in the field of machine learning, as it allows the easy integration of external libraries, of which many have been developed for applying machine learning algorithms in practice. One of them is the open-source Python library scikit-learn [48]. For the above-described methods, the following scikit-learn modules were used: the `PolynomialFeatures` module for the modeling of the PR models, which was combined with the `LinearRegression` module to obtain a PR model for the prediction of coating parameters. For modeling via SVM, the `SVR` (support vector regressor) module of scikit-learn was used. The NN were modeled via the `MLPRegressor` module, and lastly the GPR were implemented using the `GaussianProcessRegressor` module of scikit-learn. All models were trained with the default parameters; only for the GPR model was the kernel smoothed by adding a white-noise term, which was necessary because the GPR implementation of scikit-learn has no generally suitable default kernel.
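The model setup described above could be sketched as follows; the polynomial degree and the kernel hyperparameters are illustrative placeholders, since the text states that otherwise the scikit-learn defaults were used:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# PR: polynomial feature expansion feeding a linear regression.
pr = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# SVM and NN regression with scikit-learn default parameters.
svr = SVR()
nn = MLPRegressor()

# GPR: the kernel must be chosen explicitly; the additive
# white-noise term smooths the fit as described in the text.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())

models = {"PR": pr, "SVR": svr, "NN": nn, "GPR": gpr}

# Quick sanity check on toy data: a degree-2 PR model recovers
# an exactly linear relationship without error.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
pr.fit(X, y)
```

Each entry in `models` exposes the same `fit`/`predict` interface, which makes it straightforward to train and compare all four methods on the same dataset.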

#### **4. Use Case with Practical Example in a-C:H Coating Design**
