3.1.3. Mechanical Characterization

The indentation modulus *E*IT and the indentation hardness *H*IT were determined by nanoindentation with Vickers tips (Picodentor HM500 and WinHCU, Helmut Fischer, Sindelfingen, Germany) according to [36,37]. To minimize substrate influences, care was taken to ensure that the maximum indentation depth remained considerably less than 10% of the coating thickness [38,39]. In view of the surface roughness, lower forces also proved suitable for obtaining reproducible results. Distances of more than 40 μm were maintained between individual indentations. For statistical reasons, 10 indentations per specimen were performed and evaluated. A Poisson's ratio typical of amorphous carbon coatings was assumed to determine the elastic–plastic parameters [40,41]. The corresponding settings and parameters are shown in Table 2. In Section 3, the results of nanoindentation are presented and discussed.

**Table 2.** Settings for determining the indentation modulus *E*IT and the indentation hardness *H*IT.


*3.2. Machine Learning and Used Models*

3.2.1. Supervised Learning

The goal of machine learning is to derive relationships, patterns and regularities from data sets [42]. These relationships can then be applied to new, unknown data and problems to make predictions. ML algorithms can be divided into three subclasses: supervised, unsupervised and reinforcement learning. In the following, only the class of supervised learning is discussed in more detail, since an algorithm from this subcategory, namely Gaussian process regression (GPR), was used in this paper. Supervised ML was chosen because labelled data were available.

In supervised learning, the system is fed labelled training examples, in which the input values are already associated with known output values. Such data can stem, for example, from a previously performed series of measurements with certain input parameters (input) and the corresponding measured values (output). The goal of supervised learning is to train the model or the algorithm on the known data in such a way that statements and predictions can also be made for unknown test data [42]. Owing to the already labelled data, supervised learning represents the most reliable form of machine learning and is therefore very well suited for optimization tasks [42].

In the field of supervised learning, one can distinguish between the two problem types of classification and regression. In a classification problem, the algorithm must assign the data to discrete classes or categories. In a regression problem, in contrast, the model estimates the parameters of pre-defined functional relationships between multiple features in the data sets [42,43].
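This distinction can be illustrated with a minimal sketch (data and class labels are purely illustrative and not from this study; NumPy is assumed to be available):

```python
import numpy as np

# Regression: estimate the parameters of a pre-defined functional
# relationship, here a straight line y = a*x + b fitted by least squares.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                      # noise-free illustrative data
a, b = np.polyfit(x, y, deg=1)         # recovers a ~ 2.0, b ~ 1.0

# Classification: assign a sample to one of two discrete categories,
# here by its distance to hypothetical class means (values invented).
class_means = {"soft": 10.0, "hard": 25.0}   # e.g., hardness in GPa
sample = 22.0
label = min(class_means, key=lambda c: abs(sample - class_means[c]))
# "hard", since 22.0 lies closer to 25.0 than to 10.0
```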

A fundamental danger with supervised learning methods is that the model learns the training data by rote and thus memorizes the individual data points rather than the correlations in the data. As a result, the model can no longer react adequately to new, unknown data values. This phenomenon is called overfitting and must be avoided by choosing appropriate training parameters [31]. In the following, basic algorithms of supervised learning are presented, ranging from PR and SVM to NN and GPR.
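The effect of overfitting can be reproduced in a few lines (illustrative numbers only; NumPy assumed): a degree-5 polynomial passes exactly through six noisy training points, yet generalizes far worse than a simple linear fit.

```python
import numpy as np

# Six training points from a linear ground truth y = 2x plus fixed "noise".
x_train = np.linspace(0.0, 1.0, 6)
noise = np.array([0.05, -0.08, 0.11, -0.03, 0.07, -0.10])
y_train = 2.0 * x_train + noise

# Degree-5 polynomial: interpolates all six points ("learning by rote").
p_over = np.polyfit(x_train, y_train, deg=5)
train_err = np.max(np.abs(np.polyval(p_over, x_train) - y_train))  # ~0

# Degree-1 polynomial: captures the underlying linear trend instead.
p_lin = np.polyfit(x_train, y_train, deg=1)

# On unseen data (slight extrapolation to x = 1.2) the overfitted
# model deviates far more strongly from the ground truth.
x_test = np.linspace(0.0, 1.2, 50)
y_true = 2.0 * x_test
err_over = np.max(np.abs(np.polyval(p_over, x_test) - y_true))
err_lin = np.max(np.abs(np.polyval(p_lin, x_test) - y_true))
```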

#### 3.2.2. Polynomial Regression

At first, we want to introduce polynomial regression (PR) for supervised learning. PR is a special case of linear regression and predicts data with a polynomial regression curve. The parameters of the model are typically fitted using a least-squares estimator, and the overall approach is applied to various problems, especially in the engineering domain. A basic PR model leads to the following equation [44]:

$$y\_{i} = \beta\_{0} + \beta\_{1}x\_{i1} + \beta\_{2}x\_{i2} + \dots + \beta\_{k}x\_{ik} + e\_{i} \text{ for } i = 1, 2, \dots, n \tag{1}$$

with *β* being the regression parameters and *e* being the error values. The prediction targets are denoted as *y*<sub>i</sub> and the features used for prediction are denoted as *x*<sub>i</sub>. A more sophisticated technique based on regression models is the support vector machine, which is described in the next section.
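A least-squares fit of the model in Eq. (1) can be sketched as follows, here for a quadratic polynomial with the features *x*<sub>i1</sub> = *x*<sub>i</sub> and *x*<sub>i2</sub> = *x*<sub>i</sub><sup>2</sup> (data and coefficients are invented for illustration; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50)

# Design matrix [1, x, x^2] matching y_i = b0 + b1*x_i1 + b2*x_i2 + e_i,
# with the polynomial features x_i1 = x_i and x_i2 = x_i^2.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_true = np.array([1.0, 2.0, -3.0])               # invented coefficients
y = X @ beta_true + rng.normal(0.0, 0.01, size=50)   # small error term e_i

# Least-squares estimator: beta_hat = argmin_beta ||y - X @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```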

#### 3.2.3. Support Vector Machines

Support vector machines (SVM) are a model originally developed for classification tasks, but the ideas of SVM can be extended to regression as well. SVM try to find separating hyperplanes within a (possibly higher-dimensional) parameter space to describe the underlying data [45]. Thereby, SVM are very effective in high-dimensional spaces and make use of kernel functions for prediction. SVM are widely used and can be applied to a variety of problems, including nonlinear ones. For a more detailed theoretical insight, we refer to [45].
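The core idea of separating data in a higher-dimensional space, which SVM exploit implicitly through kernel functions, can be illustrated without a full SVM solver (toy data, invented for illustration; NumPy assumed):

```python
import numpy as np

# 1-D toy data that no single threshold can separate:
# the class-0 points lie between the class-1 points.
x = np.array([-2.0, -1.5, -0.2, 0.1, 1.4, 2.1])
y = np.array([1, 1, 0, 0, 1, 1])

# The feature map phi(x) = (x, x^2) lifts the data into 2-D, where the
# classes become separable by the plane x^2 = 1. SVM achieve the same
# effect implicitly via kernels, e.g., k(a, b) = phi(a) . phi(b).
phi = np.column_stack([x, x**2])
pred = (phi[:, 1] > 1.0).astype(int)   # separating threshold in 2-D
```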
