4.2.3. Training the Model

As described above, one of the main tasks in applying a machine learning algorithm is training the model. For this purpose, scikit-learn estimators provide the method fit(X,y) with the input variables X and y. Here, X is the feature matrix, which contains the feature data of the test data set (the control variables of the coating plant), and y is the target vector, which contains the corresponding target data (the characteristic values of the coating characterization). Calling reg\_model.fit(X,y) with the selected regression model (GPR; reg\_model in general) trained the model, that is, fitted it to the available data.
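The fit call described above might look as follows in practice. This is only a minimal sketch: the synthetic data, the number of features, the kernel choice (RBF plus WhiteKernel), and the normalize_y setting are assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic placeholder data (assumption): 20 samples with 3 control
# variables each, and one characteristic value per sample.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 3))      # feature matrix, shape (n_samples, n_features)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.05 * rng.standard_normal(20)  # target vector

# Kernel choice is an illustrative assumption, not the configuration of the paper.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
reg_model = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)

# Train the model: kernel hyperparameters are optimized on (X, y).
reg_model.fit(X, y)

# A fitted GPR returns a predictive mean and, optionally, a standard deviation.
y_mean, y_std = reg_model.predict(X[:5], return_std=True)
```

Any other scikit-learn regressor could take the place of reg\_model here, since all of them expose the same fit(X,y) interface.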

Small data sets pose a particular problem: dividing the data into training and test data shrinks the set available for training even further. For this reason, k-fold cross-validation can be used [31]. Here, the training data set is split into k smaller subsets (folds). In each training run, one fold is held out as the test data set and the model is trained on the remaining k-1 folds; the held-out fold changes from run to run, so that each fold serves as test data exactly once. In this way, every data point is used for both training and testing despite the small data set, which makes better use of the available data and gives a more reliable estimate of model performance than a single split.
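The k-fold procedure can be sketched with scikit-learn's KFold and cross_val_score. The value k = 5, the shuffling, the R² scoring metric, and the synthetic data are assumptions chosen for illustration and do not reflect the settings used in this work.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import KFold, cross_val_score

# Synthetic placeholder data (assumption): 20 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.05 * rng.standard_normal(20)

# Illustrative model configuration (assumption).
reg_model = GaussianProcessRegressor(
    kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=0
)

# Split the data into k = 5 folds; each fold is held out exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Sizes of the training and held-out parts in every run.
fold_sizes = [(len(train_idx), len(test_idx)) for train_idx, test_idx in cv.split(X)]

# Fit a fresh copy of the model on each training part and score it
# on the corresponding held-out fold (one R^2 value per fold).
scores = cross_val_score(reg_model, X, y, cv=cv, scoring="r2")
mean_score = scores.mean()
```

With 20 samples and k = 5, each run trains on 16 samples and evaluates on the remaining 4; the mean of the per-fold scores serves as the overall performance estimate.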
