*2.6. Model Evaluation*

To compare the performance of the different models on the respective data sets, different measures were applied depending on the model type. All measures were calculated on test predictions for samples that were not used in training. For classification models, which determine the discrete classes y as a function f(X) of the data X, the accuracy was defined as the percentage of correctly classified data points. The F1 score, in contrast, is the harmonic mean of precision and recall and was based on the precision and recall of each class. For regression tasks with continuous target variables, the coefficient of determination (R²), the correlation between predicted and observed values, and the root mean square error (RMSE) were applied. Because of the limited amount of data, we used leave-one-out cross-validation to generate the test predictions. This procedure trains a model on the whole data set except for one held-out sample and then predicts that sample. It was repeated until every sample in the data set had been held out exactly once, so that each sample received one test prediction.
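
The evaluation procedure described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The 1-nearest-neighbour stand-in model, the toy data, and all function names are hypothetical. The sketch further assumes that the per-class F1 scores are macro-averaged and that "correlation" refers to the Pearson coefficient, neither of which is specified in the text.

```python
import math

def leave_one_out_predictions(X, y, fit, predict):
    """Hold out each sample once, train on the rest, predict the held-out sample."""
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        predictions.append(predict(model, X[i]))
    return predictions

# Hypothetical stand-in model: 1-nearest-neighbour on one scalar feature.
def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))

def predict_1nn(model, x):
    nearest = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging scheme is an assumption)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def pearson_r(y_true, y_pred):
    """Pearson correlation (choice of correlation measure is an assumption)."""
    n = len(y_true)
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Toy data, invented for illustration only.
X = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
y_class = ["a", "a", "a", "b", "b", "b"]
y_reg = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]

class_preds = leave_one_out_predictions(X, y_class, fit_1nn, predict_1nn)
reg_preds = leave_one_out_predictions(X, y_reg, fit_1nn, predict_1nn)

print("accuracy:", accuracy(y_class, class_preds))
print("macro F1:", macro_f1(y_class, class_preds))
print("R2:", r_squared(y_reg, reg_preds))
print("Pearson r:", pearson_r(y_reg, reg_preds))
print("RMSE:", rmse(y_reg, reg_preds))
```

Because every prediction comes from a model that never saw the corresponding sample, all measures in the sketch are computed on held-out predictions only, pooled across the leave-one-out iterations.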
