4.2.3. Avoiding Overfitting

Models trained on small data sets are prone to overfitting, i.e., adapting too closely to the particular characteristics of their training data. Cross-validation is a methodology that counteracts overfitting: instead of simply dividing the available data (input features together with the known correct output classifications) into two disjoint subsets for training and testing, the data is partitioned into multiple smaller sets, so-called folds. The classifier is then trained in a leave-one-out manner: each fold serves once as the test subset, while all remaining folds are used for training. The test results of these runs are averaged and reported as the Cross-Validation Score (CVS) [29,32].
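The fold-wise procedure described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the evaluation; the helper names (`k_fold_indices`, `cross_val_score`) and the toy majority-class classifier are hypothetical, and practical work would typically rely on a library such as scikit-learn.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k disjoint, roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_score(X, y, train_fn, predict_fn, k=5):
    """Each fold serves once as the test subset; all remaining folds train
    the classifier. Returns the mean test accuracy over the k runs (the CVS)."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        model = train_fn([X[j] for j in train_idx], [y[j] for j in train_idx])
        predictions = [predict_fn(model, X[j]) for j in test_idx]
        correct = sum(p == y[j] for p, j in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Toy usage: a majority-class "classifier" on hypothetical labelled data.
X = list(range(12))
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
cvs = cross_val_score(X, y,
                      train_fn=lambda X, y: max(set(y), key=y.count),
                      predict_fn=lambda model, x: model,
                      k=4)
```

Because every sample appears in exactly one test fold, the averaged score reflects performance on data the classifier has not seen during the corresponding training run.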
