*3.2. Classifier Accuracy for Limited Training Data*

Keeping the calibration time before BCI operation as short as possible is desirable. We mimic this constraint by training the classifiers with as few training epochs as possible. We evaluate all classifiers at different levels of available training data. To do so, we repeat the cross-validation procedure nine times for all subjects, once for each possible number of blocks in a training fold (which contains nine blocks). Each time, we keep the corresponding number of blocks in the training folds and drop the rest. Figures 3 and A1 show each classifier's accuracy as a function of data availability. At each level of training data availability, we statistically compare the classifiers, in particular the two newly proposed ones, STBF-STRUCT and STBF-SHRUNK. We use a one-sided paired Wilcoxon signed-rank test with Holm correction for the multiple pairwise comparisons between classifiers. We performed this analysis three times: using only the first trial of a block, averaging epochs across the first two trials of a block, and averaging epochs across the first five trials of a block. Results obtained with one testing trial are reported in Table 1, with two testing trials in Table 2, and with five testing trials in Table 3.

**Table 1.** *p*-values of the one-sided paired Wilcoxon signed-rank test with Holm correction, using one testing trial, for different classifier pairs and levels of data availability. *p*-values < 0.05 are considered significant and marked in bold.



**Table 2.** *p*-values as in Table 1, averaging over two testing trials.

**Table 3.** *p*-values as in Table 1, averaging over five testing trials.
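The *p*-values in Tables 1-3 come from paired, one-sided classifier comparisons followed by Holm correction. For paired samples, the Wilcoxon test uses the signed-rank statistic. The following is a minimal pure-Python sketch of such a pipeline, not the authors' code. The function names are hypothetical. Two handling choices are assumptions: zero differences are dropped, and tied absolute differences receive average ranks with the null distribution enumerated exactly over all sign flips. Applying Holm across all classifier pairs within one setting (one number of training blocks and testing trials) is also an assumption.

```python
def signed_rank_p_greater(x, y):
    """Exact one-sided p-value of the paired Wilcoxon signed-rank test.

    H1: values in x tend to be larger than the paired values in y.
    Assumptions: zero differences are dropped; tied |differences| get
    average ranks; the null distribution enumerates all 2**n sign flips.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0
    # Rank |d| with average ranks for ties; store 2*rank so every rank is an int.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank2[order[k]] = i + j + 2  # 2 * average of 1-based ranks i+1..j+1
        i = j + 1
    observed = sum(r for r, v in zip(rank2, d) if v > 0)
    # counts[s] = number of sign patterns whose positive (doubled) ranks sum to s.
    counts = {0: 1}
    for r in rank2:
        nxt = dict(counts)  # this rank gets a negative sign: sum unchanged
        for s, c in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + c  # this rank gets a positive sign
        counts = nxt
    extreme = sum(c for s, c in counts.items() if s >= observed)
    return extreme / 2 ** n


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        # Multiply the (step+1)-th smallest p-value by (m - step), cap at 1,
        # and enforce monotonicity with the previous adjusted values.
        running_max = max(running_max, min(1.0, (m - step) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def compare_classifiers(acc, pairs):
    """Hypothetical helper for one evaluation setting.

    acc: classifier name -> per-subject accuracies (same subject order).
    pairs: (a, b) tuples, each testing H1 "a is more accurate than b".
    Returns {(a, b): Holm-adjusted p-value}.
    """
    raw = [signed_rank_p_greater(acc[a], acc[b]) for a, b in pairs]
    return dict(zip(pairs, holm_adjust(raw)))
```

With 21 subjects, the exact enumeration is cheap because the doubled rank sums never exceed 462. In practice, library routines such as `scipy.stats.wilcoxon` (with `alternative="greater"`) and `statsmodels.stats.multitest.multipletests` (with `method="holm"`) cover the same steps. Their default handling of zeros and ties may differ from this sketch.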


**Figure 3.** Accuracy of the different classifiers for all 21 subjects as a function of the number of blocks available for training. One block consists of 135 epochs and corresponds to 27 seconds of stimulation. Accuracies are shown for the evaluation settings averaging over 1, 2, and 3 testing trials. Figure A1 contains the results for all numbers of trials. STBF-EMP is unstable when little training data are available, whereas regularizing the covariance matrix (STBF-SHRUNK and STBF-STRUCT) drastically improves performance.
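The data-availability experiment behind this figure can be sketched as a cross-validation loop whose training folds are truncated to a given number of blocks. This sketch makes two assumptions not stated in the text. First, it uses leave-one-block-out cross-validation over 10 blocks per subject, which is consistent with nine blocks per training fold. Second, it keeps the first remaining blocks. `limited_folds` and `training_budget` are hypothetical helpers. Only the 135 epochs and 27 seconds per block come from the text.

```python
EPOCHS_PER_BLOCK = 135  # from the text: one block = 135 epochs
SECONDS_PER_BLOCK = 27  # from the text: one block = 27 s of stimulation


def limited_folds(n_blocks, n_train):
    """Leave-one-block-out folds with the training set truncated to n_train blocks.

    Assumption: the first n_train remaining blocks are kept and the rest dropped.
    Yields (training_blocks, test_block) pairs.
    """
    if not 1 <= n_train <= n_blocks - 1:
        raise ValueError("n_train must be between 1 and n_blocks - 1")
    for test_block in range(n_blocks):
        remaining = [b for b in range(n_blocks) if b != test_block]
        yield remaining[:n_train], test_block


def training_budget(n_train):
    """Number of training epochs and seconds of stimulation for n_train blocks."""
    return n_train * EPOCHS_PER_BLOCK, n_train * SECONDS_PER_BLOCK


# Assumption: 10 blocks per subject, so a full training fold has 9 blocks.
N_BLOCKS = 10
for n_train in range(1, N_BLOCKS):
    epochs, seconds = training_budget(n_train)
    folds = list(limited_folds(N_BLOCKS, n_train))
    # A classifier would be trained on folds[i][0] and tested on folds[i][1] here.
    print(f"{n_train} block(s): {epochs} epochs, {seconds} s, {len(folds)} folds")
```

In this sketch, the test block of each fold is the same at every data-availability level. Any change in accuracy across levels therefore stems from the amount of training data alone.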

The tables show that STBF-STRUCT has a significant advantage over STBF-SHRUNK when few training blocks are available. This effect is present in the 1-, 2-, and 5-trial evaluations, and the advantage decreases (the *p*-value increases) as more training blocks are added. Both STBF-STRUCT and STBF-SHRUNK perform significantly better than STBF-EMP in all evaluated settings. STBF-STRUCT also achieves significantly higher accuracy than xDAWN+RG in almost all evaluated settings, the exception being when only one training block is used. STBF-SHRUNK does not outperform xDAWN+RG when training data are scarce but gains a significant advantage when more training data are used.
