**4. Discussion**

*4.1. Classification Accuracy*

As evidenced by Figure 3 and Tables 1–3, the regularized classifiers STBF-SHRUNK and STBF-STRUCT significantly improve classification accuracy over the original STBF-EMP for all indicated numbers of training blocks. We believe there are three reasons for this. First and foremost, the empirical covariance matrix in STBF-EMP becomes ill-conditioned when the number of available training epochs is smaller than the number of features (*n* < *cs*), rendering its inversion with the Moore–Penrose pseudoinverse unstable. For STBF-EMP, this threshold lies at *n* = *cs* = 32 × 17 = 544, beyond which its accuracy starts to increase. This effect is visible in Figure 3, where the accuracy starts increasing when more than four training blocks, amounting to 540 epochs, are used. The noticeable dip in accuracy around 540 epochs can be explained by numerical effects in the pseudoinverse for very small eigenvalues [60–63]. Shrinkage regularization ensures that the covariance matrix is non-singular and better conditioned, so that it can be inverted stably. Second, covariance regularization introduces a trade-off between the variance and bias of the model [32]: better performance on unseen data can be achieved when some model variance is traded for extra bias. Regularization reduces the extreme values in the covariance estimate, as shown in Figure 1, resulting in a classifier that generalizes better. Third, the true spatiotemporal covariance may vary throughout BCI sessions, e.g., due to movement of the EEG cap, changing electrode impedances, subject fatigue, the introduction of new spatiotemporal noise sources, and other possible confounds. A regularized covariance matrix should better accommodate such changes in the true covariance. Note that the LOOCV method in principle assumes that the covariances of the training data and unseen data are the same.
Because the covariance might have changed for unseen data, the shrinkage estimate obtained with LOOCV is likely still an underestimate of the optimal (but unknown) shrinkage coefficient, i.e., the one that would yield the best classification accuracy on the unseen data.
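The effect of shrinkage on conditioning can be illustrated with a minimal numpy sketch. The dimensions and the fixed shrinkage value `gamma` below are illustrative stand-ins, not the paper's channel/sample counts or its LOOCV-selected coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 64        # feature dimension (stand-in for c*s; the paper has 32 * 17 = 544)
n = 40        # fewer epochs than features -> rank-deficient empirical covariance

X = rng.standard_normal((n, p))        # n epochs, p features
S = np.cov(X, rowvar=False)            # empirical covariance, rank < p

# With rank(S) < p, inversion requires a pseudoinverse, which is sensitive
# to the near-zero eigenvalues.
rank = np.linalg.matrix_rank(S)

# Shrinkage toward a scaled identity lifts all eigenvalues away from zero.
gamma = 0.2                            # illustrative; the paper selects it via LOOCV
target = np.trace(S) / p * np.eye(p)
S_shrunk = (1.0 - gamma) * S + gamma * target

min_eig = np.linalg.eigvalsh(S_shrunk).min()   # strictly positive
S_inv = np.linalg.inv(S_shrunk)                # ordinary inverse suffices
```

With fewer epochs than features the empirical covariance is singular, so the pseudoinverse amplifies noise in the smallest eigenvalues; the convex combination with a scaled identity makes the estimate strictly positive definite and removes the need for a pseudoinverse altogether.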

Another observation is the significantly higher accuracy of STBF-STRUCT compared to STBF-SHRUNK when the amount of available training data is small. This property is an attractive advantage in a BCI setting, since it is desirable to keep the calibration (training) phase as short as possible without losing accuracy. The accuracy advantage of the structured estimator is a consequence of the Kronecker–Toeplitz covariance structure, which is informative about the underlying process generating the epochs if the EEG signal is assumed to be a linear combination of stationary activity generated by random dipoles in the brain with added noise [24,35,41]. Hence, STBF-STRUCT can use this prior information to better estimate the inverse covariance. The increase in accuracy for small training set sizes can also be explained by the smaller number of parameters needed to estimate the inverse covariance (see Section 4.2), which increases the stability of the matrix inversion.
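The parameter reduction behind this stability can be sketched in numpy: a Kronecker product of a spatial covariance with a symmetric Toeplitz temporal covariance needs only on the order of *c*²/2 + *s* free parameters instead of (*cs*)²/2. The SPD spatial matrix and AR(1)-style temporal autocorrelation below are hypothetical placeholders, not the paper's estimator, and the dimensions are illustrative:

```python
import numpy as np

c, s = 8, 17                           # channels x samples (illustrative)
p = c * s

rng = np.random.default_rng(1)
A = rng.standard_normal((c, c))
spatial = A @ A.T + c * np.eye(c)      # some SPD spatial covariance: c*(c+1)/2 parameters

# Stationarity makes the temporal covariance symmetric Toeplitz, so it is
# fully determined by s lag values (here an AR(1)-style decay, for illustration).
lags = 0.9 ** np.arange(s)
idx = np.arange(s)
temporal = lags[np.abs(idx[:, None] - idx[None, :])]

# Structured spatiotemporal covariance via the Kronecker product.
sigma = np.kron(spatial, temporal)     # shape (c*s, c*s), positive definite

full_params = p * (p + 1) // 2         # unconstrained symmetric covariance
struct_params = c * (c + 1) // 2 + s   # Kronecker-Toeplitz structure
```

Here `full_params` is 9316 against 53 for the structured model, which is why far fewer training epochs suffice to estimate the structured inverse covariance reliably.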

Compared to the state-of-the-art xDAWN+RG classifier, STBF-STRUCT reaches similar accuracy when only one block of training data is used. We suspect this is because both classifiers then have insufficient training information to reach satisfactory classification accuracy. When more data are available, STBF-STRUCT reaches significantly better accuracies. Combined with the benefits laid out in Sections 4.2 and 4.3, this makes it an attractive option for ERP classification. STBF-SHRUNK does not show decisive accuracy improvements over xDAWN+RG with only a few training blocks, but its advantage grows as the amount of training data increases.
