#### *4.2. Time and Memory Complexity*

As mentioned above, inverting the full *cs* × *cs* covariance matrix to construct STBF-EMP and STBF-SHRUNK can be costly and unstable, particularly in high-resolution settings with many EEG channels or time samples. Constructing the full covariance and inverse covariance matrices also requires a considerable amount of memory. The structured covariance estimator of STBF-STRUCT has two advantages here.

First, because of Properties 1 and 2, there is no need to calculate the full *cs* × *cs* symmetric covariance and inverse covariance matrices for STBF-STRUCT or to keep them in memory; they can instead be replaced by two smaller symmetric matrices of dimensions *c* × *c* and *s* × *s*, respectively. Furthermore, since the temporal component of the Kronecker product is Toeplitz-structured, it only requires *s* parameters to estimate. Whereas the inverse covariance of STBF-EMP and STBF-SHRUNK is defined by $\frac{cs(cs+1)}{2} = \frac{32 \times 17\,(32 \times 17 + 1)}{2} = 148{,}240$ parameters, accounting for the symmetric nature of covariance, the structured estimator only requires $\frac{c(c+1)}{2} + s = \frac{32(32+1)}{2} + 17 = 545$ unique parameters. This reduction in the number of parameters to estimate reduces memory usage and contributes to the regularization effect in low-data-availability settings. The inverse covariances of STBF-EMP and STBF-SHRUNK, represented as 32 × 17 × 32 × 17 symmetric matrices of single-precision real floating-point numbers for weight calculation, use 1.13 MiB of memory. The 32 × 32 and 17 × 17 matrices of STBF-STRUCT only require 5.13 KiB.
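The parameter and memory counts in this paragraph follow from simple arithmetic. The short script below is a sketch for checking them, assuming dense storage of every matrix entry in single precision and the values *c* = 32 and *s* = 17 used in the text; the variable names are illustrative only.

```python
import numpy as np

c, s = 32, 17  # EEG channels and time samples, as in the text

# Unique entries of a symmetric (cs x cs) matrix.
full_params = (c * s) * (c * s + 1) // 2

# Unique entries of a symmetric (c x c) spatial factor, plus the s values
# that define a symmetric Toeplitz (s x s) temporal factor.
struct_params = c * (c + 1) // 2 + s

# Dense storage in single precision (4 bytes per value).
bytes_per_value = np.dtype(np.float32).itemsize
full_bytes = (c * s) ** 2 * bytes_per_value
struct_bytes = (c * c + s * s) * bytes_per_value

print(f"full estimator:       {full_params} parameters, {full_bytes / 2**20:.2f} MiB")
print(f"structured estimator: {struct_params} parameters, {struct_bytes / 2**10:.2f} KiB")
```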

Second, structured estimation has better time complexity. Covariance estimation and inversion occupy the largest part of the STBF training time. For STBF-EMP and STBF-SHRUNK, the time complexity of this process is $\mathcal{O}(nc^2s^2 + c^3s^3)$. Thanks to Property 1, the complexity can be reduced to $\mathcal{O}(nc^2s^2 + c^3 + s^3)$ for the structured estimator of STBF-STRUCT. The results presented in Figure 4 are consistent with these calculations: the training time of STBF-STRUCT stays low compared to that of STBF-EMP and STBF-SHRUNK as dimensionality increases.
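The saving in the inversion step can be illustrated with a small numerical sketch. It assumes that Property 1 is the identity $(T \otimes S)^{-1} = T^{-1} \otimes S^{-1}$ and that Property 2 is $(T \otimes S)\,\mathrm{vec}(A) = \mathrm{vec}(S A T^{\top})$ with column-stacking vec; the matrices `S`, `T`, and `A` below are random stand-ins rather than estimates from EEG data, and the beamformer normalization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
c, s = 6, 5  # small toy dimensions

# Hypothetical symmetric positive definite spatial covariance (c x c).
M = rng.standard_normal((c, c))
S = M @ M.T + c * np.eye(c)

# Hypothetical symmetric Toeplitz temporal covariance (s x s), AR(1)-like.
lags = np.abs(np.subtract.outer(np.arange(s), np.arange(s)))
T = 0.5 ** lags

# Stand-in for the spatiotemporal activation pattern (c x s).
A = rng.standard_normal((c, s))

# Full route: build the (cs x cs) covariance and solve one large system.
Sigma = np.kron(T, S)
w_full = np.linalg.solve(Sigma, A.reshape(-1, order="F"))

# Structured route: two small solves, never forming the (cs x cs) matrix.
W = np.linalg.solve(S, A)       # S^{-1} A
W = np.linalg.solve(T, W.T).T   # (S^{-1} A) T^{-1}, using the symmetry of T
w_struct = W.reshape(-1, order="F")

print(np.allclose(w_full, w_struct))
```

The structured route solves one *c* × *c* and one *s* × *s* system, which is where the *c*³ + *s*³ term comes from, instead of one *cs* × *cs* system.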

The results shown in Figure 4 also indicate that the STBF-based estimators are very fast to train compared to the state-of-the-art estimator xDAWN+RG, in agreement with the results in [21]. Since the training times of all STBF-based classifiers are already of the order of tenths of seconds, the question arises as to whether the improvements achieved using the structured estimator are relevant in practice. The authors believe that these results could significantly impact some use cases of the spatiotemporal beamformer, such as ERP analyses with high spatial or temporal resolution. One example is single-trial ERP analysis with a high temporal resolution to extract ERP time features. Such higher-resolution analyses can later be incorporated into an ERP classification framework. In addition, the speed provided by structured estimation yields a faster offline evaluation of the STBF ERP classifier, in which multiple cross-validation folds, subjects, and hyperparameter settings often need to be explored, which can quickly increase runtime. Improvements in computation speed and memory usage can remove the need for dedicated computation hardware and enable group analyses to be run on a personal computer.

#### *4.3. Interpreting the Weights*

The weight matrix of the STBF determines how each spatiotemporal feature of a given epoch should contribute to enhancing the SNR of the discriminating signal in the classification task. Alternatively, the activation pattern can be regarded as a forward EEG model of the activity generating the discriminating signal, and the weights as a backward model [60,64]. Regularization enables a researcher to better interpret the distribution of the weights over space and time after reshaping the weight vector **w** to its spatiotemporal matrix equivalent, *W*, such that vec(*W*) = **w**. Figure 5 compares the weights calculated in STBF-EMP and STBF-SHRUNK with the weights from STBF-STRUCT.
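The reshaping step can be sketched as follows, assuming that vec(·) stacks the columns of a *c* × *s* (channels × time samples) matrix, which corresponds to column-major ordering in NumPy. The toy values and variable names are illustrative; the two profiles mirror the spatial and temporal averages of absolute values described for Figure 5.

```python
import numpy as np

c, s = 4, 3  # toy numbers of channels and time samples

# Toy weight matrix W (channels x time samples).
W = np.arange(c * s, dtype=float).reshape(c, s)

# vec(W): stack the columns of W into one vector (column-major order).
w = W.reshape(-1, order="F")

# Recover the spatiotemporal matrix from the weight vector.
W_back = w.reshape((c, s), order="F")

# Summaries used to read the weights over space and over time.
spatial_profile = np.abs(W_back).mean(axis=1)   # one value per channel
temporal_profile = np.abs(W_back).mean(axis=0)  # one value per time sample

print(W_back.shape, spatial_profile.shape, temporal_profile.shape)
```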

**Figure 5.** Spatiotemporal beamformer weights calculated using four blocks of data (of 1215 epochs) from *Subject 01* from 0.2 s before until 1.0 s after stimulus onset. Regularized weights show an interpretable sparse pattern, whereas the empirical weights appear noisier. (**A**) Spatiotemporal activation pattern with spatial and temporal global field power. (**B**) STBF-STRUCT weights with spatial and temporal averages of absolute values. (**C**) STBF-SHRUNK weights. The shrinkage factor *α* = 0.05 was determined with the closed-form LOOCV method. (**D**) STBF-EMP weights.

Since the linear filter's noise suppression and signal amplification functions are deeply entangled, it is not necessarily true that features with a high filter weight directly correspond to features containing discriminatory information [64]. However, it is still possible to interpret the weights in terms of which features contribute most to the classification process, be it through noise suppression, signal amplification, or, most likely, a combination of both. The weights obtained by STBF-EMP seem to be randomly distributed over space and time, whereas the regularized estimators used by STBF-SHRUNK and STBF-STRUCT reveal a more interpretable weight distribution. The STBF-SHRUNK weights show a sparse spatial distribution, whereas the STBF-STRUCT weights show a sparse distribution in both space and time.

As expected, Figure 5b and d exhibit weight around the central and parietal regions, where the P300 ERP component is present. In particular, the spatial weights of STBF-SHRUNK in Figure 5d correspond to the spatial activation pattern in Figure 5a. This is not surprising, since shrinkage moves the covariance matrix closer to the identity matrix, and assuming identity covariance in Figure 4 yields weights identical to the activation pattern (up to a scaling factor). Additionally, Figure 5b shows that weights in the baseline interval and after 0.6 s, which should contain no response information, are close to zero for the structured estimator. Meanwhile, these weights are high in the occipital region between 0.1 s and 0.2 s, which contains early visual components with relatively low SNR. This high weight for the early visual components agrees with the results of Treder and Blankertz [65], who state that, in addition to the P300, the early N1 and P2 ERP components are also modulated by oddball attention and contain discriminatory information between attended and nonattended stimuli.
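The link between shrinkage and the activation pattern can be checked numerically. The sketch below assumes a shrinkage target of a scaled identity matrix, $\Sigma_\alpha = (1-\alpha)\,C + \alpha\,\nu I$ with $\nu = \mathrm{tr}(C)/p$, and unnormalized weights $\Sigma_\alpha^{-1}\mathbf{a}$; both the exact shrinkage form and the random data are assumptions made for illustration. With full shrinkage (α = 1), the weights are proportional to the activation pattern, so their cosine similarity is 1.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 20, 30  # toy feature dimension and number of epochs

# Random correlated data standing in for vectorized epochs.
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
C = np.cov(X, rowvar=False)   # empirical covariance
a = rng.standard_normal(p)    # stand-in activation pattern


def shrunk_weights(C, a, alpha):
    """Unnormalized weights Sigma_alpha^{-1} a for shrinkage factor alpha."""
    nu = np.trace(C) / C.shape[0]
    Sigma = (1 - alpha) * C + alpha * nu * np.eye(C.shape[0])
    return np.linalg.solve(Sigma, a)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


for alpha in (0.05, 0.5, 1.0):
    print(alpha, cosine(shrunk_weights(C, a, alpha), a))
```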

Using an interpretable classification model has many advantages. For instance, one can use the weight matrix to determine relevant time samples and EEG channels for per-subject feature selection to refine the model further. The number of channels is also an important cost factor in practical BCI applications. Determining which channels do not contribute to the classification accuracy helps to reduce the number of required electrodes. Spatially clustered weights indicate that some electrodes are not used by the classifier and can be discarded with no substantial reduction in accuracy. As another example, information about the timing and spatial distribution of the discriminatory information in the response can be extracted from the weights and linked to neurophysiological hypotheses.
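One way such a channel selection could be approached is sketched below. Scoring each channel by its mean absolute weight over time is an assumption made for illustration, not a procedure described in this work, and the function name, channel labels, and weight values are hypothetical. Because a high weight can reflect noise suppression as well as signal amplification, any channel subset chosen this way should be validated by re-evaluating the classification accuracy.

```python
import numpy as np


def rank_channels(W, channel_names):
    """Rank channels by mean absolute weight over time (illustrative score)."""
    importance = np.abs(W).mean(axis=1)
    order = np.argsort(importance)[::-1]  # highest score first
    return [(channel_names[i], float(importance[i])) for i in order]


# Toy weight matrix (4 channels x 3 time samples) with hypothetical labels.
W = np.array([
    [0.0,  0.1,  0.0],
    [0.5, -0.9,  0.4],
    [0.2,  0.3, -0.1],
    [0.0,  0.0,  0.0],
])
names = ["Fp1", "Pz", "Cz", "T7"]

ranking = rank_channels(W, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```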
