2.3.2. Shrunk Covariance Estimation

The shrinkage covariance estimator creates a better-conditioned inversion matrix problem and generally performs better when applied to unseen data. The estimators for the covariance and inverse covariance are given by:

$$
\hat{\mathbf{C}}\_{a} = (1 - a)\hat{\mathbf{C}}\_{\text{emp}} + a \frac{\text{Tr}(\hat{\mathbf{C}}\_{\text{emp}})}{\text{cs}} \mathbb{I} \tag{8}
$$

$$
\widehat{\mathsf{C}^{-1}}\_{\mathfrak{a}} = \mathsf{C}^{+}\_{\mathfrak{a}} \tag{9}
$$

with 0 < *α* < 1. Analogous to L2 regularization of the beamforming problem, shrinkage reduces the ratio between the smallest and largest eigenvalues of the covariance matrix by strengthening the diagonal. Figure 1c,d show examples of the shrunk estimator of the covariance and the inverse covariance matrices, respectively.

**Figure 1.** Different estimators of the covariance and inverse covariance of 100 epochs of data from *Subject 01* for channels *Fz*, *Cz*, *Pz*, and *Oz* and time samples between 0.1 s and 0.6 s. Regularized estimators of the inverse covariance exhibit less extreme values and have a sparser structure. (**A**,**B**) Empirical covariance and inverse covariance matrices. (**C**,**D**) Shrunk covariance and inverse covariance matrices with *α* = 0.14 as determined by the closed-form leave-one-out crossvalidation (LOOCV) method. (**E**,**F**) Kronecker–Toeplitz-structured covariance and inverse covariance matrices. (**G**,**H**) Spatial Kronecker factor of the Kronecker–Toeplitz-structured shrunk estimator and its inverse. (**I**,**J**) Temporal Kronecker factor of the Kronecker–Toeplitz-structured shrunk estimator and its inverse.

Earlier work [23] applied shrinkage regularization to ERP decoding with the spatiotemporal beamformer and showed competitive performance compared to other state-of-the-art decoding techniques such as stepwise LDA or SVM. The abovementioned researchers chose the shrinkage coefficient *α* as a fixed hyperparameter. However, its optimal value depends on the number of training epochs, the covariance matrix's dimensionality, and the independence and variance of the data, which can vary across evaluation settings and per session. The optimal value for *α* can be found with a line search using crossvalidation method, but this can be a costly procedure. Methods exist to estimate an optimal shrinkage value directly from the data. Most notable among these are the Ledoit–Wolf procedure [32], the Rao–Blackwell Ledoit–Wolf method [33], and the oracle approximating shrinkage method [33]. A more recent estimation method [34] emulates a leave-one-out cross-validation (LOOCV) scheme expressed by the data-driven closed-form estimate:

$$\alpha = 1 - \frac{\frac{n}{n-1} \text{Tr}(\mathbf{\hat{C}}\_{\text{emp}}^2) - \frac{2}{cs} \left[ \text{Tr}(\mathbf{\hat{C}}\_{\text{emp}}) \right]^2 + \frac{1}{cs} \text{Tr}(\mathbf{\hat{C}}\_{\text{emp}}^2) - \frac{1}{n(n-1)} \sum\_{i=1}^n ||\mathbf{x}\_i||\_2^4}{\frac{n^2 - 2s}{(n-1)^2} \text{Tr}(\mathbf{\hat{C}}\_{\text{emp}}^2) - \frac{2}{cs} \left[ \text{Tr}(\mathbf{\hat{C}}\_{\text{emp}}^2) \right]^2 + \frac{1}{cs} \text{Tr}(\mathbf{\hat{C}}\_{\text{emp}}^2) + \frac{1}{n(n-1)^2} \sum\_{i=1}^n ||\mathbf{x}\_i||\_2^4} \tag{10}$$

Herein, we opt for the LOOCV shrinkage estimator because it avoids some of the assumptions made by [32,33] and because it generalizes to structured covariance estimations, as described in Section 2.3.3.

2.3.3. Spatiotemporal Beamforming with Kronecker–Toeplitz-Structured Covariance

Exploiting prior knowledge about the spatiotemporal structure of the EEG signal leads to a more regularized estimator of the covariance. When viewing the example of empirical spatiotemporal EEG covariance in Figure 1a, it becomes clear that this matrix consists of a block pattern of repeated, similar matrices. Due to the multi-channel nature of the signal, we assume that the covariance of spatiotemporal EEG epochs is a Kronecker product of two smaller matrices [35–37], as expressed by:

$$
\hat{\mathsf{C}}\_{\text{struct}} = \hat{\mathsf{S}} \otimes \hat{\mathcal{T}} \tag{11}
$$

with <sup>⊗</sup> denoting the Kronecker product operator. *<sup>S</sup>*<sup>ˆ</sup> <sup>∈</sup> <sup>R</sup>*c*×*<sup>c</sup>* and *<sup>T</sup>*<sup>ˆ</sup> <sup>∈</sup> <sup>R</sup>*s*×*<sup>s</sup>* correspond to estimators of the spatial and temporal covariance of the data, respectively. Furthermore, because the temporal covariance of the EEG-signal is stationary (i.e., it is only dependent on interval length between covarying time samples) [38], it is assumed to have a Toeplitzmatrix structure:

$$
\hat{\mathfrak{f}}\_{i,j} = \mathfrak{f}\_{i+1,j+1} \tag{12}
$$

Property 1 then leads to Equation (13) to estimate the inverse covariance.

**Property 1.** (*<sup>U</sup>* <sup>⊗</sup> *<sup>V</sup>*)<sup>+</sup> <sup>=</sup> *<sup>U</sup>*<sup>+</sup> <sup>⊗</sup> *<sup>V</sup>*<sup>+</sup> *for any non-singular matrices U and V [39].*

$$
\bar{\mathcal{L}}^{-\bar{1}}{}\_{\text{struct}} = \hat{\mathcal{S}}^{+} \otimes \mathcal{T}^{+} \tag{13}
$$

Finally, based on Property 2, Equation (4) can be reformulated more efficiently as Equation (14).

**Property 2.** (*<sup>U</sup>* ⊗ *<sup>V</sup>*) · *vec*(*W*) = *vec*(*VWU*-) *for any matrices <sup>U</sup>* <sup>∈</sup> <sup>R</sup>*p*×*p, <sup>V</sup>* <sup>∈</sup> <sup>R</sup>*q*×*q, and <sup>W</sup>* <sup>∈</sup> <sup>R</sup>*p*×*<sup>q</sup> [40].*

$$\mathbf{\hat{w}}\_{\text{struct}} = \frac{\mathbf{\hat{S}}^{+} A^{T} \mathbf{\hat{T}}^{+}}{\mathbf{a} \cdot \text{vec} (\mathbf{\hat{S}}^{+} A^{T} \mathbf{\hat{T}}^{+})} \tag{14}$$

Using Equation (14) removes the need to calculate the full, high-dimensional Kronecker product *<sup>S</sup>*ˆ<sup>+</sup> <sup>⊗</sup> *<sup>T</sup>*<sup>ˆ</sup> <sup>+</sup>. Figure 1e,f show examples of the structured covariance and inverse covariance estimators, respectively, consisting of a spatial Kronecker factor (Figure 1g,h) and a temporal component (Figure 1i,j).

The Kronecker approach has shown significant performance yields in different linear spatiotemporal EEG and MEG applications [24,37,41–43]. Van Vliet and Salmelin [25] applied a Kronecker-structured covariance estimator to ERP classification with linear models in a post hoc fashion. Our work goes further by embedding the Kronecker structure in the spatiotemporal beamformer training process, using a data-adaptive shrinkage method, and regularizing the covariance further by imposing a Toeplitz structure on the temporal covariance.
