2.3.4. Kronecker–Toeplitz-Structured Covariance Estimation

The question remains how to estimate *S*ˆ and *T*ˆ. Although the flip-flop and non-iterative flip-flop algorithms [44–46] can estimate Kronecker or Kronecker–Toeplitz-structured covariances, new results show that a fixed point iteration is more efficient [47,48]. After each iteration, the spatial and temporal covariance matrices are scaled to unit variance to ensure that the fixed-point iteration converges. Finally, shrinkage can also be introduced in the fixed-point iteration to improve stability and achieve more robust regularization [42,48–50].

The spatial and temporal covariance matrices are shrunk at every fixed-point iteration with shrinkage factors *β<sup>k</sup>* and *γ<sup>k</sup>* before matrix inversion in the next iteration. Combined, this leads to the iterative estimation algorithm described by the following equations:

$$\bar{S}\_{k+1} = \frac{1}{n} \sum\_{i=1}^{n} X\_i^{\mathsf{T}} \hat{T}\_k^{+} X\_i \tag{15a}$$

$$\mathcal{T}\_{k+1} = \frac{1}{n} \sum\_{i=1}^{n} X\_i \mathbb{S}\_k^+ X\_i^\mathsf{T} \tag{15b}$$

$$\tilde{S}\_{k+1}^{(\mathcal{J})} = (1 - \beta\_{k+1})\tilde{S}\_{k+1} + \beta\_{k+1} \frac{\text{Tr}(\mathcal{S}\_{k+1})}{\mathfrak{c}} \mathbb{I} \tag{16a}$$

$$\mathcal{T}\_{k+1}^{(\gamma)} = (1 - \gamma\_{k+1})\mathcal{T}\_{k+1} + \gamma\_{k+1}\frac{\text{Tr}(\mathcal{T}\_{k+1})}{s}\mathbb{I} \tag{16b}$$

$$\mathcal{S}\_{k+1} = \frac{c}{\text{Tr}\left[\mathcal{S}\_{k+1}^{(\beta)}\right]} \mathcal{S}\_{k+1}^{(\beta)} \tag{17a}$$

$$
\hat{T}\_{k+1} = \frac{\mathbf{s}}{\text{Tr}\left[\mathcal{T}\_{k+1}^{(\gamma)}\right]} \vec{T}\_{k+1}^{(\gamma)} \tag{17b}
$$

*S*ˆ <sup>0</sup> and *T*ˆ <sup>0</sup> can be initialized to any positive definite matrix. We choose to use the identity matrices I*c*×*<sup>c</sup>* and I*s*×*<sup>s</sup>* . After each iteration, all diagonals of *R*ˆ *<sup>k</sup>*+<sup>1</sup> are set to their mean values to ensure that *R*ˆ *<sup>k</sup>*+<sup>1</sup> and *T*ˆ *<sup>k</sup>*+<sup>1</sup> are Toeplitz-structured.

Xie et al. [51] show that the LOOCV estimates for the optimal values of *βk*+<sup>1</sup> and *γk*+<sup>1</sup> also yield a closed-form solution for the Kronecker fixed-point-iteration algorithm:

$$\beta\_{k+1} = 1 - \frac{\frac{n}{n-1} \text{Tr}(\mathbf{S}\_{k+1}^2) - \frac{2}{\varepsilon} \left[ \text{Tr}(\mathbf{S}\_{k+1}) \right]^2 + \frac{1}{\varepsilon} \text{Tr}(\mathbf{S}\_{k+1}^2) - \frac{1}{\overline{n}(n-1)} \sum\_{i=1}^n \left[ \text{Tr}(\mathbf{X}\_i \mathbf{\hat{T}}\_k^+ \mathbf{X}\_i^\top)^2 \right]}{\frac{n^2 - 2n}{(n-1)^2} \text{Tr}(\mathbf{S}\_{k+1}^2) - \frac{2}{\varepsilon} \left[ \text{Tr}(\mathbf{S}\_{k+1}) \right]^2 + \frac{1}{\varepsilon} \text{Tr}(\mathbf{S}\_{k+1}^2) + \frac{1}{\overline{n}(n-1)^2} \sum\_{i=1}^n \left[ \text{Tr}(\mathbf{X}\_i \mathbf{\hat{T}}\_k^+ \mathbf{X}\_i^\top)^2 \right]} \tag{18a}$$

$$\gamma\_{k+1} = 1 - \frac{\frac{\text{n}}{\text{n}-1} \text{Tr} \left( \tilde{T}\_{k+1}^{2} \right) - \frac{2}{s} \left[ \text{Tr} \left( \tilde{T}\_{k+1} \right) \right]^2 + \frac{1}{s} \text{Tr} \left( \tilde{T}\_{k+1}^{2} \right) - \frac{1}{\text{n}(\text{n}-1)} \sum\_{i=1}^{n} \left[ \text{Tr} \left( X\_{i}^{\text{T}} \tilde{S}\_{k}^{+} X\_{i} \right)^{2} \right]}{\frac{\text{n}^{2} - 2 \text{n}}{(\text{n}-1)^{2}} \text{Tr} \left( T\_{k+1}^{2} \right) - \frac{2}{s} \left[ \text{Tr} \left( T\_{k+1} \right) \right]^{2} + \frac{1}{s} \text{Tr} \left( T\_{k+1}^{2} \right) + \frac{1}{n(\text{n}-1)^{2}} \sum\_{i=1}^{n} \left[ \text{Tr} \left( X\_{i}^{\text{T}} \tilde{S}\_{k}^{+} X\_{i} \right)^{2} \right]} \tag{18b}$$

The shrinkage parameters 0 < *βk*+<sup>1</sup> < 1 and 0 < *γk*+<sup>1</sup> < 1 should be re-determined after each iteration. The oracle approximation shrinkage method can also be used to determine *βk*+<sup>1</sup> and *γk*+<sup>1</sup> [51,52] but performs worse for spatiotemporal EEG data since not all assumptions are met.
