*2.2. Kernel-Based Renyi's Phase Transfer Entropy*

In [10], we proposed a TE estimator based on kernel matrices that approximate Renyi's entropy measures of order *α*. This data-driven approach sidesteps the need to estimate probability distributions when computing TE. First, we showed that TE can be expressed as

$$TE_{\alpha}(\mathbf{x}\rightarrow\mathbf{y}) = H_{\alpha}\left(\mathbf{y}_{t-1}^{dy}, \mathbf{x}_{t-u}^{dx}\right) - H_{\alpha}\left(y_{t}, \mathbf{y}_{t-1}^{dy}, \mathbf{x}_{t-u}^{dx}\right) + H_{\alpha}\left(y_{t}, \mathbf{y}_{t-1}^{dy}\right) - H_{\alpha}\left(\mathbf{y}_{t-1}^{dy}\right), \tag{4}$$

where *Hα*(*X*) stands for Renyi's *α* entropy, a generalization of Shannon's entropy [26,27], defined as

$$H_{\alpha}(X) = \frac{1}{1-\alpha} \log \left( \int p(x)^{\alpha}\, dx \right), \tag{5}$$

with *α* ≠ 1 and *α* ≥ 0. In the limiting case where *α* → 1, it tends to Shannon's entropy. Then, using the kernel-based formulation for Renyi's *α* entropy introduced in [28],

$$H_{\alpha}(\mathbf{A}) = \frac{1}{1-\alpha} \log \left( \text{tr}(\mathbf{A}^{\alpha}) \right), \tag{6}$$

where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is a Gram matrix with elements $a_{ij} = \kappa(x_i, x_j)$, $\kappa(\cdot, \cdot) \in \mathbb{R}$ stands for a positive definite and infinitely divisible kernel function, *n* for the number of realizations of *X*, and tr(·) for the matrix trace; along with the accompanying formulation for Renyi's *α* entropy of joint probability distributions,

$$H_{\alpha}(\mathbf{A}, \mathbf{B}) = H_{\alpha}\left(\frac{\mathbf{A} \circ \mathbf{B}}{\text{tr}(\mathbf{A} \circ \mathbf{B})}\right) = \frac{1}{1 - \alpha} \log\left(\text{tr}\left(\left(\frac{\mathbf{A} \circ \mathbf{B}}{\text{tr}(\mathbf{A} \circ \mathbf{B})}\right)^{\alpha}\right)\right),\tag{7}$$

where $\mathbf{B} \in \mathbb{R}^{n \times n}$ is a second Gram matrix and the operator ◦ stands for the Hadamard product, we estimate the $TE_{\alpha}$ from **x** to **y** as:

$$\begin{split} TE_{\kappa\alpha}(\mathbf{x}\rightarrow\mathbf{y}) &= H_{\alpha}\Big(\mathbf{K}_{\mathbf{y}_{t-1}^{dy}}, \mathbf{K}_{\mathbf{x}_{t-u}^{dx}}\Big) - H_{\alpha}\Big(\mathbf{K}_{y_{t}}, \mathbf{K}_{\mathbf{y}_{t-1}^{dy}}, \mathbf{K}_{\mathbf{x}_{t-u}^{dx}}\Big) \\ &+ H_{\alpha}\Big(\mathbf{K}_{y_{t}}, \mathbf{K}_{\mathbf{y}_{t-1}^{dy}}\Big) - H_{\alpha}\Big(\mathbf{K}_{\mathbf{y}_{t-1}^{dy}}\Big), \end{split} \tag{8}$$

where the kernel matrices $\mathbf{K}_{y_{t}}$, $\mathbf{K}_{\mathbf{y}_{t-1}^{dy}}$, $\mathbf{K}_{\mathbf{x}_{t-u}^{dx}} \in \mathbb{R}^{(D-u) \times (D-u)}$ hold elements $k_{ij} = \kappa(\mathbf{a}_i, \mathbf{a}_j)$. For $\mathbf{K}_{y_{t}}$, $a_i, a_j \in \mathbb{R}$ are the values of the time series **y** at times *i* and *j*, whereas for $\mathbf{K}_{\mathbf{y}_{t-1}^{dy}}$, the vectors $\mathbf{a}_i, \mathbf{a}_j \in \mathbb{R}^{d}$ contain the time-embedded version of **y**, $\mathbf{y}_{t}^{dy}$, at times *i* and *j*, adjusted according to the time indexing of TE. The same logic holds true for $\mathbf{K}_{\mathbf{x}_{t-u}^{dx}}$.

In this study, we hypothesize that the above-described TE estimator, having previously displayed robustness to common issues that affect connectivity analyses [10], could overcome many of the problems associated with single-trial phase TE estimation [8]. Hence, we propose a kernel-based Renyi's phase TE estimator defined as:

$$\begin{split} TE_{\kappa\alpha}^{\theta}\left(\mathbf{x}\rightarrow\mathbf{y}, f\right) &= H_{\alpha}\Big(\mathbf{K}_{\theta_{t-1}^{y,dy}}, \mathbf{K}_{\theta_{t-u}^{x,dx}}\Big) - H_{\alpha}\Big(\mathbf{K}_{\theta_{t}^{y}}, \mathbf{K}_{\theta_{t-1}^{y,dy}}, \mathbf{K}_{\theta_{t-u}^{x,dx}}\Big) \\ &+ H_{\alpha}\Big(\mathbf{K}_{\theta_{t}^{y}}, \mathbf{K}_{\theta_{t-1}^{y,dy}}\Big) - H_{\alpha}\Big(\mathbf{K}_{\theta_{t-1}^{y,dy}}\Big), \end{split} \tag{9}$$

where the kernel matrices $\mathbf{K}_{\theta_{t}^{y}}$, $\mathbf{K}_{\theta_{t-1}^{y,dy}}$, $\mathbf{K}_{\theta_{t-u}^{x,dx}} \in \mathbb{R}^{(D-u) \times (D-u)}$ hold elements analogous to those of the matrices $\mathbf{K}_{y_{t}}$, $\mathbf{K}_{\mathbf{y}_{t-1}^{dy}}$, and $\mathbf{K}_{\mathbf{x}_{t-u}^{dx}}$ in Equation (8), with the time series **x** and **y** replaced by their instantaneous phase time series *θ<sup>x</sup>* and *θ<sup>y</sup>* at frequency *f*, respectively.
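To make the chain from Equations (6) and (7) to Equations (8) and (9) concrete, the following is a minimal NumPy sketch, not the implementation used in this study. Several choices are assumptions made only for illustration: the Gaussian kernel and its bandwidth `sigma`, the natural logarithm, the embedding delay `tau`, the trace normalization of each Gram matrix, the extension of Equation (7) to three matrices by repeated Hadamard products, and the brick-wall FFT band-pass used to obtain the instantaneous phase. Circularity of the phase is ignored by the kernel, which is a further simplification. All function names are hypothetical.

```python
import numpy as np

def gaussian_gram(samples, sigma=1.0):
    """Gaussian-kernel Gram matrix, scaled to unit trace (assumed kernel choice)."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]          # scalar samples become 1-D vectors
    diff = samples[:, None, :] - samples[None, :, :]
    gram = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    return gram / np.trace(gram)

def renyi_entropy(*grams, alpha=2.0):
    """Eqs. (6)-(7): entropy of the trace-normalized Hadamard product (alpha != 1)."""
    joint = grams[0].copy()
    for gram in grams[1:]:
        joint = joint * gram                # Hadamard (element-wise) product
    joint = joint / np.trace(joint)
    # tr(A^alpha) equals the sum of eigenvalues raised to alpha for symmetric PSD A;
    # tiny negative eigenvalues caused by round-off are clipped to zero.
    eigvals = np.clip(np.linalg.eigvalsh(joint), 0.0, None)
    return np.log(np.sum(eigvals ** alpha)) / (1.0 - alpha)

def delay_embed(series, dim, tau, t_idx, lag):
    """Rows are (s[t-lag], s[t-lag-tau], ..., s[t-lag-(dim-1)*tau]) for each t in t_idx."""
    offsets = lag + tau * np.arange(dim)
    return series[t_idx[:, None] - offsets[None, :]]

def kernel_te(x, y, dx=2, dy=2, tau=1, u=1, alpha=2.0, sigma=1.0):
    """Eq. (8): kernel-based Renyi transfer entropy from x to y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First time index for which every embedded sample exists.
    start = max(1 + (dy - 1) * tau, u + (dx - 1) * tau)
    t_idx = np.arange(start, len(y))
    k_yt = gaussian_gram(y[t_idx], sigma)
    k_ypast = gaussian_gram(delay_embed(y, dy, tau, t_idx, 1), sigma)
    k_xpast = gaussian_gram(delay_embed(x, dx, tau, t_idx, u), sigma)
    return (renyi_entropy(k_ypast, k_xpast, alpha=alpha)
            - renyi_entropy(k_yt, k_ypast, k_xpast, alpha=alpha)
            + renyi_entropy(k_yt, k_ypast, alpha=alpha)
            - renyi_entropy(k_ypast, alpha=alpha))

def band_phase(signal, fs, f_lo, f_hi):
    """Instantaneous phase of a band-limited analytic signal (assumed FFT brick-wall filter)."""
    spectrum = np.fft.fft(np.asarray(signal, dtype=float))
    freqs = np.fft.fftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)   # keeps positive frequencies in the band only
    return np.angle(np.fft.ifft(2.0 * spectrum * mask))

def kernel_phase_te(x, y, fs, f_lo, f_hi, **te_kwargs):
    """Eq. (9): Eq. (8) applied to the phase series of x and y in a band around f."""
    theta_x = band_phase(x, fs, f_lo, f_hi)
    theta_y = band_phase(y, fs, f_lo, f_hi)
    return kernel_te(theta_x, theta_y, **te_kwargs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(300)
    y = np.roll(x, 1) + 0.1 * rng.standard_normal(300)  # y follows x with a one-sample lag
    print(kernel_te(x, y, u=1), kernel_te(y, x, u=1))
    print(kernel_phase_te(x, y, fs=100.0, f_lo=8.0, f_hi=12.0, u=1))
```

As a sanity check on `renyi_entropy`, an identity Gram matrix scaled to unit trace has *n* equal eigenvalues 1/*n*, so Equation (6) returns log *n* for any admissible *α*, while a constant Gram matrix (all realizations identical under the kernel) returns zero. In practice the kernel bandwidth, the embedding parameters, and the interaction delay *u* would need to be selected from the data rather than left at these placeholder defaults.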
