1. Introduction
The brain is a highly complex system [1,2]. Multiple interconnected brain regions with specific information-processing capabilities interact to support cognitive tasks [3,4], and the strength and direction of these interactions change dynamically [5,6]. For example, dynamic interactions between the hippocampus (Hp) and the posterior parietal cortex (PPC) have been detected in mental arithmetic tasks [5]: the strength of the information flow from the Hp to the dorsal PPC reaches its maximum during mental arithmetic, whereas the flow from the Hp to the central PPC peaks during verbal memory recall. In rodent spatial associative tasks, information flows from the Hp to the prefrontal cortex (PFC), but the direction reverses during the sampling period [6]. Therefore, a complete description of these interactions, in terms of both strength and directionality, is necessary to reveal the function and the cooperative work of brain regions.
As a measure of the information interaction between two signals, transfer entropy (TE) is model-free and does not assume any signal or interaction structure [7,8]. It has therefore been widely used in neuroscience [9,10,11,12]. However, TE is the average of information transfer over time. The application of a sliding window is the most common way to explore the dynamic interaction process within and between brain regions: the neural signals are divided into contiguous (non)overlapping segments [13] or separated into different epochs according to the task [14,15], and TE is then calculated for each segment (epoch). To ensure enough samples for TE estimation, the choice of window length is usually a compromise between estimation precision and the temporal resolution of the dynamic process: the larger the time window, the lower the resolution.
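As an illustration, the segmentation step can be sketched as follows (a minimal numpy sketch; the window length and step size are arbitrary example values, not ones used in this paper):

```python
import numpy as np

def sliding_windows(x, win_len, step):
    """Split a 1-D signal into (possibly overlapping) segments.

    TE would then be estimated separately on each row, trading
    temporal resolution (step) against sample size (win_len).
    """
    starts = range(0, len(x) - win_len + 1, step)
    return np.stack([x[s:s + win_len] for s in starts])

x = np.random.randn(1000)                             # stand-in signal
segments = sliding_windows(x, win_len=200, step=100)  # 50% overlap
print(segments.shape)
```

With a step smaller than the window length, consecutive segments share samples, which partially mitigates the resolution loss at the cost of statistically dependent TE estimates.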
To improve the temporal resolution, a technique (called the ensemble method) takes advantage of multiple realizations of the dynamic process [16], such as numerous recordings of evoked or event-related potentials/fields [17,18]. By estimating TE from the ensemble members instead of individual trials [19], it allows for a time-resolved analysis of the interaction dynamics. Based on this technique, Gómez-Herrero and colleagues proposed a data-efficient estimator of probability densities to calculate TE from an ensemble of realizations [20]. Additionally, Wollstadt [21] combined the ensemble method with the TE self-prediction optimality (SPO) estimator introduced by Wibral [22] to achieve the optimal estimation of delayed information transfer in an ensemble of independent repetition trials. In the following, we use the ensemble transfer entropy $TE_{SPO}$ to denote the transfer entropy estimated from an ensemble of realizations with the SPO estimator. Instead of estimating TE for each trial, a single $TE_{SPO}$ metric value can be accurately estimated from the ensemble members. $TE_{SPO}$ is suitable not only for short-time data but also for non-stationary signals, which are common in neuroscience. Hence, $TE_{SPO}$ can be used to analyze the dynamic interaction processes between neural signals.
However, the $TE_{SPO}$ algorithm is still at the theoretical level and cannot be applied to actual neural signals due to its enormous computational cost [21]. Firstly, to estimate the $TE_{SPO}$ metric value accurately, the mutual information estimation method proposed by Kraskov, Stögbauer, and Grassberger (KSG) is used [23]. The KSG estimator spends most of the CPU time searching for neighbors, especially in high-dimensional spaces. The complexity of this algorithm is $O((Nn)^2)$, where $N$ and $n$ are the number of independent repetitions and the sample size in a trial, respectively. This is much larger than that of the methods based on partitioning the observation space (whose complexity is $O(Nn)$) [24]. Secondly, constructing the null-hypothesis distribution in $TE_{SPO}$ increases the calculation by several orders of magnitude. In TE, the null-hypothesis distribution can be constructed from one set of surrogate data ($N$ trials) [9]. However, only a single metric value can be obtained from $N$ trials in the ensemble method, so multiple sets (usually more than 500) of surrogate data are needed to construct the null-hypothesis distribution [21]. The computational complexity of TE with the KSG estimator is $O(Nn^2)$, but for $TE_{SPO}$ it is $O(S(Nn)^2)$, where $S$ is the number of surrogate data sets. Hence, $TE_{SPO}$ is much more computationally demanding than TE. For a neuroscience experiment with 100 channel pairs × 1000 surrogate data sets × 4 stimulus conditions × 15 subjects, the elapsed time of $TE_{SPO}$ is about 240 weeks [21].
One approach to reducing the time consumption of $TE_{SPO}$ is to use faster hardware such as FPGAs [25], graphics processing units (GPUs) [21], and computer clusters [26]. Running $TE_{SPO}$ on a GPU is one of the most effective ways to alleviate the time-consumption problem. However, $TE_{SPO}$ still requires a long running time even with a GPU when enormous amounts of data are involved: the experimental data in neuroscience take about 4.8 weeks on a single GPU (NVIDIA GTX Titan), and longer when the information transfer delay is searched for. Moreover, the use of multiple GPUs places higher demands on computer performance. Another approach is to use a simple estimation method. In the phase transfer entropy proposed by Lobier [27], mutual information is estimated for phase time series using a simple binning method. This approach effectively reduces the running time and can measure both the strength and the direction of interaction [28,29,30]. However, simple discretization by partitioning the observation space ignores the neighborhood relations in continuous data, which may cause the loss of important information [31], leading to the failure of mutual information estimation on real-valued data.
Hence, to reduce the computational cost, a fast, efficient $TE_{SPO}$ with a simple statistical test method is proposed here. Based on the characteristic that TE is the average value of its local transfer entropy (te), we use a simple t-test of the $te_{SPO}$ values of the raw data against the $te_{SPO}$ values from one set of surrogate data as the statistical test in the novel $TE_{SPO}$. Because just one set of surrogate data is used, the time consumption of the novel $TE_{SPO}$ is significantly reduced. We then employ a widely used neural mass model (NMM) to produce neural signals, through which the characteristics of the novel $TE_{SPO}$ are compared with those of the traditional method. The results show that the time consumption of the novel $TE_{SPO}$ is reduced by two to three orders of magnitude. Importantly, the proposed $TE_{SPO}$ robustly detects the strength and the direction of interaction, and it reaches stability with increasing sample size, albeit more slowly than the traditional $TE_{SPO}$. Furthermore, the novel $TE_{SPO}$ can track the dynamic interaction processes between signal pairs, and its effectiveness has also been verified on realistic neural signals recorded from pigeons.
This paper is organized as follows: Section 2 introduces the novel $TE_{SPO}$ and the NMM we used. Section 3 investigates the characteristics of the novel $TE_{SPO}$ on simulated signal pairs and actual neural signals and compares its performance with that of the traditional method. Section 4 discusses the results, and Section 5 concludes the paper.
2. Materials and Methods
In the information-theoretic framework, Shannon entropy defines the measurement of information uncertainty. For a random variable $X$ with probability distribution $p(x)$, its Shannon entropy is:

$$H(X) = -\sum_{x} p(x)\log p(x) \quad (1)$$
Shannon entropy can be extended to two random variables. For $X$ and $Y$ with probability distributions $p(x)$ and $p(y)$, the joint entropy can be defined as in Equation (2):

$$H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y) \quad (2)$$
The conditional entropy in Equation (3) is the average uncertainty about $Y$ that remains when the value of $X$ is known:

$$H(Y \mid X) = -\sum_{x,y} p(x,y)\log p(y \mid x) \quad (3)$$
The mutual information between $X$ and $Y$ measures the reduction of one variable’s uncertainty by the knowledge of the other:

$$I(X;Y) = H(Y) - H(Y \mid X) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} \quad (4)$$
By introducing a third random variable $Z$, the conditional mutual information of $X$ and $Y$ given $Z$ is:

$$I(X;Y \mid Z) = \sum_{x,y,z} p(x,y,z)\log\frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)} \quad (5)$$
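These definitions can be checked numerically on a small discrete example (a sketch for illustration only; the joint distribution is invented):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array, Equation (1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))

# An invented joint distribution p(x, y): rows index x, columns index y.
pxy = np.array([[0.30, 0.10],
                [0.10, 0.50]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(pxy)
H_Y_given_X = H_XY - H_X          # chain-rule form of Equation (3)
I_XY = H_Y - H_Y_given_X          # mutual information, Equation (4)
print(round(I_XY, 4))
```

The same value is obtained from the sum form of Equation (4), since the two expressions are algebraically identical.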
Mutual information has been widely used in neuroscience [32]. However, its major limitation is that it contains no directionality. Transfer entropy, which describes the uncertainty reduction in predicting the target variable by adding the historical information of a source variable [33], was proposed to address this deficiency. For $X$ and $Y$, $TE_{X\to Y}$ is defined as the conditional mutual information between $Y_t$ and $\mathbf{X}_{t-1}$ (the historical information of $X$) given $\mathbf{Y}_{t-1}$ (the historical information of $Y$):

$$TE_{X\to Y} = I\left(Y_t;\, \mathbf{X}_{t-1} \mid \mathbf{Y}_{t-1}\right) \quad (6)$$
Suppose that (1) there is an interaction delay $u$ between $X$ and $Y$; and (2) $X$ and $Y$ can be approximated by Markov processes of order $d_X$ and $d_Y$, respectively. With these assumptions, TE can be rewritten in a more general form as in Equation (7):

$$TE_{X\to Y}(u) = \sum p\left(y_t, \mathbf{x}_{t-u}^{d_X}, \mathbf{y}_{t-1}^{d_Y}\right) \log \frac{p\left(y_t \mid \mathbf{x}_{t-u}^{d_X}, \mathbf{y}_{t-1}^{d_Y}\right)}{p\left(y_t \mid \mathbf{y}_{t-1}^{d_Y}\right)} \quad (7)$$

where $\mathbf{x}_{t-u}^{d_X}$ and $\mathbf{y}_{t-1}^{d_Y}$ are the delay-embedded states of the source and the target.
Equation (7) can be viewed as the average of the information transfer over time. Based on this, the local transfer entropy (te) was proposed [34], realizing local or pointwise interaction measurement (Equation (8)):

$$te_{X\to Y}(t, u) = \log \frac{p\left(y_t \mid \mathbf{x}_{t-u}^{d_X}, \mathbf{y}_{t-1}^{d_Y}\right)}{p\left(y_t \mid \mathbf{y}_{t-1}^{d_Y}\right)} \quad (8)$$

From Equations (7) and (8), the transfer entropy is the average value of the local transfer entropy.
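This averaging relation can be illustrated with a toy plug-in estimate on a binary source–target pair (an illustrative sketch, not the estimator used in this paper; the coupling model and noise level are invented for the example):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Binary source X drives Y with a one-step delay: y[t] copies x[t-1]
# with probability 0.8 and flips it otherwise.
n = 20000
x = rng.integers(0, 2, n)
flip = rng.random(n - 1) < 0.2
y = np.empty(n, dtype=int)
y[0] = 0
y[1:] = np.where(flip, 1 - x[:-1], x[:-1])

# Plug-in estimates of the PDFs (order-1 Markov approximation, delay u = 1).
triples = Counter(zip(y[1:], x[:-1], y[:-1]))   # p(y_t, x_{t-1}, y_{t-1})
pairs = Counter(zip(y[1:], y[:-1]))             # p(y_t, y_{t-1})
cond = Counter(zip(x[:-1], y[:-1]))             # p(x_{t-1}, y_{t-1})
marg = Counter(y[:-1])                          # p(y_{t-1})

def local_te(yt, xp, yp):
    """te(t) = log2[ p(y_t | x_{t-1}, y_{t-1}) / p(y_t | y_{t-1}) ], Equation (8)."""
    return np.log2((triples[(yt, xp, yp)] / cond[(xp, yp)]) /
                   (pairs[(yt, yp)] / marg[yp]))

te_values = np.array([local_te(y[t], x[t - 1], y[t - 1]) for t in range(1, n)])
TE = te_values.mean()   # the TE of Equation (7) is the mean of the local values
print(round(TE, 3))
```

For this model the analytic value is $1 - H_b(0.2) \approx 0.28$ bits, which the estimate approaches as $n$ grows; note that individual local values can be negative (misinformative samples) even though their average is positive.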
2.1. Ensemble Transfer Entropy ($TE_{SPO}$) and Ensemble Local Transfer Entropy ($te_{SPO}$)
When the independent repetition trials of an experimental condition meet cyclo-stationarity, these trials are taken as an ensemble of realizations, and the various probability density functions (PDFs) can be accurately estimated from the ensemble members. In this paper, we use the subscript SPO to indicate the ensemble transfer entropy with the TE self-prediction optimality estimator:

$$TE_{SPO}(t, u) = \frac{1}{N}\sum_{r=1}^{N} \log \frac{p\left(y_t^{(r)} \mid \mathbf{x}_{t-u}^{(r)}, \mathbf{y}_{t-1}^{(r)}\right)}{p\left(y_t^{(r)} \mid \mathbf{y}_{t-1}^{(r)}\right)} \quad (9)$$

where $N$ is the number of independent repetition trials and the superscript $(r)$ indexes the trials.
When the number of repetitions is sufficient to provide the necessary amount of data to estimate the various PDFs in a time window $w$ reliably, the $TE_{SPO}$ in $w$ can be estimated:

$$TE_{SPO}(w, u) = \frac{1}{N|w|}\sum_{r=1}^{N}\sum_{t\in w} \log \frac{p\left(y_t^{(r)} \mid \mathbf{x}_{t-u}^{(r)}, \mathbf{y}_{t-1}^{(r)}\right)}{p\left(y_t^{(r)} \mid \mathbf{y}_{t-1}^{(r)}\right)} \quad (10)$$

With these definitions in place, we can obtain the ensemble local transfer entropy ($te_{SPO}$):

$$te_{SPO}(t, r, u) = \log \frac{p\left(y_t^{(r)} \mid \mathbf{x}_{t-u}^{(r)}, \mathbf{y}_{t-1}^{(r)}\right)}{p\left(y_t^{(r)} \mid \mathbf{y}_{t-1}^{(r)}\right)} \quad (11)$$
2.2. Estimating Ensemble Transfer Entropy
The KSG estimator, a TE estimator with low bias, has been widely used [24]. This method is based on the nearest-neighbor entropy estimator of Kozachenko and Leonenko [35]. The distance to the $k$-th nearest neighbor in the high-dimensional joint space is projected to the lower-dimensional marginal spaces, so that the deviations caused by the different spatial scales of the low-dimensional spaces are significantly reduced. In this paper, the KSG estimator is applied to $TE_{SPO}$ and $te_{SPO}$. Instead of searching for the nearest neighbors in the state space constructed from an individual trial, we search in all repetitions [36]:

$$TE_{SPO} = \psi(k) + \left\langle \psi\left(n_{\mathbf{y}_{t-1}}+1\right) - \psi\left(n_{y_t \mathbf{y}_{t-1}}+1\right) - \psi\left(n_{\mathbf{y}_{t-1}\mathbf{x}_{t-u}}+1\right)\right\rangle \quad (12)$$

where $\psi$ is the digamma function; $\langle\cdot\rangle$ denotes the average; and $n_{\mathbf{y}_{t-1}}$, $n_{y_t \mathbf{y}_{t-1}}$, and $n_{\mathbf{y}_{t-1}\mathbf{x}_{t-u}}$ are the numbers of samples falling into the strips of the marginal spaces $\mathbf{y}_{t-1}$, $(y_t, \mathbf{y}_{t-1})$, and $(\mathbf{y}_{t-1}, \mathbf{x}_{t-u})$, respectively. The strip is defined by the distance to the $k$-th nearest neighbor. In general, $k$ is 4 [23]. The corresponding local values are obtained by dropping the average:

$$te_{SPO} = \psi(k) + \psi\left(n_{\mathbf{y}_{t-1}}+1\right) - \psi\left(n_{y_t \mathbf{y}_{t-1}}+1\right) - \psi\left(n_{\mathbf{y}_{t-1}\mathbf{x}_{t-u}}+1\right) \quad (13)$$
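For intuition, a minimal version of the KSG idea — here the basic Kraskov algorithm-1 mutual information estimator rather than the full conditional TE estimator of Equation (12) — can be sketched with a k-d tree (illustrative only; sample size, correlation, and $k$ are arbitrary example values):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=4):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for 1-D samples.

    Neighbor counts in the marginal spaces use the Chebyshev distance
    to the k-th neighbor in the joint space, as in Kraskov et al.
    """
    n = len(x)
    xy = np.column_stack([x, y])
    # Distance to the k-th nearest neighbor in the joint space (max-norm).
    d = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    # Count strictly closer neighbors in each marginal space (excluding self).
    nx = cKDTree(x[:, None]).query_ball_point(
        x[:, None], d - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y[:, None]).query_ball_point(
        y[:, None], d - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

rng = np.random.default_rng(1)
n, rho = 4000, 0.8
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
mi = ksg_mutual_information(x, y)
true_mi = -0.5 * np.log(1 - rho**2)   # analytic MI of a bivariate Gaussian
print(round(mi, 3), round(true_mi, 3))
```

The per-point tree queries are exactly the neighbor searches that dominate the CPU time mentioned in the Introduction; in the ensemble setting they run over all pooled repetitions, which is what drives the $O((Nn)^2)$ cost.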
2.3. Parameter Selection
The information transfer delay $u$ between $X$ and $Y$, the embedding dimensions (Markov approximation orders $d_X$ and $d_Y$), and the embedding delay $\tau$ have a significant impact on the $TE_{SPO}$ estimation. We use the TE self-prediction optimality estimator to obtain the transfer delay [21]: when $TE_{SPO}$ is maximal, the assumed delay $u'$ equals the true information transfer delay $u$ (Equation (14)) [22]:

$$u = \arg\max_{u'}\, TE_{SPO}(t, u') \quad (14)$$

$d_X$, $d_Y$, and $\tau$ are calculated by using the Ragwitz criterion [37].
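The delay scan of Equation (14) can be illustrated with a deliberately coarse plug-in TE in place of the SPO estimator (a sketch under invented coupling parameters, not the paper's implementation):

```python
import numpy as np
from collections import Counter

def binned_te(x, y, u, edges=(0.25, 0.5, 0.75)):
    """Coarse plug-in TE x->y in bits (order-1 histories, candidate delay u)."""
    xd = np.digitize(x, np.quantile(x, edges))   # quartile binning
    yd = np.digitize(y, np.quantile(y, edges))
    t = np.arange(u, len(x))                     # y[t], y[t-1], x[t-u] all exist
    trip = Counter(zip(yd[t], xd[t - u], yd[t - 1]))
    pair = Counter(zip(yd[t], yd[t - 1]))
    cond = Counter(zip(xd[t - u], yd[t - 1]))
    marg = Counter(yd[t - 1])
    m = len(t)
    return sum((c / m) * np.log2((c / cond[(xp, yp)]) /
                                 (pair[(yt, yp)] / marg[yp]))
               for (yt, xp, yp), c in trip.items())

# Coupled pair: y is driven by x with a true transfer delay of 3 samples.
rng = np.random.default_rng(2)
n, true_u = 20000, 3
x = rng.standard_normal(n)
y = np.zeros(n)
y[true_u:] = 0.8 * x[:-true_u] + 0.3 * rng.standard_normal(n - true_u)

delays = list(range(1, 7))
te_by_delay = [binned_te(x, y, u) for u in delays]
recovered_u = delays[int(np.argmax(te_by_delay))]
print(recovered_u)
```

Scanning candidate delays and taking the argmax recovers the true delay; with the SPO estimator, the same scan is performed over the ensemble of trials.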
2.4. Surrogate Data and the Improved Statistical Test Method
$TE_{SPO}$ is a biased estimator with no upper bound [9], so it is necessary to generate surrogate data and construct the null-hypothesis distribution to test the statistical significance of the $TE_{SPO}$ metric value. In the surrogate data, it is assumed that there is no information transfer between the source variable $X$ and the target variable $Y$. The commonly used method is to shuffle $X$, which destroys the dependence between $X$ and $Y$ while retaining the probability distribution of the variables [38]. Here, the source signal of each independent repetition trial is separated into two segments (Figure 1). These segments are then shuffled so that no segment remains in its original position, yielding the surrogate data ($X^{surr}$, $Y$).
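The segment-shuffling step can be sketched as follows (an illustrative reading of Figure 1; the trial and sample counts are arbitrary example values):

```python
import numpy as np

def segment_shuffle_surrogate(x_trials, rng):
    """Build one surrogate source: split each trial into two halves and
    shuffle all halves so that no segment stays in its original slot.

    x_trials: array of shape (n_trials, n_samples). The target signal is
    left untouched, which destroys the X->Y dependence while preserving
    the marginal distribution of X.
    """
    n_trials, n_samples = x_trials.shape
    half = n_samples // 2
    segments = np.concatenate([x_trials[:, :half], x_trials[:, half:2 * half]])
    # Resample the permutation until it is a derangement (no fixed points).
    while True:
        perm = rng.permutation(len(segments))
        if np.all(perm != np.arange(len(segments))):
            break
    shuffled = segments[perm]
    return np.concatenate([shuffled[:n_trials], shuffled[n_trials:]], axis=1)

rng = np.random.default_rng(3)
x_trials = rng.standard_normal((30, 200))
x_surr = segment_shuffle_surrogate(x_trials, rng)
print(x_surr.shape)
```

Because only segment positions are permuted, the surrogate keeps every sample of the original source, so the marginal PDFs used by the estimator are unchanged.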
In the traditional $TE_{SPO}$ method, at least 500 sets of surrogate data must be generated, and their $TE_{SPO}$ metrics estimated, to construct the null-hypothesis distribution. The null hypothesis can then be rejected or retained by comparing the $TE_{SPO}$ metric value of the raw data to the null-hypothesis distribution at the 1% (5%) level of significance [21].
Here, we modify the statistical test in the traditional $TE_{SPO}$. The t-test is a parametric statistical significance test of whether the mean values of two groups differ. Based on the characteristic that $TE_{SPO}$ is the average value of $te_{SPO}$ [39], a t-test is performed on the $te_{SPO}$ values of the raw and the surrogate data. If the null hypothesis is rejected, there is a significant difference between the $te_{SPO}$ values of the raw data and those of the surrogate data.
Due to its high power, the t-test has been widely used to measure the difference in the mean values of two groups. For small samples, the t-test is valid only for normally distributed data [40]. However, by the central limit theorem, the t-statistic is approximately normally distributed with unit variance when the sample size is large, whatever the distribution of the data. Thus, the t-test is appropriate for large enough samples [41,42]. But how large is large enough? The required sample size relates to the difference in variance and the prevalence of extreme outliers. A large body of literature indicates that "sufficiently large" is often less than 500 even for extremely non-normally distributed data [41].
In the ensemble method, although the distributions of the $te_{SPO}$ values (estimated from the raw and the surrogate data, respectively) are non-normal (Figure S1), the number of $te_{SPO}$ samples is often substantially larger than 500. Therefore, the t-test is applicable to the $te_{SPO}$ values. We also compared the t-test and the Wilcoxon rank-sum test (Figure S2); the results of the two methods are almost the same.
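The resulting test reduces to a standard two-sample t-test on the two sets of local values; a sketch with stand-in distributions (the exponential shapes and shifts are invented, chosen only to be skewed and non-normal as in Figure S1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Stand-ins for the two groups of local values: both skewed (non-normal)
# but large (>5000 samples); the raw-data group has a slightly higher mean.
te_raw = rng.exponential(1.0, 6000) - 0.85   # mean ~ 0.15
te_surr = rng.exponential(1.0, 6000) - 1.0   # mean ~ 0

# Welch's t-test on the means replaces the >500-surrogate null distribution.
t_stat, p_val = stats.ttest_ind(te_raw, te_surr, equal_var=False)
significant = p_val < 0.002    # the significance level used in this paper
print(significant)
```

One t-test on one surrogate set replaces the estimation of more than 500 surrogate $TE_{SPO}$ values, which is where the two-to-three-order-of-magnitude speedup comes from.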
2.5. Neural Mass Model
Signal pairs are generated by the NMM described in [43], which simulates the connectivity between multiple regions of interest (ROIs) through long-range excitatory connections. In the NMM, the average spike density of the pyramidal neurons of the presynaptic area ($z^j$) affects the target region through a weight factor $W_{ij}$ and a time delay $u$ (Equation (15)):

$$v^i(t) = W_{ij}\, z^j(t - u) + n^i(t) \quad (15)$$

where $n^i(t)$ is Gaussian white noise, and the superscripts $j$ and $i$ denote the presynaptic and the target region, respectively.
Signal pairs generated by the NMM are nonlinear and have significant β-band (about 20 Hz) activity. By changing the information transfer delay $u$ and the weight factor $W_{ij}$, we obtain simulated signal pairs with directional interaction.
4. Discussion
To reduce the time consumption of the ensemble transfer entropy ($TE_{SPO}$) and explore the dynamic interaction process in neuroscience, we proposed a fast, efficient $TE_{SPO}$ in which we modified the traditional statistical method: a t-test on the $te_{SPO}$ values estimated from the raw and the surrogate data was performed to test whether there was a significant difference in their mean values. Because just one set of surrogate data was used, the time consumption of the novel $TE_{SPO}$ was significantly reduced. To validate the improved efficiency, coupled signal pairs generated by a neural mass model were used. First, the novel $TE_{SPO}$ robustly detected the strength and the direction of the interaction between signal pairs with moderate noise (SNR above 0 dB), while its performance decreased dramatically when the SNR was −10 dB; it yielded almost the same false positive rate and sensitivity as the traditional $TE_{SPO}$. Second, with the increase in the window length and the number of trials, the novel $TE_{SPO}$ reached its stable state, although more slowly than the traditional method. Third, the novel $TE_{SPO}$ could accurately track the dynamic interaction process, and its computation time was reduced by two to three orders of magnitude compared with the traditional method. Finally, the applicability of the novel $TE_{SPO}$ to realistic neural signals was verified on the LFP signals of the Hp and NCL while pigeons performed goal-directed decision-making tasks. Therefore, the novel $TE_{SPO}$ may be a suitable way to investigate the dynamic interaction process between brain regions.
$TE_{SPO}$ is a biased estimator and does not have a meaningful upper bound [9], so it is necessary to construct the null-hypothesis distribution to test the statistical significance of $TE_{SPO}$, which is an essential part of the method [21]. In TE, the null-hypothesis distribution can be constructed from one set of surrogate data ($N$ trials) [9]. However, in the ensemble method, only a single metric value can be obtained from $N$ trials [20], so the null-hypothesis distribution is built from multiple sets (usually more than 500) of surrogate data, which dramatically increases the amount of calculation [21]. In this paper, we introduced a simple statistical method into the novel $TE_{SPO}$ by performing a t-test on the $te_{SPO}$ values of the raw and the surrogate data (one set). Because just one set of surrogate data was needed, the computation time was reduced significantly and the computational burden of $TE_{SPO}$ was fundamentally resolved. However, there is still a large amount of calculation in the novel $TE_{SPO}$ with the KSG estimator. For the construction of a multi-brain dynamic interaction network, one workaround is to run the novel $TE_{SPO}$ on a GPU. Another is to use a TE estimator with a small amount of calculation, for instance, the symbolic version of TE based on ordinal pattern symbolization, kernel-based transfer entropy, or the transfer entropy rate through Lempel–Ziv complexity. The next step is to generalize these estimators to the ensemble method and compare their performance on an ensemble of realizations.
One may wonder whether the conditions for applying the t-test are met in the novel $TE_{SPO}$. In fact, the $te_{SPO}$ values of the raw data and the surrogate data are not normally distributed (Figure S1a,b). However, by the central limit theorem, the t-test is appropriate for large enough samples regardless of the distribution of the data [41,42,48]. In the ensemble method, the sample size of $te_{SPO}$ is large (generally more than 5000), so the t-test can stably detect whether there is a significant difference between the $te_{SPO}$ values of the raw data and those of the surrogate data. Meanwhile, we compared the false positive rate and the sensitivity of the novel $TE_{SPO}$ with the t-test and with the Wilcoxon rank-sum test; the two methods obtained the same false positive rate and CDT values (Figure S2a,b). Therefore, it is feasible to use a t-test of the $te_{SPO}$ values from the raw data against those from the surrogate data to detect a significant difference between them.
In the novel $TE_{SPO}$, we chose p = 0.002 as the significance level for the t-test. The false positive rate fluctuated around 0.01 when p was 0.002 and increased to 0.05 when p was 0.02. One may doubt whether the novel $TE_{SPO}$ is able to control the false positive rate at the desired level, believing there is just a 1% chance of a false alarm when p is 0.01. However, the false positive rate is related not only to the p-value but also, intimately, to the sample size. When the sample size is large, the sensitivity of the statistical test is very high, and the result is positive even for two groups with only a small difference [49]. In the ensemble method, the number of $te_{SPO}$ samples is usually larger than 5000. If we expect the false positive rate to be 0.01, we should reduce the p-value below 0.01 rather than use p = 0.01. The results in Section 3.1.1 also confirmed this conclusion: only when p is 0.002 is the false positive rate around 0.01, whereas it is 0.03 rather than 0.01 when p is 0.01. Hence, the p-value does not measure the probability that the studied hypothesis is true; it reflects our level of tolerance for the false positive rate [50,51].
One of the major challenges in brain science is that neural signals are corrupted by technical noise (power-line interference, impedance fluctuations, motion artifacts, etc.) and biological artifacts (volume conduction, eye movements, eyeblinks, muscle activity, etc.) [52,53,54,55]. The presence of noise can mask the features of the neural signals and affect the analysis of the interactions between brain regions. Various methods have been proposed to eliminate noise, for instance, elimination of noise at the source by standardizing the experimental procedure [45], reduction of power-line interference by an adaptive notch filter [56], and removal of muscle artifacts by ensemble empirical mode decomposition and multiset canonical correlation analysis [57]. However, some filters obtained by convolution of the input with their impulse responses may blur the temporal or causal relations between the signal and external events [45], so we should be cautious when using them. If signals are recorded on multiple channels, spatial filters may be applied to remove noise [46]. Still, some noise is sufficiently complex that it cannot be disentangled completely from the neural processing we actually need. Therefore, the analysis methods we use should be robust to noise. In Section 3.1.1, we investigated the robustness of the novel $TE_{SPO}$ to noise. The results show that it can measure the strength and the direction of interaction robustly when the SNR is above 0 dB. Therefore, the novel $TE_{SPO}$ we propose is valid in the presence of moderate noise.
$TE_{SPO}$ based on independent repetition trials detects interaction within a short time window; it has high temporal resolution and is suitable for analyzing the dynamic interaction process between neural signals [20]. However, the selection of the time window length requires attention to the following points. First, we used a scanning method to obtain the interaction time delay $u$, but the time delay can only be estimated accurately when the sample size is greater than 10,000 [21]. Therefore, the length of the time window should be selected to ensure that the ensemble members provide enough samples for accurate delay estimation. Second, in the novel $TE_{SPO}$, enough historical information should be involved to obtain an accurate estimation of the future information. We used the Ragwitz criterion to calculate the embedding window (embedding dimension $d$ × embedding delay $\tau$), which includes the past information of the source and the target signals and has the ability to predict the future of the target signal [37]. So, the larger the embedding window, the longer the time window that should be picked. Finally, the selection of the time window length is limited by the number of independent repetition trials. Based on the analysis in Section 3.1.2, the performance of the novel $TE_{SPO}$ is affected by the sample size: the larger the sample size, the better the stability. The sensitivity and the CDT values reached stability when the sample size was more than 20,000 with moderate noise. Therefore, we can select a small window to improve the temporal resolution when there are many independent repetitions; when the number of independent repetitions is small, a large time window should be used to ensure stable performance.
In TE, the KSG estimator requires the signal to be stationary to obtain accurate results [32]. However, this is difficult to realize in neuroscience, where most neural signals are non-stationary. $TE_{SPO}$ solves this problem by estimating over independent repetitions in which equivalent events (or equivalent brain activity) occur periodically. The neural signals recorded from these trials are assumed to be cyclo-stationary [58], and in general, independent repetitions meet this hypothesis [21]. However, brain activity in learning tasks changes gradually and its neural signals are not cyclo-stationary, so in this case, caution should be exercised when using the novel $TE_{SPO}$.
Local transfer entropy is a time-varying version of TE, proposed by Lizier to realize local or pointwise interaction estimation; TE can be expressed as an average of the local transfer entropy [39]. In recent years, local transfer entropy has been used in neuroscience [59]. Ramón demonstrated that local mutual information is suitable for measuring the dynamics of cross-frequency coupling in brain electrophysiological signals [32], and local transfer entropy has been applied to explore the dynamic phase–amplitude coupling during seizures [59]. However, Sezen questions local transfer entropy as a local causality measure, because the causal nature does not necessarily remain in each part [60]. In addition, there is high-frequency leakage into the local transfer entropy, and this phenomenon is difficult to explain theoretically. The use of local transfer entropy to measure dynamic interaction needs further research. In this paper, local transfer entropy is not directly used to investigate the dynamic interaction between signal pairs; we only used the characteristic that TE is the average value of the local transfer entropy for the statistical analysis in the novel $TE_{SPO}$.
Research in rodents and birds has shown that goal-directed behavior involves multiple brain regions, among which the Hp and PFC/NCL play important roles. The Hp participates in goal-directed behavior by recognizing key locations in space [61]. The PFC/NCL is involved in weighing conflicting information and then making a decision [62]. The Hp and PFC/NCL have very close functional interactions that contribute to the successful execution of goal-directed behavior [63]. In previous work, we investigated the interaction between the Hp and NCL of pigeons in a goal-directed task using the local functional network and partial directed coherence. The results show that in the turning area, the functional interaction of the Hp–NCL increases significantly and the information flows from the Hp to the NCL [48], which was also detected using the novel $TE_{SPO}$. However, the whole decision-making period is as long as 2 s, during which the dynamics of the Hp–NCL interaction were previously unknown. The novel $TE_{SPO}$ solves this problem: the $TE_{SPO}$ metric value reaches its maximum at about 1 s after the animal enters the turning area. This may be because the spatial location information forms in the Hp when the animal sees the light stimulation in the turning area and is then transferred from the Hp to the NCL for decision-making. These results show that the novel $TE_{SPO}$ is suitable for investigating the dynamic interaction between actual neural signals.