1. Introduction
In many fields of scientific inquiry, the analysis of causal relationships in complex systems represents an important task, which depends on the synergistic interplay between theory and application. An important contribution in this area has been the proposal of Granger Causality (GC) [
1], which tests the significance of dynamic influence between pairwise time series. Building upon this foundation, Geweke [
2] expanded GC to encompass multivariate time series, introducing conditional Granger Causality (cGC). Bressler et al. [
3] and Schiatti et al. [
4] further refined cGC, enabling the estimation of causality among multivariate time series using Vector AutoRegressive (VAR) models.
Granger causality and its generalizations form a family of statistical methods based on comparing the residual variances obtained when different models are applied to the same time series; a wide array of predictive models, including non-linear variants, may be employed. As an alternative approach to quantifying causality from predictive modeling, Impulse Response Function (IRF) analysis has been developed as a tool for examining variable responses to shocks within an identified model.
However, it is worth noting that Granger causality and IRF analysis always depend on the particular model classes chosen; they are not suitable for model discovery, since they lack invariance theorems such as those that exist for the more general concept of algorithmic complexity [
5].
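To make the residual-variance comparison concrete, the following is a minimal sketch of pairwise Granger causality for two series, assuming a simple ordinary least-squares AR fit and a fixed lag order; the function name and interface are illustrative, not taken from the cited works.

```python
import numpy as np

def granger_causality(x, y, p=2):
    """Pairwise Granger causality from y to x with lag order p.

    Fits a restricted AR model of x on its own past and an unrestricted model
    that also includes the past of y, and returns the log-ratio of the
    residual variances; larger values indicate a stronger influence of y on x.
    """
    N = len(x)
    # Lagged regressor matrices for target times t = p, ..., N-1.
    own = np.column_stack([x[p - k - 1:N - k - 1] for k in range(p)])
    joint = np.column_stack(
        [x[p - k - 1:N - k - 1] for k in range(p)]
        + [y[p - k - 1:N - k - 1] for k in range(p)]
    )
    target = x[p:]
    # Least-squares fits of the restricted and unrestricted models.
    res_r = target - own @ np.linalg.lstsq(own, target, rcond=None)[0]
    res_u = target - joint @ np.linalg.lstsq(joint, target, rcond=None)[0]
    return np.log(np.var(res_r) / np.var(res_u))

# Example: granger_causality(x, y) >> granger_causality(y, x) suggests y drives x.
```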
In principle, the statistical significance of numerically calculated cGC and IRF values should be assessed using asymptotic methods. In real-world data analysis, however, this may yield incorrect results because the empirical distribution of the test statistic can deviate from its asymptotic distribution. In cases where the distribution function is unknown, such as with partial Granger causality [
6], the asymptotic method is inapplicable. The bootstrap strategy serves as a widely used alternative for cGC and IRF significance testing. However, with numerous bootstrap methods available, each with varying sensitivity and specificity, selecting the appropriate method remains a critical challenge when applying time series models to real data analysis.
In a causal analysis of time series data, failing to apply an appropriate bootstrap method can lead to several problems. First, an unsuitable bootstrap method may yield inaccurate estimates of the causal relationships themselves, undermining the reliability of the statistical inferences. Second, because bootstrap methods are often used to construct confidence intervals, an unsuitable choice may produce intervals that are too narrow or too wide, leading to inappropriate assessments of uncertainty. Third, bootstrap methods are also used to estimate the magnitude of causal effects, and a poor choice may distort these estimates, affecting the appropriateness of subsequent decisions. To avoid these problems, it is important to select a bootstrap method based on an understanding of the characteristics of the time series data and the objectives of the causal analysis.
In this study, we thoroughly evaluate the performance of four bootstrap methods designed for time series analysis. The first method is an adaptation of the Phase-Randomized Bootstrap (PRB), which generates surrogate data for hypothesis testing by randomizing the phases of the Fourier transform of the original time series. We propose a modification, the uncorrelated Phase-Randomized Bootstrap (uPRB), which randomizes the phases independently for each variable in the spectral domain, thereby enabling the detection of causality among variables. As a counterpart, we consider a technique based on the Circular Block Bootstrap (CBB), a block-resampling method that operates in the time domain [
7]. This approach remedies the under-representation of the beginning and the end of the time series in surrogate data by creating a circular time series. Time-Shifted Surrogates (TSS) [
8,
9], another technique we discuss, ensures the preservation of all characteristics of the initial signal by maintaining the same state–space trajectory. The Stationary Bootstrap (SB) [
10] uses random block lengths for standard error calculations and confidence region construction in weakly dependent stationary observations. Last, we discuss the AR-Sieve Bootstrap (ARSB), a method that approximates the underlying data-generating process by fitting an AR model, useful for handling dependent data [
11,
12].
3. Bootstrap Methods
3.1. Uncorrelated Phase Randomization Bootstrap (uPRB)
The Phase Randomization Bootstrap method has been proposed as a technique to generate surrogate time series data for hypothesis testing [
13,
14]. Using the Fourier transform, the method transforms the original time series data into the frequency domain. Then, the phase of each frequency component is randomized while preserving the original amplitude spectrum. Finally, the surrogate data are obtained by applying the inverse Fourier transform to the randomized frequency components.
This process generates surrogate time series data with the same power spectrum as the original data but with disrupted temporal correlations. By comparing the original data’s non-linearity measures or other statistical properties to those of the surrogate data, researchers can test hypotheses and detect the presence of non-linear dynamics in the original time series data. This method has been applied in various fields, including the study of physiological signals [
15], economics and finance [
16], climate data [
9], and so on.
In order to preserve all linear auto-correlations and cross-correlations, the Phase Randomization Bootstrap adds a common random sequence to the phases of all variables. Because this preservation of the cross-correlations renders the method unable to detect causality among variables, we instead prepare the random sequences independently for each variable. In this paper, we call this method the uncorrelated Phase Randomization Bootstrap (uPRB).
The procedure for generating surrogate datasets by uPRB is as follows.
- Step 1
Transform the original time series by applying the Fourier transform to each variable as
$$X_l(f) = \mathcal{F}\!\left[x_l(t)\right] = A_l(f)\, e^{i\phi_l(f)},$$
where $l$ is the index of the variable, and $A_l(f)$ and $\phi_l(f)$ denote the amplitude and the phase, respectively.
- Step 2
For each frequency $f$, add an independent random value $\delta_l(f)$, following a uniform distribution over the interval $[0, 2\pi)$, to the phase $\phi_l(f)$, while satisfying the symmetry property $\delta_l(-f) = -\delta_l(f)$. That is,
$$\tilde{\phi}_l(f) = \phi_l(f) + \delta_l(f).$$
- Step 3
Transform the spectral-domain representation back to the time domain by applying the inverse Fourier transform to each variable as
$$x^{*}_l(t) = \mathcal{F}^{-1}\!\left[ A_l(f)\, e^{i\tilde{\phi}_l(f)} \right].$$
- Step 4
Repeat Steps 2–3 for all variables.
- Step 5
Repeat Steps 2–4 a large number of times, thereby generating a set of surrogate datasets.
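A minimal sketch of Steps 1–5, assuming the multivariate series is stored as a NumPy array of shape (N, n_vars); this is an illustration of the procedure, not the authors' code. NumPy's real FFT enforces the symmetry property automatically, so only the non-negative frequencies need to be randomized.

```python
import numpy as np

def uprb_surrogate(data, rng=None):
    """One uPRB surrogate: independent phase randomization for each variable.

    data : NumPy array of shape (N, n_vars) holding the original time series.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, n_vars = data.shape
    surrogate = np.empty_like(data, dtype=float)
    for l in range(n_vars):                           # Step 4: loop over variables
        spectrum = np.fft.rfft(data[:, l])            # Step 1: Fourier transform
        amplitude = np.abs(spectrum)
        phase = np.angle(spectrum)
        delta = rng.uniform(0.0, 2.0 * np.pi, size=phase.shape)  # Step 2: random phases
        delta[0] = 0.0                                # keep the DC component real
        if N % 2 == 0:
            delta[-1] = 0.0                           # keep the Nyquist component real
        randomized = amplitude * np.exp(1j * (phase + delta))
        surrogate[:, l] = np.fft.irfft(randomized, n=N)  # Step 3: inverse transform
    return surrogate

# Step 5: repeat the call many times (e.g., 2000) to obtain a set of surrogate datasets.
```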
3.2. Stationary Bootstrap (SB)
The conventional Non-overlapping Block Bootstrap (NBB) was proposed by Carlstein [17]. Improving upon this method, Künsch [18] proposed the Moving-Blocks Bootstrap (MBB), which is useful especially for small samples because it admits a wider range of blocks than the conventional method. However, in the process of random sampling, an edge effect arises: data points at the beginning and end of the series are selected with uneven weighting. To compensate for this shortcoming, Politis and Romano [7] proposed the Circular Block Bootstrap (CBB), which concatenates the start and end points of the original data. In the block bootstrap methods, the stationarity of the sample is an important assumption; however, surrogate data obtained by resampling with the above-mentioned methods are not necessarily stationary. Therefore, the Stationary Bootstrap (SB) method has been proposed, in which the surrogate data are also stationary. This method is similar to the CBB, except that the block width is not fixed but is drawn as a random variable following a geometric distribution [10]. The procedure for generating surrogate data by SB is as follows.
- Step 1
Set the mean block width to $w$. Then, for the geometric distribution used in Step 3, we have $p = 1/w$.
- Step 2
Duplicate the original data and merge it to the end of the original data such that $x_{N+t} = x_t$ for $t = 1, \ldots, N$.
- Step 3
Generate a sequence of natural random numbers $w_1, w_2, \ldots, w_K$, corresponding to the block widths, following a geometric distribution, so that the probability of the event $\{w_k = j\}$ is $p(1-p)^{j-1}$ for $j = 1, 2, \ldots$. Here, the value of $K$ is determined so as to satisfy the condition $\sum_{k=1}^{K} w_k \ge N$.
- Step 4
Generate a sequence of natural random numbers $s_1, s_2, \ldots, s_K$, corresponding to the indices of the starting points of the blocks, following a uniform distribution over the interval $[1, N]$.
- Step 5
The blocks $\{x_{s_k}, x_{s_k+1}, \ldots, x_{s_k+w_k-1}\}$, constructed according to Steps 3 and 4, are arranged in the order in which they were extracted, and the resample is obtained by taking the first $N$ elements as the surrogate dataset $\{x^{*}_t\}_{t=1}^{N}$.
- Step 6
Repeat Steps 3–5 a large number of times, thereby generating a set of surrogate datasets.
In the case of multivariate time series, there are two ways to perform Steps 3–5. One way is to use the same blocks for all variables for data shuffling, as described above, and the other is to perform Steps 3–5 for each variable independently. In this paper, we refer to the former as correlated SB (cSB) and to the latter as uncorrelated SB (ucSB).
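A minimal sketch of one SB resample covering both variants, again assuming an array of shape (N, n_vars); the circular extension of Step 2 is realized here by taking block indices modulo N, which is equivalent to duplicating the data. The function name and interface are illustrative, not the authors' implementation.

```python
import numpy as np

def sb_surrogate(data, w=10, correlated=True, rng=None):
    """One Stationary Bootstrap surrogate with mean block width w.

    correlated=True  -> cSB: the same blocks are used for all variables.
    correlated=False -> ucSB: blocks are drawn independently for each variable.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, n_vars = data.shape
    p = 1.0 / w                                      # Step 1: geometric parameter

    def resample_indices():
        idx = np.empty(0, dtype=int)
        while idx.size < N:
            width = rng.geometric(p)                 # Step 3: random block width
            start = rng.integers(0, N)               # Step 4: random starting point
            block = (start + np.arange(width)) % N   # circular indexing (Step 2)
            idx = np.concatenate([idx, block])
        return idx[:N]                               # Step 5: keep the first N elements

    if correlated:
        return data[resample_indices(), :]           # cSB: shared blocks
    return np.column_stack(
        [data[resample_indices(), l] for l in range(n_vars)]  # ucSB: per-variable blocks
    )

# Step 6: repeat the call many times to obtain a set of surrogate datasets.
```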
3.3. Time Shift Surrogates (TSS)
While PRB is a method for generating surrogate datasets by randomizing the phases in the frequency domain, we evaluated the performance of TSS as a method of randomizing the phase in the time domain. This method corresponds to the special case of the CBB, where the number of blocks is limited to 1. The procedure for generating surrogate datasets by TSS is as follows.
- Step 1
Duplicate the original data and merge it to the end of the original data such that $x_{N+t} = x_t$ for $t = 1, \ldots, N$.
- Step 2
Generate a natural random number $s$ following a uniform distribution over the interval $[1, N]$, extract the sequence of data $\{x_{l,s+1}, x_{l,s+2}, \ldots, x_{l,s+N}\}$, and use it as the surrogate dataset for the $l$th variable.
- Step 3
Repeat Step 2 for all variables.
- Step 4
Repeat Steps 2–3 a large number of times, thereby generating a set of surrogate datasets.
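A minimal sketch of the TSS procedure under the same array-layout assumption; the circular shift of np.roll plays the role of the duplication in Step 1.

```python
import numpy as np

def tss_surrogate(data, rng=None):
    """One Time Shift Surrogate: an independent circular time shift per variable.

    data : NumPy array of shape (N, n_vars).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, n_vars = data.shape
    surrogate = np.empty_like(data, dtype=float)
    for l in range(n_vars):                           # Step 3: loop over variables
        s = rng.integers(1, N)                        # Step 2: random shift for variable l
        surrogate[:, l] = np.roll(data[:, l], -s)     # circular shift of the duplicated series
    return surrogate

# Step 4: repeat the call many times to obtain a set of surrogate datasets.
```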
3.4. AR-Sieve Bootstrap (ARSB)
The AR-Sieve Bootstrap (ARSB) method generates surrogate datasets by feeding resampled residuals from the model fit back into the estimated VAR model [
11,
12].
The procedure for generating surrogate datasets using ARSB is as follows.
- Step 1
Fit the VAR model of Equation (1) to the original time series, obtaining estimates for the parameters, $\hat{A}_1, \ldots, \hat{A}_p$, and the residuals $\hat{\varepsilon}_t$.
- Step 2
Compute centered residuals $\tilde{\varepsilon}_t = \hat{\varepsilon}_t - \bar{\varepsilon}$, where $\bar{\varepsilon}$ denotes the sample mean of the residuals, and generate bootstrap residuals $\varepsilon^{*}_t = \tilde{\varepsilon}_{r_t}$ by shuffling the indices according to a sequence of natural random numbers $r_t$, drawn randomly with replacement from a uniform distribution over the residual indices.
- Step 3
Compute the surrogate time series recursively by
$$x^{*}_t = \sum_{k=1}^{p} \hat{A}_k\, x^{*}_{t-k} + \varepsilon^{*}_t,$$
where $\hat{A}_k$ are the parameter estimates obtained in Step 1.
- Step 4
Re-estimate the VAR parameter matrices based on the bootstrap time series.
- Step 5
Repeat Steps 2–4 a large number of times, thereby generating a set of surrogate datasets.
Similar to the SB method, there are two ways to process Step 2 in the ARSB method. One is to use the same sequence of random values for all variables for data shuffling, as mentioned above, and the other is to perform Step 2 independently for each variable. In this paper, we refer to the former as correlated ARSB (cARSB) and to the latter as uncorrelated ARSB (ucARSB).
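A minimal sketch of Steps 1–5 under the same array-layout assumption, using an ordinary least-squares VAR fit without an intercept term; the helper names are ours, the initialization of the recursion with the original data is one common choice, and the exact estimator used in the study may differ.

```python
import numpy as np

def fit_var(data, p):
    """Ordinary least-squares fit of a VAR(p) model (no intercept term,
    assuming a zero-mean process); returns coefficient matrices and residuals."""
    N, n_vars = data.shape
    # Regressor rows [x_{t-1}, ..., x_{t-p}] for t = p, ..., N-1.
    Z = np.column_stack([data[p - k - 1:N - k - 1] for k in range(p)])
    Y = data[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    A = [B[k * n_vars:(k + 1) * n_vars].T for k in range(p)]   # A_1, ..., A_p
    return A, Y - Z @ B                                        # Step 1

def arsb_surrogate(data, p, correlated=True, rng=None):
    """One AR-Sieve Bootstrap surrogate (cARSB if correlated, ucARSB otherwise)."""
    rng = np.random.default_rng() if rng is None else rng
    N, n_vars = data.shape
    A, residuals = fit_var(data, p)
    centered = residuals - residuals.mean(axis=0)               # Step 2: centering
    M = centered.shape[0]
    if correlated:                                              # cARSB: shared indices
        boot = centered[rng.integers(0, M, size=N)]
    else:                                                       # ucARSB: per-variable indices
        boot = np.column_stack(
            [centered[rng.integers(0, M, size=N), l] for l in range(n_vars)]
        )
    surrogate = np.zeros((N, n_vars))
    surrogate[:p] = data[:p]      # initialize with the first p original observations
    for t in range(p, N):                                       # Step 3: recursion
        surrogate[t] = sum(A[k] @ surrogate[t - k - 1] for k in range(p)) + boot[t]
    return surrogate

# Step 4: re-estimate the parameters on the surrogate, e.g., fit_var(surrogate, p);
# Step 5: repeat many times to obtain the bootstrap distribution.
```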
4. Simulation
In order to verify the performance of the methods for causal analysis and of the bootstrap methods, as discussed above, we prepared a simulation model by modifying a model proposed by [
19]. We set up seven oscillatory variables as depicted in
Figure 1, five of which generated their stochastic oscillations locally from driving white noise via self-feedback; furthermore, the variables were coupled directly or indirectly within a global Vector AutoRegressive (VAR) process, resulting in a dynamic feedback system. We fixed the frequency of each variable to 0.1 Hz and randomly selected a damping coefficient for each variable from a normal distribution. For instance, the first and third variables were directly connected, while the first and fourth variables were indirectly connected through the third variable; the fourth and fifth variables were directly and bidirectionally connected. The sixth and seventh variables were isolated from the other five.
The sixth variable generated a local oscillation at 0.1 Hz via self-feedback, like the other five, while the seventh variable did not have any self-feedback but was driven only by the sixth variable. If causality were detected between the sixth or seventh variable and any of the first to fifth variables, this would represent a false positive.
We set the connectivity among variables as shown in
Figure 1 and generated a simulated time series of length $N$ through the VAR process of Equation (1). The nominal sampling frequency was 10 Hz, and the series of shocks (disturbances) $\varepsilon_t$ was sampled from a 7-dimensional multivariate Gaussian distribution with zero mean and a diagonal covariance matrix.
Table 1 displays the parameters of the VAR model for the simulations.
Figure 2 displays the simulated time series.
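For readers who wish to reproduce a qualitatively similar setup, the following is a minimal sketch of generating a multivariate series from a VAR(2) process with oscillatory self-feedback and sparse couplings driven by Gaussian shocks. The series length, coupling pattern, coupling strengths, damping distribution, and noise variances below are illustrative placeholders, not the values of Table 1; only the self-feedback parameterization as a damped oscillator at 0.1 Hz follows the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, N, fs, f0 = 7, 2000, 10.0, 0.1      # N is a placeholder length

# Self-feedback of variables 1-6 as damped AR(2) oscillators at f0 Hz;
# the 7th variable has no self-feedback and is driven only by the 6th.
A1 = np.zeros((n_vars, n_vars))
A2 = np.zeros((n_vars, n_vars))
damping = rng.normal(0.7, 0.02, size=n_vars)  # placeholder damping distribution
for i in range(6):
    A1[i, i] = 2.0 * damping[i] * np.cos(2.0 * np.pi * f0 / fs)
    A2[i, i] = -damping[i] ** 2

# Placeholder couplings (kept small so the process stays stable); the true
# pattern and strengths are those of Table 1.
for src, dst in [(0, 2), (2, 3), (3, 4), (4, 3), (4, 0), (5, 6)]:
    A1[dst, src] = 0.02

# Sanity check: the VAR(2) process is stable if the companion matrix has
# spectral radius below one.
companion = np.block([[A1, A2], [np.eye(n_vars), np.zeros((n_vars, n_vars))]])
assert np.max(np.abs(np.linalg.eigvals(companion))) < 1.0

# Generate the series with independent Gaussian shocks (diagonal covariance).
shocks = rng.normal(0.0, 1.0, size=(N, n_vars))
x = np.zeros((N, n_vars))
for t in range(2, N):
    x[t] = A1 @ x[t - 1] + A2 @ x[t - 2] + shocks[t]
```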
5. Results
In our pursuit of accurately estimating the 99% confidence intervals (CI) for both conditional Granger Causality (cGC) and Impulse Response Function (IRF) analyses, we generated 2000 surrogate datasets using each of the previously discussed bootstrap methods. For the SB method, we determined the optimal mean block width $w$ by testing values of 5, 10, 20, and 40. As illustrated in
Figure 3a, the cSB method consistently yielded correct causality results, irrespective of the chosen mean block width. In contrast, the ucSB method produced correct causality results for small mean block widths, but the detection rate of self-feedback diminished as $w$ increased (see
Figure 3b,c).
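As an illustration of how significance can be read off from the surrogate distribution, the sketch below applies a simple two-sided percentile rule to the statistics computed on the surrogates; the function name and the percentile rule are our assumptions and not necessarily the exact procedure used in the study.

```python
import numpy as np

def surrogate_significance(original_stat, surrogate_stats, alpha=0.01):
    """Assess significance of a causality statistic against surrogate values.

    original_stat   : statistic (e.g., cGC or IRF value) from the original data.
    surrogate_stats : array of the same statistic computed on each surrogate dataset.
    Returns the (lower, upper) percentile bounds and a boolean significance flag.
    """
    lower = np.percentile(surrogate_stats, 100 * alpha / 2)
    upper = np.percentile(surrogate_stats, 100 * (1 - alpha / 2))
    significant = bool(original_stat < lower or original_stat > upper)
    return (lower, upper), significant

# Example with 2000 surrogate values, as used in this study:
# ci, sig = surrogate_significance(gc_original, np.array(gc_surrogates))
```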
While the uPRB method provided accurate causality results concerning variable interactions, it failed to detect the self-feedbacks of the first through fifth variables, yielding results akin to ucSB(20) and ucSB(40) (see
Figure 3c). If the time shift $s$ in the TSS method is excessively small or large, the surrogate data closely resemble the original data, which minimizes the phase randomization effect. The choice of the interval from which $s$ is drawn in the Time Shift Surrogates (TSS) method therefore has implications for the statistical characteristics of the generated surrogate data, particularly its correlation structure. For causality detection between variables, the interval must be set sufficiently far beyond the lag at which cross-correlation occurs. In this simulation, we set the interval to satisfy this condition.
Notably, the TSS method’s detection performance is inferior to that of the ucSB(20) and ucSB(40) methods, failing to detect causality from the fifth to the first variable. Importantly, our study identified no false positives within any bootstrap method.
Table 2 concisely summarizes the sensitivity and specificity for each combination of causal analysis and bootstrap methods.
As shown in Equation (9), the IRF is denoted by $\Psi_{lm}(s)$, which represents the impact of the $m$th shock $\varepsilon_m$ on the $l$th variable $x_l$ at time $s$. We evaluated the significance of $\Psi_{lm}(s)$ using each bootstrap method.
Figure 4 provides a two-dimensional representation of the significant $\Psi_{lm}(s)$.
In the IRF analysis, only the ARSB method, whether correlated or uncorrelated, successfully detects both self-feedback and causality among variables in their entirety.
Figure 4a displays the IRFs for the impact assigned to each variable, with ucARSB evaluating the significance. This outcome aligns with the connectivity diagram depicted in
Figure 1. The cSB method yields results almost identical to those of the ARSB method for the first through fifth variables. However, it detects a response from the seventh to the sixth variable, constituting a false positive, which was observed for all tested values of the mean block width. The uPRB method primarily detects causality between directly coupled variables (see
Figure 4c). Additionally, false positives emerge where responses occur between uncoupled variables, such as the reciprocal response between the first and seventh variables and between the fifth and sixth variables. For all tested values of the mean block width, the ucSB method exhibits the weakest detection performance among the evaluated bootstrap methods, detecting only self-feedback (see
Figure 4d).
To summarize the results of the IRF analysis, as we did for GC, we compared the true IRFs generated through the VAR parameters in
Table 1 with the IRFs detected by the respective bootstrap methods and summarized the sensitivity and specificity in
Table 2. The time interval of the IRFs used for comparison was the maximum time step before feedback occurs, i.e., the IRFs up to three time steps, before the impact on the third variable propagates to the first variable and to the fourth and fifth variables (see
Figure 1).
6. Discussion
In this study, we present a thorough comparison of the performance of four different bootstrap methods (encompassing twelve variations, including different values of mean block width, as well as correlated and uncorrelated derivations) to evaluate the results from causal analyses.
Our findings reveal that, for the cSB method, the Confidence Interval (CI) width decreases as the mean block width increases, which is particularly evident in
Figure 5a,b. A possible explanation is that a shorter block width introduces more discontinuities in the surrogate time series, causing the dynamic properties to deviate further from those of the original time series due to the random shuffling of blocks disrupting temporal forward/backward relationships. This disruption may lead to falsely detecting causality from the seventh to sixth variables in the IRF analysis, as seen in
Figure 4b. In contrast, the CI width for the ucSB method remains wider than that of the cSB method, irrespective of the mean block width (see
Figure 5a,c), or even increases with the mean block width (
Figure 5b). This stems from each block’s starting point being determined independently for each variable, causing unnecessary destruction of causality among variables in the surrogate time series. Consequently, no causality among variables is detected in the corresponding IRF analysis (see
Figure 4d).
The TSS method accurately detects significance for most of the Granger causality measures, but the CI widths for the causality from the fifth to the first variable and for one of the self-feedback terms are so large that they include zero, resulting in false negatives. This could be attributed to the corresponding element of the VAR coefficient matrix having the smallest value among the parameters representing causality among variables, and to the corresponding diagonal element having the smallest value among the parameters representing the attenuation rate of self-feedback. A prime example is the Granger causality from the second to the fifth variable, which should be numerically negligible since our simulation model does not contain causality from the second to the fifth variable (see
Table 1). Although the TSS method correctly assesses non-significance, the CI width is substantially larger than that estimated by the other bootstrap methods. These results suggest that the TSS method may not provide stable estimates of bootstrap statistics, compared to other methods, when the statistic value of the original time series is small.
In contrast to other methods, the uPRB method’s algorithm does not require a priori tuning of parameters, such as a mean block width for the SB method. Upon providing the original time series, surrogate time series can be generated almost instantaneously. The uPRB method is characterized by its simplicity and the ability to generate surrogate time series devoid of discontinuous time points, while preserving stationarity. However, a limitation of this method is its inability to detect self-feedback (see
Figure 5b). Although it accurately identifies causality among variables, some of the Confidence Intervals (CIs) have considerable width, which calls its reliability into question (see
Figure 5a). By constraining the interval of phase randomization in Step 2, the CI width may be reduced, thereby enhancing performance; however, adjusting the degree of restriction remains arbitrary. A key distinction between the TSS and uPRB methods is that the former shifts the phases in all frequency bands simultaneously, while the latter does so for each frequency band. As the uPRB method can selectively disturb the phases corresponding to a specific frequency band, it may prove valuable for evaluating causality in the spectral domain.
For both cGC and IRF analyses, the ARSB method outperforms the other bootstrap methods, regarding detection performance. For this method, determining the VAR model order
p is essential, which can be accomplished using the Akaike Information Criterion (AIC). As outlined in
Section 3.4, there are two variants: correlated ARSB (cARSB), which randomly shuffles the residuals across all variables synchronously, and uncorrelated ARSB (ucARSB), which shuffles the residuals independently for each variable. Our simulation study demonstrates that both variants yield nearly identical results. However, observational errors (artifacts) may be superimposed onto the data when analyzing actual time series, rendering ucARSB potentially more effective for canceling noise correlation among variables.
Finally, we emphasize that beyond Granger causality, as discussed in this paper, other approaches to causality estimation have been proposed; as an example, we mention Convergent Cross Mapping, which has been shown to be effective in deterministic non-linear dynamical systems [
20]. Furthermore, Transfer Entropy has been defined as an extension of mutual information, representing the causality from one random variable to another [
21]. Moreover, Zenil et al. [
22] have developed a very general framework for model discovery, which may overcome the inherent limitations of statistical methods based on predictive models and entropy-like measures; their method is based on algorithmic probability, i.e., on decomposing observations into the most likely algorithmic generative models. Future research should investigate whether this framework can compensate for or replace the incompleteness of Granger causality.