Article

Transfer Entropy for Nonparametric Granger Causality Detection: An Evaluation of Different Resampling Methods

1 CeNDEF, Amsterdam School of Economics, University of Amsterdam, 1018 WB Amsterdam, The Netherlands
2 Tinbergen Institute, 1082 MS Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
Current address: Faculty of Economics and Business, University of Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands.
Entropy 2017, 19(7), 372; https://doi.org/10.3390/e19070372
Submission received: 16 May 2017 / Revised: 14 July 2017 / Accepted: 17 July 2017 / Published: 21 July 2017
(This article belongs to the Special Issue Entropic Applications in Economics and Finance)

Abstract: The information-theoretic concept of transfer entropy is an ideal measure for detecting conditional independence, or Granger causality, in a time series setting. The recent literature indeed witnesses an increased interest in applications of entropy-based tests in this direction. However, these tests are typically based on nonparametric entropy estimates, for which the development of formal asymptotic theory turns out to be challenging. In this paper, we provide numerical comparisons of simulation-based tests to gain insight into the statistical behavior of nonparametric transfer entropy-based tests. In particular, surrogate algorithms and smoothed bootstrap procedures are described and compared. We conclude with a financial application to the detection of spillover effects in the global equity market.

1. Introduction

Entropy, introduced by Shannon [1,2], is an information-theoretic concept with several appealing properties, which finds wide application in information theory, thermodynamics and time series analysis. Based on this classical measure, transfer entropy (TE) has become a popular information-theoretic measure for quantifying the flow of information. The concept, coined by Schreiber [3], was introduced to distinguish possibly asymmetric information exchange between the variables of a bivariate system. When based on appropriate nonparametric density estimates, the TE is a flexible nonparametric measure for conditional dependence, coupling structure, or Granger causality in a general sense.
The notion of Granger causality was developed by the pioneering work of Granger [4] to capture causal interactions in a linear system. In a more general model-free world, the Granger causal effect can be interpreted as the impact of incorporating the history of another variable on the conditional distribution of a future variable in addition to its own history. Recently, various nonparametric measures have been developed to capture such difference between conditional distributions in a more complex, and typically nonlinear system. There is a growing list of such methods, based on, among others, correlation integrals [5], kernel density estimation [6], Hellinger distances [7], copula functions [8] and empirical likelihood [9].
In contrast with the above-mentioned methods, TE-based causality tests do not attempt to capture the difference between two conditional distributions explicitly. Instead, with its information-theoretic interpretation, the TE offers a natural way to measure directional information transfer and Granger causality. We refer to [10,11] for detailed reviews of the relation between Granger causality and directed information theory. However, the direct application of entropy and its variants, though attractive, turns out to be difficult, if not impossible altogether, due to the lack of asymptotic distribution theory for the test statistics. For example, Granger and Lin normalize the entropy to detect serial dependence, with critical values obtained from simulations [12]. Hong and White provide the asymptotic distribution for the Granger–Lin statistic with a specific kernel function [13]. Barnett and Bossomaier derive a χ² distribution for the TE at the cost of the model-free property [14].
On the other hand, to obviate the asymptotic problem, several resampling methods on TE have been developed for providing empirical distributions of the test statistics. Two popular techniques are bootstrapping and surrogate data. Bootstrapping is a random resampling technique proposed by Efron [15] to estimate the properties of an estimator by measuring those properties from approximating distributions. The “surrogate” approach developed by Theiler et al. [16] is another randomization method initially employing Fourier transforms to provide a benchmark in detecting nonlinearity in a time series setting. It is worth mentioning that the two methods are different with respect to the statistical properties of the resampled data. For the surrogate method the null hypothesis is maintained, while the bootstrap method does not seek to impose the null hypothesis on the bootstrapped samples. We refer to [17,18] for detailed applications of the two methods.
However, not all resampling methods are suitable for entropy-based dependence measures. As Hong and White [13] put it, a standard bootstrap fails to deliver a consistent entropy-based statistic because it does not preserve the statistical properties of a degenerate U-statistic. Similarly, with respect to traditional surrogates based on phase randomization of the Fourier transform, Hinich et al. [19] criticize the particularly restrictive assumption of a linear Gaussian process, and Faes et al. [20] point out that such surrogates cannot preserve the whole statistical structure of the original time series.
As far as we are aware, there are several applications of both methods in entropy-based tests, for example, Su and White [7] propose a smoothed local bootstrap for entropy-based test for serial dependence, Papana et al. [21] apply stationary bootstrap in partial TE estimation, Quiroga et al. [22] use time-shifted surrogates to test the significance of the asymmetry of directional measures of coupling, and Marschinski and Kantz [23] introduce the effective TE, which relies on random shuffling surrogate in estimation. Kugiumtzis [24] and Papana et al. [21] provide some comparisons between bootstrap and surrogate methods for entropy-based tests.
In this paper, we adopt the TE as a test statistic for measuring conditional independence (Granger non-causality). Being aware that the null distribution may not always be accurate or available in closed form, we resort to resampling techniques for constructing the empirical null distribution. The techniques under consideration include the smoothed local bootstrap, the stationary bootstrap and time-shifted surrogates, all of which have been shown in the literature to be applicable to entropy-based test statistics. Using different dependence structures, the size and power performance of all methods are examined in simulations.
The remainder of this paper is organized as follows. Section 2 provides the TE-based testing framework and a short introduction to kernel density estimation, discusses bandwidth selection rules, and then presents the resampling methods, including the smoothed local bootstrap and time-shifted surrogates, for different dependence-structure settings. Section 3 examines the empirical performance of the different resampling methods, presenting the size and power of the tests. Section 4 considers a financial application of the TE-based nonparametric test and Section 5 summarizes.

2. Methodology

2.1. Transfer Entropy and Its Estimator

Information theory is a branch of applied mathematics concerned with probability and statistics. The central problem of classical information theory is to measure the transmission of information over a noisy channel. Entropy, also referred to as Shannon entropy, is a key measure in the field of information theory, introduced by [1,2]. Entropy measures the uncertainty and randomness associated with a random variable. Supposing that S is a random vector with density f_S(s), its Shannon entropy is defined as
H(S) = −∫ f_S(s) log f_S(s) ds.
There is a long history of applying information theoretical measures in time series analysis. For example, Robinson [25] applies the Kullback–Leibler information criterion [26] to construct a one-sided test for serial independence. Since then, nonparametric tests using entropy measures for dependence between two time series are becoming prevalent. Granger and Lin [12] normalize the entropy measure to identify the lags in a nonlinear bivariate time series model. Granger et al. [27] study dependence with a transformed metric entropy, which turns out to be a proper measure of distance. Hong and White [13] provide a new entropy-based test for serial dependence, and the test statistic follows a standard normal distribution asymptotically.
Although those heuristic approaches work for entropy-based measures of dependence, the methodologies do not carry over directly to measures of conditional dependence, i.e., Granger causality. The TE, a term coined by Schreiber [3] although the concept appeared in the literature earlier under different names, is a suitable measure for this purpose. The TE quantifies the amount of information contained in one series at k steps ahead from the state of another series, given the current and past state of itself. Suppose we have two series {X_t} and {Y_t}; for brevity put X = {X_t}, Y = {Y_t} and Z = {Y_{t+k}}. Further, we define the trivariate vector W_t = (X_t, Y_t, Z_t), where Z_t = Y_{t+k}; W = (X, Y, Z) is used when there is no danger of confusion. Within this bivariate setting, W is a three-dimensional continuous vector. In this paper, we limit ourselves to k = 1 for simplicity, but the method can easily be generalized to multiple steps. The quantity TE_{X→Y} is a nonlinear measure for the amount of information in Z (future Y) explained by X, accounting for the information on Z already contained in Y. Although the TE defined in [3] applies to discrete variables, it is easily generalized to continuous variables. Conditional on Y, TE_{X→Y} is defined as
TE_{X→Y} = E_W[ log( f_{Z,X|Y}(Z,X|Y) / ( f_{X|Y}(X|Y) f_{Z|Y}(Z|Y) ) ) ]
= ∫∫∫ f_{X,Y,Z}(x,y,z) log( f_{Z,X|Y}(z,x|y) / ( f_{X|Y}(x|y) f_{Z|Y}(z|y) ) ) dx dy dz
= E_W[ log( f_{X,Y,Z}(X,Y,Z)/f_Y(Y) ) − log( f_{X,Y}(X,Y)/f_Y(Y) ) − log( f_{Y,Z}(Y,Z)/f_Y(Y) ) ]
= E_W[ log f_{X,Y,Z}(X,Y,Z) + log f_Y(Y) − log f_{X,Y}(X,Y) − log f_{Y,Z}(Y,Z) ].
Using conditional mutual information I ( Z , X | Y = y ) , the TE can be equivalently formulated in terms of four Shannon entropy terms as
TE_{X→Y} = I(Z, X | Y) = H(Z|Y) − H(Z|X,Y) = H(Z,Y) − H(Y) − H(Z,X,Y) + H(X,Y).
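With discrete data, the four-entropy decomposition above can be evaluated directly from empirical frequencies. The following minimal sketch (our own illustration; the binary toy system and all parameter values are hypothetical and not from the paper) estimates TE_{X→Y} for a system where future Y copies X with 10% error while past Y is independent noise, so the TE reduces to the mutual information between X and future Y:

```python
import numpy as np

def entropy(*cols):
    """Plug-in Shannon entropy (in nats) of the empirical joint
    distribution of one or more discrete series."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
n = 20000
x = rng.integers(0, 2, n)            # driving series X_t
flip = rng.random(n) < 0.1
z = np.where(flip, 1 - x, x)         # Z_t = Y_{t+1}: copies X_t with 10% error
y = rng.integers(0, 2, n)            # own past Y_t: independent noise here

# TE_{X->Y} = H(Z, Y) - H(Y) - H(Z, X, Y) + H(X, Y)
te = entropy(z, y) - entropy(y) - entropy(z, x, y) + entropy(x, y)
```

Since Y is independent of (X, Z) in this toy example, the estimate is close to the mutual information I(X; Z) ≈ 0.37 nats.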
In order to construct a test for Granger causality based on the TE, one first needs to show quantitatively that the TE is a proper basis for detecting whether the null hypothesis is satisfied. The following theorem, as a direct application of the Kullback–Leibler criterion, lays the quantitative foundation for testing based on the TE.
Theorem 1.
TE_{X→Y} ≥ 0, with equality if and only if f_{Z,X|Y}(Z,X|Y) = f_{X|Y}(X|Y) f_{Z|Y}(Z|Y).
Proof. 
The proof of Theorem 1 is given in [28]. ☐
It is not difficult to verify that the condition for TE X Y = 0 coincides with the null hypothesis of Granger non-causality defined in Equation (4), also referred to as conditional independence or no coupling. Mathematically speaking, the null hypothesis of Granger non-causality, H 0 : { X t } is not a Granger cause of { Y t } , can be phrased as
H_0: f_{X,Y,Z}(x,y,z) / f_Y(y) = ( f_{Y,Z}(y,z) / f_Y(y) ) · ( f_{X,Y}(x,y) / f_Y(y) ),
for (x, y, z) in the support of W. A nonparametric test for Granger non-causality seeks statistical evidence of violation of Equation (4). There are many nonparametric measures available for this purpose, some of which are mentioned above. Equation (4) provides the basis for a model-free test without imposing any parametric assumptions on the data generating process or the underlying distributions of {X_t} and {Y_t}. We only assume two things here. First, {X_t, Y_t} is a strictly stationary bivariate process. Second, the process has finite memory, i.e., finite lags ℓ_X and ℓ_Y. The second (finite Markov order) assumption is needed in this nonparametric setting to make conditioning on past information feasible by conditioning on a finite number of past observations. Moreover, strict stationarity and the mixing properties implied by the finite Markov order assumption ensure that the transfer entropy can be estimated consistently through kernel density estimation of the underlying densities.
As far as we are aware, the direct use of the TE to test Granger non-causality in a nonparametric setting is difficult, if not impossible altogether, due to the lack of asymptotic theory for the test statistic. As Granger and Lin [12] put it, very few asymptotic distribution results for entropy-based estimators are available. Although over the years several breakthroughs have been made in the application of entropy to testing serial independence, the limiting distribution of the TE statistic is still unknown. One may wish to use simulation techniques to overcome the lack of asymptotic distributions. However, as noted by Su and White [7], the TE statistics for nonparametric dependence measures suffer from estimation biases under the smoothed bootstrap procedure. Even for the parametric test statistic used by Barnett and Bossomaier [14], the authors noticed that the TE-based estimator is generally biased.

2.2. Density Estimation and Bandwidth Selection

The non-negativity property in Theorem 1 makes TE X Y a desirable measure for constructing a one-sided test of conditional independence; any positive divergence from zero is a sign of conditional dependence of Y on X. To estimate TE there are several different approaches, such as histogram-based estimators [29], correlation sums [30] and nearest neighbor estimators [31]. However, the optimal rule for the number of neighbor points is unclear, and as Kraskov et al. [31] comment, a small value of neighbor points may lead to large statistical errors. A more natural method, kernel density estimators, the properties of which have been well studied, is applied in this paper. With the plug-in kernel estimates of densities, we may replace the expectation in Equation (2) by a sample average to get an estimate for TE X Y .
A local density estimator of a d W -variate random vector W at W i is given by
f̂_W(W_i) = 1 / ( (n−1) h^{d_W} ) Σ_{j=1, j≠i}^{n} K( (W_i − W_j) / h ),
where K is a kernel function and h is the bandwidth. We take K(·) to be a product kernel, defined as K(W) = ∏_{s=1}^{d_W} κ(w_s), where w_s is the s-th element of W. Using a standard univariate Gaussian kernel, κ(w_s) = (2π)^{−1/2} e^{−w_s²/2}, K(·) is the standard multivariate Gaussian kernel as described by Wand and Jones [32] and Silverman [33]. Using Equation (5) as the plug-in density estimator, and replacing the expectation by the sample mean, we obtain the estimator for the TE given by
Î(Z, X | Y) = (1/n) Σ_{i=1}^{n} [ log f̂(x_i, y_i, z_i) + log f̂(y_i) − log f̂(x_i, y_i) − log f̂(y_i, z_i) ].
If we estimate the Shannon entropy in Equation (1) based on a sample of size n from the d W -dimensional random vector W, by the sample average of the plug-in density estimates, we obtain
Ĥ(W) = −(1/n) Σ_{i=1}^{n} log f̂_W(W_i),
then Equation (6a) can be equivalently expressed in terms of four entropy estimators, that is,
Î(Z, X | Y) = −Ĥ(X, Y, Z) − Ĥ(Y) + Ĥ(X, Y) + Ĥ(Y, Z).
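To make the estimator concrete, here is a minimal sketch of the plug-in TE estimator of Equations (5)–(6c), using a leave-one-out Gaussian product kernel and the bandwidth rule h = C n^{−2/7} discussed later in this section; the function names and the illustrative coupled process are our own choices and not from the paper:

```python
import numpy as np

def kde_loo(data, h):
    """Leave-one-out Gaussian product-kernel density estimate,
    as in Equation (5), evaluated at each observation."""
    n, d = data.shape
    diff = (data[:, None, :] - data[None, :, :]) / h
    k = np.exp(-0.5 * (diff ** 2).sum(axis=2)) / (2 * np.pi) ** (d / 2)
    np.fill_diagonal(k, 0.0)                 # leave the i-th point out
    return k.sum(axis=1) / ((n - 1) * h ** d)

def transfer_entropy(x, y, C=8.0):
    """Plug-in TE estimate, Equations (6a)-(6c), for k = 1 (Z_t = Y_{t+1})."""
    xt, yt, z = np.asarray(x)[:-1], np.asarray(y)[:-1], np.asarray(y)[1:]
    std = lambda v: (v - v.mean()) / v.std()   # standardize before estimation
    w = np.column_stack([std(xt), std(yt), std(z)])
    h = C * len(w) ** (-2.0 / 7.0)             # bandwidth rule h = C n^(-2/7)
    return np.mean(np.log(kde_loo(w, h))               # log f^(x, y, z)
                   + np.log(kde_loo(w[:, [1]], h))     # + log f^(y)
                   - np.log(kde_loo(w[:, [0, 1]], h))  # - log f^(x, y)
                   - np.log(kde_loo(w[:, [1, 2]], h))) # - log f^(y, z)

# Illustration: X drives Y in one case; X and Y independent in the other.
rng = np.random.default_rng(1)
n = 500
x = rng.standard_normal(n)
y_dep = np.zeros(n)
for t in range(1, n):
    y_dep[t] = 0.8 * x[t - 1] + 0.5 * rng.standard_normal()
te_dep = transfer_entropy(x, y_dep)
te_ind = transfer_entropy(x, rng.standard_normal(n))
```

As the theory predicts, the estimate for the coupled pair exceeds the estimate for the independent pair; note the plug-in estimate is biased, so its level should only be interpreted relative to a resampled null distribution.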
To construct a statistical test, we develop the asymptotic properties of Î(Z, X|Y), defined in Equations (6a) and (6c), in two steps. In the first step, consistency of the entropy estimates follows from consistency of the density estimates; in the second step, the linear combination of the four entropy estimates converges in probability to the true value. The following two theorems ensure the consistency of Î(Z, X|Y).
Theorem 2.
Given the kernel density estimate f̂_W(W_i) for f_W(W_i), where W is a d_W-dimensional random vector observed in a sample of size n, let Ĥ(W) be the plug-in estimate of the Shannon entropy as defined in Equation (6b). Then Ĥ(W) →_P H(W).
Proof. 
The proof of Theorem 2 is given in [12] using results from [34]. ☐
The basic idea of the proof is to take the Taylor series expansion of log ( f ^ W ( W i ) ) around the true value log ( f W ( W i ) ) and use the fact that f ^ W ( W i ) , given an appropriate bandwidth sequence, converges to f W ( W i ) pointwise to obtain consistency. In the next step, the consistency of I ^ ( Z , X | Y ) is provided by the continuous mapping theorem.
Theorem 3.
Given Ĥ(W) →_P H(W), with Î(Z, X|Y) defined as in Equation (6c), Î(Z, X|Y) →_P I(Z, X|Y).
Proof. 
The proof is straightforward if one applies the Continuous Mapping Theorem. See Theorem 2.3 in [35]. ☐
Before we move to the next section, it is worth having a careful look at the issue of bandwidth selection. The bandwidth h for kernel estimation determines how smooth the density estimate is; a smaller bandwidth reveals more structure of the data, whereas a larger bandwidth delivers a smoother density estimate. Bandwidth selection is essentially a trade-off between bias and variance in density estimation. A very small value of h reduces estimation bias at the cost of a large variance. On the other hand, a large bandwidth reduces estimation variance at the expense of incorporating more bias. See Chapter 3 in [32] and [36] for details.
However, bandwidth selection for the TE statistic is more involved. To the best of our knowledge, there is a gap in the literature regarding optimal bandwidth selection for the kernel-based TE estimator. As He et al. [37] show, two types of errors arise when estimating the entropy: one from entropy estimation and the other from density estimation, and the optimal bandwidth for density estimation need not coincide with the optimal one for entropy estimation. Thus, rather than the rule-of-thumb bandwidth in [33], which aims at optimal density estimation, the bandwidth in our study should provide an accurate estimator for I(Z, X|Y), say in the minimal mean squared error (MSE) sense. In [28] we develop such a bandwidth rule for a TE-based estimator. In that paper, rather than directly developing asymptotic properties for the TE, we study the properties of the first-order Taylor expansion of the TE under the null hypothesis. The suggested bandwidth is shown to be MSE optimal and allows us to obtain asymptotic normality for the test statistic. In principle, the convergence rate of the TE estimator should be the same as that of the leading term of its Taylor approximation. We therefore propose to use the same rate here, giving
h = C n^{−2/7},
where C is an unknown constant. This bandwidth delivers a consistent test, since the variance of the local estimate of Î(Z_i, X_i | Y_i) dominates the MSE. In [28] we suggest C = 4.8 based on simulations, while Diks and Panchenko suggest setting C = 8 for autoregressive conditional heteroskedasticity (ARCH) processes [6]. Our simulations here may also favor a larger value of C, because the squared bias is of higher order and hence of less concern for the TE-based statistic; a larger bandwidth better controls the estimation variance and delivers a more powerful test. As a robustness check, we adopt C = 8 as well as the value C = 4.8 suggested by our other simulation study [28]. To match the Gaussian kernel, we standardize the data before estimating Equations (6a)–(6c), such that the transformed time series have mean zero and unit variance; very similar results are obtained by matching the mean absolute deviation instead of the variance of the standard Gaussian kernel for TE estimation.

2.3. Resampling Methods

To develop simulation-based tests for the null hypothesis, given in Equation (4), of no Granger causality from X to Y, or equivalently, of conditional independence, we consider three resampling techniques: (1) the time-shifted surrogates developed by Quiroga et al. [22]; (2) the smoothed bootstrap of Su and White [7]; and (3) the stationary bootstrap introduced by Politis and Romano [38]. The first technique is widely applied to coupling measures, for example by Kugiumtzis [39] and Papana et al. [40], while the latter two have been used for detecting conditional independence for decades. It is worth mentioning that the surrogate and bootstrap methods treat the null hypothesis quite differently. Surrogate data are supposed to preserve the dependence structure imposed by H_0, while bootstrap data are not restricted to H_0. It is possible to bootstrap the dataset without imposing the conditional independence structure of {X, Y, Z} implied by the null hypothesis; see, for instance, [41] for more details. To avoid resampling errors and to make the different methods more comparable, we limit ourselves to methods that impose the null hypothesis on the resampled data. The following resampling methods are implemented, each with different sampling details.
  • Time-Shifted Surrogates
    • (TS.a) The first resampling method only deals with the driving variable X. Suppose we have observations { x 1 , , x n }, the time-shifted surrogates are generated by cyclically time-shifting the components of the time series. Specifically, an integer d is randomly generated within the interval ( [ 0.05 n ] , [ 0.95 n ] ) , and then the first d values of { x 1 , , x n } would be moved to the end of the series, to deliver the surrogate sample X * = { x d + 1 , , x n , x 1 , , x d } . Compared with the traditional surrogates based on phase randomization of the Fourier transform, the time-shifted surrogates can preserve the whole statistical structure in X. The couplings between X and Y are destroyed, although the null hypothesis of X not causing Y is imposed.
    • (TS.b) The second scheme resamples both the driving variable X and the response variable Y separately. Similar to (TS.a), Y * = { y c + 1 , , y n , y 1 , , y c } is created given another random integer c from the range ( [ 0.05 n ] , [ 0.95 n ] ) . In contrast with the standard time-shifted surrogates described in (TS.a), in this setting we add more noise to the coupling between X and Y.
  • Smoothed Local Bootstrap
    The smoothed bootstrap selects samples from a smoothed distribution instead of drawing observations from the empirical distribution directly. See [42] for a discussion of the smoothed bootstrap procedure. Under rather mild assumptions, Neumann and Paparoditis [43] show that there is no need to reproduce the whole dependence structure of the stochastic process to obtain an asymptotically correct nonparametric dependence estimator. Hence a smoothed bootstrap from the estimated conditional density is able to deliver a consistent statistic. Specifically, we consider two versions of the smoothed bootstrap that differ somewhat in the dependence structure they preserve.
    • (SMB.a) In the first setting, Y* is resampled without replacement via the smoothed local bootstrap. Given the sample Y = {y_1, …, y_n}, the bootstrap sample is generated by adding a smoothing noise term ε_i^Y, such that ỹ*_i = y*_i + h_b ε_i^Y, where h_b > 0 is the bandwidth used in the bootstrap procedure and {ε_i^Y} is a sequence of i.i.d. N(0, 1) random variables. Without random replacement from the original time series, this procedure does not disturb the original dynamics of Y = {y_1, …, y_n} at all. After Y* is resampled, both X* and Z* are drawn from the smoothed conditional densities f(x | Y*) and f(z | Y*), as described in [44].
    • (SMB.b) Secondly, we implement the smoothed local bootstrap as in [7]. The only difference between this setting and (SMB.a) is that the bootstrap sample Y * is drawn with replacement from the smoothed kernel density.
  • Stationary Bootstrap
    Politis and Romano [38] propose the stationary bootstrap to maintain serial dependence within the bootstrap time series. This method replicates the time dependence of the original data by resampling blocks of the data with randomly varying block lengths. The lengths of the bootstrap blocks follow a geometric distribution: given a fixed probability p, the length L_i of block i satisfies P(L_i = k) = (1−p)^{k−1} p for k = 1, 2, …, and the starting point of each block is drawn randomly and uniformly from the original n observations. To restore the dependence structure exactly under the null, we combine the stationary bootstrap with the smoothed local bootstrap in our simulations.
    • (STB) In short, first y*_1 is picked randomly from the original n observations of Y = {y_1, …, y_n}; denote y*_1 = y_s with s ∈ [1, n]. With probability p, y*_2 is picked at random from the data set; and with probability 1−p, y*_2 = y_{s+1}, so that y*_2 is the observation following y_s in the original series. Proceeding in this way, {y*_1, …, y*_n} is generated. If y*_i = y_s with s = n, the “circular boundary condition” kicks in, so that y*_{i+1} = y_1. After Y* = {y*_1, …, y*_n} is generated, both X* and Z* are randomly drawn from the smoothed conditional densities f(x | Y*) and f(z | Y*), as in (SMB.b).
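The core resampling steps above can be sketched in a few lines. This is a simplified illustration under our own naming; the conditional draws of X* and Z* from f(x | Y*) and f(z | Y*) are omitted for brevity:

```python
import numpy as np

def time_shift_surrogate(x, rng):
    """(TS.a): cyclically shift X by a random d in ([0.05 n], [0.95 n])."""
    n = len(x)
    d = rng.integers(int(0.05 * n), int(0.95 * n) + 1)
    return np.roll(x, -d)          # (x_{d+1}, ..., x_n, x_1, ..., x_d)

def stationary_bootstrap_indices(n, p, rng):
    """Politis-Romano resampling indices: geometric block lengths (mean 1/p)
    with the circular boundary condition."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:               # start a new block
            idx[t] = rng.integers(n)
        else:                              # continue the current block
            idx[t] = (idx[t - 1] + 1) % n  # wrap around at the sample end
    return idx

def smoothed_bootstrap(y, h_b, rng):
    """(SMB.b): resample Y with replacement, then add kernel smoothing noise."""
    y_star = rng.choice(y, size=len(y), replace=True)
    return y_star + h_b * rng.standard_normal(len(y))

rng = np.random.default_rng(7)
x = rng.standard_normal(300)
y = rng.standard_normal(300)
x_surr = time_shift_surrogate(x, rng)
y_blocks = y[stationary_bootstrap_indices(len(y), p=0.05, rng=rng)]
y_star = smoothed_bootstrap(y, h_b=y.std(), rng=rng)
```

Note that the time-shifted surrogate is an exact permutation of the original sample, so all marginal properties of X are preserved while its coupling to Y is destroyed.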
The resampling procedure works as follows: once the TE statistic Î for the original data W = {(X_i, Y_i, Z_i), i = 1, …, n} has been estimated according to Equations (6a)–(6c), we generate the resampled data sets, denoted W*_j with j = 1, …, B, where B is the number of resamples. For each j we compute the TE statistic Î*_j from the resampled data in exactly the same way as Î was computed. The p-value for the one-sided test is calculated as
p̂ = (1/(B+1)) ( 1 + Σ_{j=1}^{B} 1( Î*_j ≥ Î ) ),
where the constant 1 is added to avoid p-values equal to zero.
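In code, the Monte Carlo p-value of Equation (8) is a one-liner; the function name is our own:

```python
def resampling_pvalue(te_hat, te_star):
    """One-sided Monte Carlo p-value of Equation (8); the two '+1' terms
    prevent p-values of exactly zero."""
    return (1 + sum(t >= te_hat for t in te_star)) / (len(te_star) + 1)

# With B = 3 resampled statistics, one of which exceeds the observed value:
p = resampling_pvalue(0.25, [0.10, 0.20, 0.30])   # (1 + 1) / (3 + 1) = 0.5
```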
A final remark concerns the difference between this paper and [21], where both time-shifted surrogates and the stationary bootstrap are implemented for an entropy-based causality test. Our paper provides additional insights in several respects. Firstly, the smoothed bootstrap, which has been shown in the literature to work for nonparametric kernel estimators under general dependence structures, is applied in our paper. Secondly, they treat the bootstrap and surrogate samples in a similar way, but, as noted above, the bootstrap method is not designed to impose the null hypothesis but to preserve the dependence structure present in the original data. The stationary bootstrap procedure in [21] might be incompatible with the null hypothesis of conditional independence, since it destroys the dependence completely. Because they restore independence between X and Y, rather than conditional independence between X|Y and Z|Y, during resampling, the distribution of the estimated statistics from the resampled data need not correspond to that of the statistic under the null of only conditional independence. Thirdly, we provide rigorous size and power results in our simulations, which are missing in their paper.

3. Simulation Study

In this section, we investigate the performance of the five resampling methods in detecting conditional dependence for several data generating processes. In Equations (9)–(16), we use a single parameter a to control the strength of the conditional dependence. Size is assessed by testing Granger non-causality from {X_t} to {Y_t}, and for power we use the same process but test for Granger non-causality from {Y_t} to {X_t}. We set a = 0.4 to represent moderate dependence in the size investigation and a = 0.1 to evaluate the power of the tests. Further, Equation (17) represents a stationary autoregressive process with regime switching, and Equation (18) is included to investigate the power performance in the presence of two-way causal linkages, where the two control parameters are b = 0.2 and c = 0.1.
In each experiment, we run 500 simulations for sample sizes n = { 200 , 500 , 1000 , 2000 } . The surrogate and the bootstrap sample size is set to B = 999 . For fair comparisons between (TS.a) and (TS.b), as well as between (SMB.a) and (SMB.b), we fix the seeds of the random number generator in the resampling functions to eliminate the potential effect of randomness. Besides, we use the empirical standard deviation of { Y t } as the bootstrapping bandwidth and C = { 4.8 , 8 } in the bandwidth equation Equation (7) for the kernel density estimation.
The processes under consideration include a linear vector autoregressive process (VAR) in Equation (9), a nonlinear VAR process in Equation (10), a bivariate ARCH process in Equation (11), a bilinear process in Equation (12), a bivariate AR(2)-GARCH process in Equation (13) where “GARCH” stands for “generalized ARCH”, a bivariate autoregressive-moving average (ARMA)-GARCH process in Equation (14), a bivariate AR(1)-EGARCH process in Equation (15) where “EGARCH” represents “exponential GARCH”, a vector error correction (VECM) process in Equation (16), a threshold AR(1) process in Equation (17), and a two-way VAR process in Equation (18).
It is worth mentioning that the data generating processes in Equations (9)–(12), (17) and (18) are stationary and of finite memory, as assumed earlier. However, for robustness considerations, it is also important to be aware of the behavior of the proposed nonparametric test when the two assumptions are not satisfied. The finite memory assumption is violated in Equations (13)–(15), since the GARCH process, being equivalent to an infinite ARCH process, is strictly speaking of infinite Markov order; and the stationarity assumption does not hold in Equation (16), where X_t and Y_t are cointegrated of order one. Since for the VECM process the two time series {X_t} and {Y_t} are not stationary, we cannot directly apply our nonparametric test. In this case, we first perform the Engle–Granger approach [45] to eliminate the influence of the cointegration, and then perform the nonparametric test on the stationary residuals collected from the linear regression of ΔX_t and ΔY_t on a constant and the cointegration term. The procedure is similar to that in [46].
  • Linear vector autoregressive process (VAR).
    X_t = a Y_{t−1} + ε_{x,t},  ε_{x,t} ~ N(0, 1),
    Y_t = a Y_{t−1} + ε_{y,t},  ε_{y,t} ~ N(0, 1).
  • Nonlinear VAR. This process is considered in [47] to show the failure of linear Granger causality test.
    X_t = a X_{t−1} Y_{t−1} + ε_{x,t},  ε_{x,t} ~ N(0, 1),
    Y_t = 0.6 Y_{t−1} + ε_{y,t},  ε_{y,t} ~ N(0, 1).
  • Bivariate ARCH process.
    X_t ~ N(0, 1 + a Y²_{t−1}),
    Y_t ~ N(0, 1 + a Y²_{t−1}).
  • Bilinear process considered in [48].
    X_t = 0.3 X_{t−1} + a Y_{t−1} ε_{y,t−1} + ε_{x,t},  ε_{x,t} ~ N(0, 1),
    Y_t = 0.4 Y_{t−1} + ε_{y,t},  ε_{y,t} ~ N(0, 1).
  • Bivariate AR(2)-GARCH process.
    X_t = 0.5 X_{t−1} − 0.2 X_{t−2} + ε_{x,t},  ε_{x,t} = √h_{x,t} υ_{x,t},  υ_{x,t} ~ t(5),
    h_{x,t} = 0.9 + a ε²_{y,t−1} + 0.2 h_{x,t−1};
    Y_t = 0.3 Y_{t−1} + 0.2 Y_{t−2} + ε_{y,t},  ε_{y,t} = √h_{y,t} υ_{y,t},  υ_{y,t} ~ t(5),
    h_{y,t} = 0.3 + 0.1 ε²_{y,t−1} + 0.2 h_{y,t−1}.
  • Bivariate ARMA-GARCH process.
    $X_t = 0.3 X_{t-1} + 0.3\,\varepsilon_{x,t-1} + \varepsilon_{x,t}$, with $\varepsilon_{x,t} = \sqrt{h_{x,t}}\,\upsilon_{x,t}$, $\upsilon_{x,t} \sim t(5)$, $h_{x,t} = 0.5 + a\,\varepsilon_{y,t-1}^2 + 0.3 h_{x,t-1}$;
    $Y_t = 0.4 Y_{t-1} - 0.2\,\varepsilon_{y,t-1} + \varepsilon_{y,t}$, with $\varepsilon_{y,t} = \sqrt{h_{y,t}}\,\upsilon_{y,t}$, $\upsilon_{y,t} \sim t(5)$, $h_{y,t} = 0.8 + 0.05\,\varepsilon_{y,t-1}^2 + 0.4 h_{y,t-1}$.
  • Bivariate AR(1)-EGARCH process.
    $X_t = 0.5 X_{t-1} + \varepsilon_{x,t}$, with $\varepsilon_{x,t} = \sqrt{h_{x,t}}\,\upsilon_{x,t}$, $\upsilon_{x,t} \sim t(5)$, $\log(h_{x,t}) = 0.5 + a\,\bigl|\varepsilon_{y,t-1}/\sqrt{h_{x,t-1}}\bigr| + 0.2\,\varepsilon_{y,t-1}/\sqrt{h_{x,t-1}} + 0.9 \log(h_{x,t-1})$;
    $Y_t = 0.6 Y_{t-1} + \varepsilon_{y,t}$, with $\varepsilon_{y,t} = \sqrt{h_{y,t}}\,\upsilon_{y,t}$, $\upsilon_{y,t} \sim t(5)$, $\log(h_{y,t}) = 0.6 + 0.05\,\bigl|\varepsilon_{y,t-1}/\sqrt{h_{y,t-1}}\bigr| + 0.02\,\varepsilon_{y,t-1}/\sqrt{h_{y,t-1}} + 0.8 \log(h_{y,t-1})$.
  • VECM process. Note that in this case neither $\{X_t\}$ nor $\{Y_t\}$ is stationary.
    $X_t = 1.2 + 0.6 Y_{t-1} + \varepsilon_{x,t}$, with $\varepsilon_{x,t} = \sqrt{1 - a^2}\,\upsilon_{x,t} + a\,\varepsilon_{y,t-1}$, $\upsilon_{x,t} \sim N(0,1)$;
    $Y_t = Y_{t-1} + \varepsilon_{y,t}, \quad \varepsilon_{y,t} \sim N(0,1)$.
  • Threshold AR(1) process.
    $X_t = \begin{cases} 0.2\,X_{t-1} + \varepsilon_{x,t}, & \text{if } Y_{t-1} < 0 \\ 0.9\,X_{t-1} + \varepsilon_{x,t}, & \text{if } Y_{t-1} \geq 0 \end{cases}$ with $\varepsilon_{x,t} \sim N(0,1)$;
    $Y_t = \varepsilon_{y,t}, \quad \varepsilon_{y,t} \sim N(0,1)$.
  • Two-way VAR process.
    $X_t = 0.7 X_{t-1} + b Y_{t-1} + \varepsilon_{x,t}, \quad \varepsilon_{x,t} \sim N(0,1)$;
    $Y_t = c X_{t-1} + 0.5 Y_{t-1} + \varepsilon_{y,t}, \quad \varepsilon_{y,t} \sim N(0,1)$.
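The processes above are straightforward to simulate. As an illustration, the sketch below generates the nonlinear VAR of Equation (10) and the threshold AR(1) of Equation (17); the function names and the burn-in length are our own choices:

```python
import numpy as np

def sim_nonlinear_var(n, a, burn=200, rng=None):
    """Nonlinear VAR of Equation (10): X_t = a * X_{t-1} * Y_{t-1} + eps_x,
    Y_t = 0.6 * Y_{t-1} + eps_y, with standard normal innovations."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.zeros(n + burn)
    y = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = a * x[t - 1] * y[t - 1] + rng.standard_normal()
        y[t] = 0.6 * y[t - 1] + rng.standard_normal()
    return x[burn:], y[burn:]

def sim_threshold_ar(n, burn=200, rng=None):
    """Threshold AR(1) of Equation (17): the AR coefficient of X switches
    between 0.2 and 0.9 depending on the sign of Y_{t-1}."""
    if rng is None:
        rng = np.random.default_rng(1)
    y = rng.standard_normal(n + burn)   # Y_t is i.i.d. N(0, 1)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        phi = 0.2 if y[t - 1] < 0 else 0.9
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x[burn:], y[burn:]
```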
The empirical rejection rates are summarized in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10. The top panels in each table summarize the empirical rejection rates obtained for the 5% and 10% nominal significance levels for processes (9)–(18) under the null hypothesis, and the bottom panels report the corresponding empirical power under the alternatives. Generally speaking, the size and power are quite satisfactory for almost all combinations of the constant C, sample size n and nominal significance level. The performance differences for the various resampling schemes are not substantial.
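For reference, each entry in the tables is an empirical rejection rate: the fraction of Monte Carlo replications whose p-value does not exceed the nominal level. A minimal sketch (function name ours):

```python
import numpy as np

def rejection_rates(pvals, levels=(0.05, 0.10)):
    """Empirical rejection rate at each nominal significance level:
    the fraction of Monte Carlo p-values at or below that level."""
    pvals = np.asarray(pvals)
    return {a: float(np.mean(pvals <= a)) for a in levels}
```

Under the null hypothesis this fraction estimates the size of the test; under the alternative it estimates the power.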
With respect to size, the realized rejection rates mostly stay in line with the nominal size. Moreover, the bootstrap methods outperform the time-shifted surrogate methods in that their empirical size is slightly closer to the nominal size. Lastly, the size of the tests is not very sensitive to the choice of the constant C, apart from the models in Equations (13)–(15), where the data generating process has infinite memory.
In terms of power, (TS.a) and (SMB.a) seem to outperform their counterparts, although the differences are subtle. Along the sample size dimension, the empirical power clearly increases with the sample size in most cases. Furthermore, the results are very robust with respect to the choice of the constant C in the kernel density estimation bandwidth. For the VAR and nonlinear processes in Equations (9), (10) and (12), a smaller C seems to give more powerful tests, while a larger C is more beneficial for detecting the conditional dependence structure in the (G)ARCH processes of Equations (11) and (13)–(15).
Finally, Table 10 presents the empirical power for the two-way VAR process, where the two variables $\{X_t\}$ and $\{Y_t\}$ are intertwined. Due to the setting b = 0.2 and c = 0.1 in Equation (18), $\{Y_t\}$ is a stronger Granger cause for $\{X_t\}$ than the other way around. As a consequence, the rejection rates reported in Table 10 are overall higher when testing Y → X than X → Y.
To visualize the simulation results, Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 report the empirical size and power against the nominal size. Since the performance of the five different resampling methods is quite similar, for simplicity we only show the results for (SMB.a). In each figure, the left (right) panels show the realized size (power), and we choose C = 4.8 (C = 8) for the top (bottom) two panels. The figures show that the empirical performance of the TE test is overall satisfactory, apart from the (G)ARCH processes, where a small C may lead to conservative test sizes for large samples (see Figure 3a, Figure 5a, Figure 6a and Figure 7a). The under-rejection problem is caused by the inappropriate choice C = 4.8, which makes the bandwidth for kernel estimation too small. The influence of an inappropriately small bandwidth can also be seen in Figure 5b and Figure 6b, where the test has limited power against the alternative.

4. Application

In this section, we apply the TE-based nonparametric test to the detection of financial market interdependence, in terms of both returns and volatility. Diebold and Yilmaz [49] performed a variance decomposition of the covariance matrix of the error terms from a reduced-form VAR model to investigate spillover effects in the global equity market. More recently, Gamba-Santamaria et al. [50] extended the framework to account for time variation in global volatility spillovers. Although their research provides simple and intuitive methods for measuring directional linkages between global stock markets, it may suffer from the limitations of linear parametric modeling, as discussed above. We revisit the topic of spillovers in the global equity market using the nonparametric method.
For our analysis, we use daily nominal stock market indexes from January 1992 to March 2017, obtained from Datastream, for six developed markets: the US (DJIA), Japan (Nikkei 225), Hong Kong (Hangseng), the UK (FTSE 100), Germany (DAX 30) and France (CAC 40). The target series are the weekly return and volatility of each index. The weekly returns are calculated as differenced log prices multiplied by 100, from Friday to Friday. Where the Friday price is not available due to a public holiday, we use the Thursday price instead.
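The weekly return construction, including the Thursday fall-back, can be sketched with pandas; resampling to weeks ending on Friday and taking the last available daily price implements the holiday rule automatically (function name ours):

```python
import numpy as np
import pandas as pd

def weekly_returns(prices):
    """Friday-to-Friday log returns in percent from a daily price series.

    Taking the last available observation in each 'W-FRI' week falls back
    to the Thursday (or earlier) price when Friday is a public holiday."""
    friday = prices.resample("W-FRI").last().dropna()
    return 100 * np.log(friday).diff().dropna()
```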
The weekly volatility series are generated following [49], making use of the weekly high, low, opening and closing prices obtained from the underlying daily high, low, opening and closing data. The volatility $\sigma_t^2$ for week t is estimated as
$\hat{\sigma}_t^2 = 0.511 (H_t - L_t)^2 - 0.019 \bigl[ (C_t - O_t)(H_t + L_t - 2 O_t) - 2 (H_t - O_t)(L_t - O_t) \bigr] - 0.383 (C_t - O_t)^2,$
where $H_t$ is the Monday–Friday high, $L_t$ the Monday–Friday low, $O_t$ the Monday–Friday open and $C_t$ the Monday–Friday close (in natural logarithms multiplied by 100). Further, after deleting the volatility estimates for the New Year weeks of 2002, 2008 and 2013, due to the lack of observations for the Nikkei 225 index, we have 1313 observations in total for the weekly returns and volatilities. The descriptive statistics, Ljung–Box (LB) test statistics and Augmented Dickey–Fuller (ADF) test statistics for both series are summarized in Table 11. From the ADF test results, it is clear that all time series are stationary and suitable for further analysis (we also performed the Johansen cointegration test pairwise on the price levels; no cointegration was found among the six market indexes).
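The range-based estimator above translates directly into code; a minimal sketch, with the inputs being the weekly high, low, open and close in natural logarithms multiplied by 100:

```python
def weekly_variance(H, L, O, C):
    """Range-based weekly variance estimator of the equation above;
    H, L, O, C are weekly high, low, open and close (log prices x 100)."""
    return (0.511 * (H - L) ** 2
            - 0.019 * ((C - O) * (H + L - 2 * O) - 2 * (H - O) * (L - O))
            - 0.383 * (C - O) ** 2)
```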
We first provide a full-sample analysis of global stock market return and volatility spillovers over the period from January 1992 to March 2017, summarized in Table 12 and Table 13. The two tables report the pairwise test statistics for conditional independence between index X and index Y, with the constant C in the kernel estimation bandwidth set to 4.8 or 8 and with 999 resampled time series. In other words, we test for the absence of a one-week-ahead directional linkage from index X to index Y using the five resampling methods described in Section 2. For example, the first line in the top panel of Table 12 reports the one-week-ahead influence of DJIA returns on the other indexes using the first time-shifted surrogate method (TS.a). For C = 8, the DJIA return is found to be a strong Granger cause of the Nikkei, FTSE and CAC at the 1% level, and of the DAX at the 5% level.
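For concreteness, each reported p-value can be obtained by comparing the observed statistic with its 999 resampled counterparts; the sketch below uses the standard add-one Monte Carlo correction, which may differ in detail from the exact convention used for the tables:

```python
import numpy as np

def surrogate_pvalue(t_obs, t_resampled):
    """One-sided Monte Carlo p-value from B resampled statistics,
    using the (1 + #{T* >= T}) / (B + 1) correction."""
    t_resampled = np.asarray(t_resampled)
    B = t_resampled.size
    return (1 + np.sum(t_resampled >= t_obs)) / (B + 1)
```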
Based on Table 12 and Table 13, we may draw several conclusions. Firstly, the US and German indexes are the most important return transmitters, and Hong Kong is the largest source of volatility spillover, as judged by the number of significant linkages. Note that this finding is similar to the result in [49], where the total return (volatility) spillovers from the US (Hong Kong) to other markets are found to be much higher than those from any other country. Figure 11 provides a graphical illustration of the global spillover network based on the results for (SMB.a) in Table 12 and Table 13. Apart from the main transmitters, we can clearly see that the Nikkei and CAC are the main receivers in the global return spillover network, while the DAX is the main receiver of global volatility transmission.
Secondly, the results are very robust to the choice of resampling method. Although the differences between the five resampling methods are small, (TS.a) appears slightly more powerful than (TS.b) in Table 12. The three bootstrap methods are mutually consistent almost all the time, similar to what we observed in Section 3.
However, the results in Table 12 and Table 13 are static in the sense that they do not account for possible time variation. The statistics measure directional linkages averaged over the whole period from 1992 to 2017, while the conditional dependence structure of the time series may be very different at any given point in time. Hence, the full-sample analysis is likely to overlook the cyclical dynamics between each pair of stock indexes. To investigate these dynamics in the global stock market, we now move from the full-sample analysis to a rolling-window study. Using a 200-week rolling window starting at the beginning of the sample and advancing in 5-week steps for the iterative evaluation of conditional dependence, we can assess the variation of spillovers in the global equity market over time.
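The rolling-window scheme amounts to iterating the pairwise test over overlapping index windows; a minimal sketch of the window bookkeeping (function name ours):

```python
def rolling_windows(n, width=200, step=5):
    """(start, end) index pairs: windows of `width` weekly observations
    advancing by `step` weeks, as in the 200-week/5-week design above."""
    return [(s, s + width) for s in range(0, n - width + 1, step)]
```

For each pair (s, e), the test would be run on the subsamples x[s:e] and y[s:e] and the resulting p-value recorded, producing a time series of p-values.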
Taking the return series of the DJIA as an illustration, Figure 12 shows the result of iteratively applying the local smoothed bootstrap method to detect Granger causality from and to the DJIA return (all volatility series are extremely skewed; see Table 11. In a small-sample analysis, the test statistics turn out to be sensitive to clustered outliers, which typically occur during market turmoil. As a result, the volatility dynamics are more erratic and less informative than those of the returns). The red line represents the p-values of the TE-based test with the DJIA weekly return as the information transmitter, while the blue line shows the p-values from testing the DJIA as a receiver of weekly information spillovers from the other markets. The plots display an event-dependent pattern, particularly around the recent financial crisis; from early 2009 until the end of 2012, all pairwise tests show the presence of a strong bi-directional linkage. Moreover, the DJIA strongly led the Nikkei, Hangseng and CAC during the first decade of this century.
Further, we see that the influence from the other indexes on the DJIA differs across markets, typically responding to economic events. For example, the blue line in the second panel of Figure 12 plunges below the 5% level twice before the recent financial crisis, indicating that the Hong Kong Hangseng index caused fluctuations in the DJIA during those two periods: first in the late 1990s and again at the end of 2004. The timing of the first fall matches the 1997 Asian currency crisis, while the latter coincided with China's austerity policy of October 2004.
Finally, the dynamic plots provide additional insights into the sample period that the full-sample analysis may miss. According to Table 12 and Figure 11, the DAX and CAC are less relevant for future fluctuations in the weekly DJIA return. However, one can clearly see that since 2001 the p-values for DAX→DJIA and CAC→DJIA have been below 5% most of the time, suggesting increasing integration of the global financial markets.

5. Conclusions

This paper provides guidelines for the practical application of TE in detecting conditional dependence, i.e., Granger causality in a more general sense, between two time series. Although a substantial literature applies TE in this context, the asymptotics of the statistic and the performance of resampling-based measures are still not well understood. We considered tests based on five different resampling methods, all of which have been shown in the literature to be suitable for entropy-related tests, and investigated the size and power of the associated tests numerically. Two time-shifted surrogate and three smoothed bootstrap methods were tested on data simulated from several processes. The simulation results in this controlled environment suggest that all five methods achieve reasonable rejection rates under the null as well as the alternative hypotheses. Our results are very robust with respect to the density estimation method, including the procedure used for standardizing the location and scale of the data and the choice of the bandwidth parameter, as long as the convergence rate of the kernel estimator of TE is consistent with its first-order Taylor expansion.
In the empirical application, we showed how the proposed resampling techniques can be used on real-world data to detect conditional dependence. Using global equity data, we tested for pairwise causalities in the return and volatility series of the world's leading stock indexes. Our work can be viewed as a nonparametric extension of the spillover measures considered by Diebold and Yilmaz [49]. In accordance with their results, we found evidence that the DJIA and the DAX are the most important return transmitters and that Hong Kong is the largest source of volatility spillover. Furthermore, the rolling window-based test for Granger causality in pairwise return series demonstrated that the causal linkages in the global equity market are time-varying rather than static. The overall dependence was tighter during the most recent financial crisis, and the fluctuations of the p-values are shown to be event dependent.
As for future work, there are several directions for potential extensions. On the theoretical side, it would be practically meaningful to consider causal linkage detection beyond a single-period lag and to deal with the infinite-order issue in a nonparametric setting. Further nonparametric techniques need to be developed to play a role similar to that of information criteria for model order selection in the parametric world. On the empirical side, it will be interesting to further exploit entropy-based statistics for testing conditional dependence in the presence of a so-called common factor, i.e., in multivariate systems with more than two variables. One potential candidate for this type of test is the partial TE proposed by Vakorin et al. [51], but its statistical properties still need to be studied thoroughly.

Acknowledgments

We thank the editor and two anonymous referees for their help and constructive comments. We would also like to thank seminar participants at the University of Amsterdam and the Tinbergen Institute. The authors also wish to extend particular gratitude to Simon Broda and Chen Zhou for their comments on an earlier version of this paper. We also acknowledge SURFsara for providing the computational environment of the LISA cluster.

Author Contributions

Both authors contributed equally to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423; 623–656. [Google Scholar] [CrossRef]
  2. Shannon, C.E. Prediction and entropy of printed English. Bell Syst. Tech. J. 1951, 30, 50–64. [Google Scholar] [CrossRef]
  3. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461–464. [Google Scholar] [CrossRef] [PubMed]
  4. Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424–438. [Google Scholar] [CrossRef]
  5. Hiemstra, C.; Jones, J.D. Testing for linear and nonlinear Granger causality in the stock price-volume relation. J. Financ. 1994, 49, 1639–1664. [Google Scholar] [CrossRef]
  6. Diks, C.; Panchenko, V. A new statistic and practical guidelines for nonparametric Granger causality testing. J. Econ. Dyn. Control 2006, 30, 1647–1669. [Google Scholar] [CrossRef]
  7. Su, L.; White, H. A nonparametric Hellinger metric test for conditional independence. Econom. Theory 2008, 24, 829–864. [Google Scholar] [CrossRef]
  8. Bouezmarni, T.; Rombouts, J.V.; Taamouti, A. Nonparametric copula-based test for conditional independence with applications to Granger causality. J. Bus. Econ. Stat. 2012, 30, 275–287. [Google Scholar] [CrossRef]
  9. Su, L.; White, H. Testing conditional independence via empirical likelihood. J. Econom. 2014, 182, 27–44. [Google Scholar] [CrossRef]
  10. Hlaváčková-Schindler, K.; Paluš, M.; Vejmelka, M.; Bhattacharya, J. Causality detection based on information-theoretic approaches in time series analysis. Phys. Rep. 2007, 441, 1–46. [Google Scholar] [CrossRef]
  11. Amblard, P.O.; Michel, O.J. The relation between Granger causality and directed information theory: A review. Entropy 2012, 15, 113–143. [Google Scholar] [CrossRef]
  12. Granger, C.; Lin, J.L. Using the mutual information coefficient to identify lags in nonlinear models. J. Time Ser. Anal. 1994, 15, 371–384. [Google Scholar] [CrossRef]
  13. Hong, Y.; White, H. Asymptotic distribution theory for nonparametric entropy measures of serial dependence. Econometrica 2005, 73, 837–901. [Google Scholar] [CrossRef]
  14. Barnett, L.; Bossomaier, T. Transfer entropy as a log-likelihood ratio. Phys. Rev. Lett. 2012, 109, 138105. [Google Scholar] [CrossRef] [PubMed]
  15. Efron, B. Computers and the theory of statistics: Thinking the unthinkable. SIAM Rev. 1979, 21, 460–480. [Google Scholar] [CrossRef]
  16. Theiler, J.; Eubank, S.; Longtin, A.; Galdrikian, B.; Farmer, J.D. Testing for nonlinearity in time series: The method of surrogate data. Phys. D Nonlinear Phenom. 1992, 58, 77–94. [Google Scholar] [CrossRef]
  17. Schreiber, T.; Schmitz, A. Surrogate time series. Phy. D Nonlinear Phenom. 2000, 142, 346–382. [Google Scholar] [CrossRef]
  18. Horowitz, J.L. The Bootstrap. Handb. Econ. 2001, 5, 3159–3228. [Google Scholar]
  19. Hinich, M.J.; Mendes, E.M.; Stone, L. Detecting nonlinearity in time series: Surrogate and bootstrap approaches. Stud. Nonlinear Dyn. Econom. 2005, 9. [Google Scholar] [CrossRef]
  20. Faes, L.; Porta, A.; Nollo, G. Mutual nonlinear prediction as a tool to evaluate coupling strength and directionality in bivariate time series: Comparison among different strategies based on k nearest neighbors. Phys. Rev. E 2008, 78, 026201. [Google Scholar] [CrossRef] [PubMed]
  21. Papana, A.; Kyrtsou, K.; Kugiumtzis, D.; Diks, C. Assessment of resampling methods for causality testing: A note on the US inflation behavior. PLoS ONE 2017, 12, e0180852. [Google Scholar] [CrossRef] [PubMed]
  22. Quiroga, R.Q.; Kraskov, A.; Kreuz, T.; Grassberger, P. Performance of different synchronization measures in real data: A case study on electroencephalographic signals. Phys. Rev. E 2002, 65, 041903. [Google Scholar] [CrossRef] [PubMed]
  23. Marschinski, R.; Kantz, H. Analysing the information flow between financial time series. Eur. Phys. J. B Condens. Matter Complex Syst. 2002, 30, 275–281. [Google Scholar] [CrossRef]
  24. Kugiumtzis, D. Evaluation of surrogate and bootstrap tests for nonlinearity in time series. Stud. Nonlinear Dyn. Econom. 2008. [Google Scholar] [CrossRef]
  25. Robinson, P.M. Consistent nonparametric entropy-based testing. Rev. Econ. Stud. 1991, 58, 437–453. [Google Scholar] [CrossRef]
  26. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  27. Granger, C.; Maasoumi, E.; Racine, J. A dependence metric for possibly nonlinear processes. J. Time Ser. Anal. 2004, 25, 649–669. [Google Scholar] [CrossRef]
  28. Diks, C.; Fang, H. Detecting Granger Causality with a Nonparametric Information-Based Statistic; CeNDEF Working Paper No. 17-03; University of Amsterdam: Amsterdam, The Netherlands, 2017. [Google Scholar]
  29. Moddemeijer, R. On estimation of entropy and mutual information of continuous distributions. Signal Process. 1989, 16, 233–248. [Google Scholar] [CrossRef]
  30. Diks, C.; Manzan, S. Tests for serial independence and linearity based on correlation integrals. Stud. Nonlinear Dyn. Econom. 2002, 6, 1–20. [Google Scholar] [CrossRef]
  31. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef] [PubMed]
  32. Wand, M.P.; Jones, M.C. Kernel Smoothing; CRC Press: Boca Raton, FL, USA, 1994; Volume 60. [Google Scholar]
  33. Silverman, B.W. Density Estimation for Statistics and Data Analysis; CRC Press: Boca Raton, FL, USA, 1986; Volume 26. [Google Scholar]
  34. Joe, H. Estimation of entropy and other functionals of a multivariate density. Ann. Inst. Stat. Math. 1989, 41, 683–697. [Google Scholar] [CrossRef]
  35. Van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  36. Jones, M.C.; Marron, J.S.; Sheather, S.J. A brief survey of bandwidth selection for density estimation. J. Am. Stat. Assoc. 1996, 91, 401–407. [Google Scholar] [CrossRef]
  37. He, Y.L.; Liu, J.N.; Wang, X.Z.; Hu, Y.X. Optimal bandwidth selection for re-substitution entropy estimation. Appl. Math. Comput. 2012, 219, 3425–3460. [Google Scholar] [CrossRef]
  38. Politis, D.N.; Romano, J.P. The stationary bootstrap. J. Am. Stat. Assoc. 1994, 89, 1303–1313. [Google Scholar] [CrossRef]
  39. Kugiumtzis, D. Direct-coupling information measure from nonuniform embedding. Phys. Rev. E 2013, 87, 062918. [Google Scholar] [CrossRef] [PubMed]
  40. Papana, A.; Kyrtsou, C.; Kugiumtzis, D.; Diks, C. Simulation study of direct causality measures in multivariate time series. Entropy 2013, 15, 2635–2661. [Google Scholar] [CrossRef]
  41. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994. [Google Scholar]
  42. Shao, J.; Tu, D. The Jackknife and Bootstrap; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  43. Neumann, M.H.; Paparoditis, E. On bootstrapping L2-type statistics in density testing. Stat. Probab. Lett. 2000, 50, 137–147. [Google Scholar] [CrossRef]
  44. Paparoditis, E.; Politis, D.N. The local bootstrap for kernel estimators under general dependence conditions. Ann. Inst. Stat. Math. 2000, 52, 139–159. [Google Scholar] [CrossRef]
  45. Engle, R.; Granger, C. Co-integration and error correction: Representation, estimation, and testing. Econometrica 1987, 55, 251–276. [Google Scholar] [CrossRef]
  46. Bekiros, S.; Diks, C. The nonlinear dynamic relationship of exchange rates: Parametric and nonparametric causality testing. J. Macroecon. 2008, 30, 1641–1650. [Google Scholar] [CrossRef]
  47. Baek, E.G.; Brock, W.A. A General Test for Nonlinear Granger Causality: Bivariate Model; Working Paper; Iowa State University and University of Wisconsin: Madison, WI, USA, 1992. [Google Scholar]
  48. Davidson, J. Establishing conditions for the functional central limit theorem in nonlinear and semiparametric time series processes. J. Econom. 2002, 106, 243–269. [Google Scholar] [CrossRef]
  49. Diebold, F.X.; Yilmaz, K. Measuring financial asset return and volatility spillovers, with application to global equity markets. Econ. J. 2009, 119, 158–171. [Google Scholar] [CrossRef]
  50. Gamba-Santamaria, S.; Gomez-Gonzalez, J.E.; Hurtado-Guarin, J.L.; Melo-Velandia, L.F. Volatility Spillovers Among Global Stock Markets: Measuring Total and Directional Effects; Technical Report No. 983; Banco de la Republica de Colombia: Bogotá, Colombia, 2017. [Google Scholar]
  51. Vakorin, V.A.; Krakovska, O.A.; McIntosh, A.R. Confounding effects of indirect connections on causality estimation. J. Neurosci. Methods 2009, 184, 152–160. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The data generating process (DGP) is the bivariate VAR process in Equation (9), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 2. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bivariate non-linear VAR process in Equation (10), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 3. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bivariate ARCH process in Equation (11), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 4. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bilinear process in Equation (12), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 5. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bivariate AR2-GARCH process in Equation (13), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 6. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bivariate ARMA-GARCH process in Equation (14), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 7. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bivariate AR1-EGARCH process in Equation (15), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 8. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the VECM process in Equation (16), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 9. Size-size and size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the bivariate threshold AR(1) process in Equation (17), with Y affecting X. The left (right) column shows observed rejection rates under the null (alternative) hypothesis. The sample size varies from n = 200 to n = 2000 .
Figure 10. Size-power plots of Granger non-causality tests, based on 500 replications and smoothed local bootstrap (a). The DGP is the two-way VAR process in Equation (18), with X affecting Y and Y affecting X. The left (right) column shows observed rejection rates for testing X (Y) causing Y (X). The sample size varies from n = 200 to n = 2000 .
Figure 11. Graphical representation of pairwise causalities on global stock returns and volatilities. All “→” in the graph indicate a significant directional causality at the 5% level.
Figure 12. Time-varying p-values for the TE-based Granger causality test on the return series. The causal linkages from the DJIA to the other markets, as well as those from the other markets to the DJIA, are tested.
Table 1. Observed size and power of the TE-based test for the linear VAR process in Equation (9).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0560  0.0500  0.0460  0.0420  0.0440    0.0820  0.0740  0.0740  0.0780  0.0780
  500     0.0740  0.0680  0.0660  0.0700  0.0660    0.1160  0.1120  0.1200  0.1160  0.1220
 1000     0.0620  0.0560  0.0600  0.0560  0.0560    0.0940  0.0920  0.0980  0.0960  0.0980
 2000     0.0380  0.0340  0.0380  0.0460  0.0460    0.0940  0.0920  0.0960  0.0980  0.0980
C = 8
  200     0.0500  0.0440  0.0460  0.0460  0.0440    0.1120  0.1000  0.0960  0.0960  0.0920
  500     0.0840  0.0760  0.0760  0.0720  0.0660    0.1360  0.1300  0.1060  0.1160  0.1140
 1000     0.0720  0.0680  0.0620  0.0560  0.0580    0.1280  0.1280  0.1160  0.1260  0.1200
 2000     0.0880  0.0780  0.0820  0.0760  0.0820    0.1420  0.1380  0.1320  0.1340  0.1440

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.1880  0.1980  0.1900  0.1920  0.1900    0.2780  0.2780  0.2880  0.2860  0.2920
  500     0.3460  0.3460  0.3400  0.3480  0.3420    0.4520  0.4460  0.4580  0.4500  0.4500
 1000     0.5440  0.5340  0.5320  0.5340  0.5280    0.6400  0.6520  0.6460  0.6480  0.6460
 2000     0.7500  0.7420  0.7500  0.7460  0.7460    0.8160  0.8080  0.8100  0.8120  0.8120
C = 8
  200     0.1660  0.1680  0.1640  0.1680  0.1700    0.2660  0.2640  0.2680  0.2740  0.2740
  500     0.2900  0.2900  0.3020  0.3040  0.3020    0.4020  0.3960  0.3940  0.4020  0.3980
 1000     0.4980  0.4980  0.5000  0.4900  0.4980    0.6040  0.6120  0.6120  0.6140  0.6120
 2000     0.8420  0.8380  0.8400  0.8340  0.8460    0.8960  0.8880  0.8900  0.8900  0.8900

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (9), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
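The rejection rates tabulated here are obtained by Monte Carlo simulation: generate many realizations of the DGP, apply a resampling-based test to each, and count how often the p-value falls below the nominal level. The sketch below illustrates the bookkeeping only; it uses a simple permutation test on a lag-one cross-correlation as a hypothetical stand-in for the TE-based test, and the names `var_dgp`, `perm_pvalue`, and `rejection_rate` are ours, not from the paper.

```python
import numpy as np

def rejection_rate(dgp, test_pvalue, n, n_reps=200, alpha=0.05, seed=0):
    """Fraction of Monte Carlo replications in which the test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        x, y = dgp(n, rng)
        if test_pvalue(x, y, rng) <= alpha:
            rejections += 1
    return rejections / n_reps

def var_dgp(a):
    """Toy bivariate linear system: Y Granger-causes X if and only if a != 0."""
    def dgp(n, rng):
        x = np.zeros(n)
        y = rng.standard_normal(n)
        for t in range(1, n):
            x[t] = 0.4 * x[t - 1] + a * y[t - 1] + rng.standard_normal()
        return x, y
    return dgp

def perm_pvalue(x, y, rng, n_perm=99):
    """Permutation p-value for |corr(y_{t-1}, x_t)|: a simple resampled non-causality test."""
    stat = abs(np.corrcoef(y[:-1], x[1:])[0, 1])
    null = [abs(np.corrcoef(rng.permutation(y[:-1]), x[1:])[0, 1]) for _ in range(n_perm)]
    return (1 + sum(s >= stat for s in null)) / (n_perm + 1)

# Under the null (a = 0) the rejection rate estimates the size;
# under the alternative (a != 0) it estimates the power.
size = rejection_rate(var_dgp(0.0), perm_pvalue, n=200)
power = rejection_rate(var_dgp(0.4), perm_pvalue, n=200)
```

The same loop, with the TE statistic and one of the surrogate or bootstrap schemes in place of `perm_pvalue`, produces each cell of the tables.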
Table 2. Observed size and power of the TE-based test for the nonlinear VAR process in Equation (10).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0500  0.0360  0.0340  0.0380  0.0360    0.0760  0.0720  0.0780  0.0760  0.0720
  500     0.0580  0.0580  0.0580  0.0580  0.0580    0.0960  0.0980  0.1040  0.1040  0.0980
 1000     0.0340  0.0360  0.0380  0.0360  0.0380    0.0620  0.0580  0.0740  0.0700  0.0720
 2000     0.0380  0.0320  0.0440  0.0460  0.0420    0.0780  0.0700  0.0960  0.0920  0.0880
C = 8
  200     0.0560  0.0480  0.0460  0.0520  0.0480    0.1000  0.0800  0.0940  0.0940  0.0940
  500     0.0640  0.0620  0.0500  0.0480  0.0540    0.1120  0.1100  0.1080  0.1040  0.1040
 1000     0.0440  0.0360  0.0320  0.0300  0.0280    0.0900  0.0860  0.0820  0.0780  0.0800
 2000     0.0260  0.0300  0.0280  0.0280  0.0260    0.0700  0.0640  0.0740  0.0660  0.0640

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.1400  0.1360  0.1380  0.1400  0.1440    0.2480  0.2420  0.2300  0.2360  0.2300
  500     0.3380  0.3360  0.3340  0.3360  0.3320    0.4400  0.4400  0.4440  0.4300  0.4360
 1000     0.6060  0.6040  0.6240  0.6220  0.6220    0.7180  0.7260  0.7140  0.7180  0.7160
 2000     0.8760  0.8760  0.8780  0.8740  0.8780    0.9440  0.9300  0.9320  0.9300  0.9280
C = 8
  200     0.0940  0.0900  0.0900  0.0900  0.0880    0.1800  0.1740  0.1700  0.1680  0.1760
  500     0.1880  0.1800  0.1800  0.1760  0.1780    0.2960  0.2960  0.3000  0.2980  0.3020
 1000     0.3800  0.3940  0.3900  0.3840  0.3820    0.5520  0.5440  0.5480  0.5520  0.5480
 2000     0.8340  0.8300  0.8280  0.8220  0.8300    0.9040  0.9040  0.9060  0.8980  0.9040

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (10), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 3. Observed size and power of the TE-based test for the bivariate ARCH process in Equation (11).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0660  0.0580  0.0620  0.0640  0.0620    0.1420  0.1280  0.1260  0.1340  0.1280
  500     0.0640  0.0560  0.0560  0.0560  0.0580    0.1120  0.1060  0.1080  0.1020  0.1000
 1000     0.0480  0.0460  0.0400  0.0420  0.0340    0.0740  0.0700  0.0700  0.0660  0.0620
 2000     0.0220  0.0200  0.0080  0.0080  0.0080    0.0460  0.0500  0.0220  0.0260  0.0180
C = 8
  200     0.0920  0.0760  0.0760  0.0720  0.0840    0.1540  0.1360  0.1280  0.1260  0.1320
  500     0.1100  0.0900  0.0720  0.0760  0.0800    0.1980  0.1860  0.1620  0.1480  0.1600
 1000     0.0920  0.0960  0.0740  0.0720  0.0800    0.1500  0.1500  0.1220  0.1180  0.1240
 2000     0.0780  0.0740  0.0660  0.0620  0.0600    0.1260  0.1180  0.1140  0.1160  0.1120

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.2320  0.2240  0.2180  0.2240  0.2160    0.3320  0.3340  0.3320  0.3520  0.3460
  500     0.3520  0.3420  0.3520  0.3540  0.3540    0.4800  0.4800  0.4740  0.4780  0.4780
 1000     0.5020  0.5060  0.5120  0.5120  0.5000    0.6340  0.6320  0.6340  0.6280  0.6300
 2000     0.6020  0.6060  0.5940  0.5920  0.5960    0.7340  0.7280  0.7360  0.7300  0.7280
C = 8
  200     0.2940  0.2880  0.3220  0.3180  0.3080    0.4360  0.4320  0.4380  0.4400  0.4340
  500     0.5520  0.5500  0.5620  0.5720  0.5680    0.6880  0.6940  0.6880  0.6900  0.6920
 1000     0.7720  0.7720  0.7820  0.7800  0.7740    0.8600  0.8560  0.8640  0.8600  0.8640
 2000     0.9780  0.9720  0.9780  0.9740  0.9760    0.9900  0.9900  0.9920  0.9920  0.9920

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (11), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 4. Observed size and power of the TE-based test for the bilinear process in Equation (12).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0460  0.0420  0.0420  0.0420  0.0460    0.0820  0.0780  0.0900  0.0860  0.0900
  500     0.0540  0.0480  0.0480  0.0500  0.0480    0.0980  0.0940  0.1040  0.1040  0.1040
 1000     0.0440  0.0440  0.0440  0.0420  0.0500    0.1020  0.1020  0.1060  0.1000  0.1040
 2000     0.0460  0.0480  0.0480  0.0480  0.0520    0.0940  0.0940  0.1000  0.1000  0.1020
C = 8
  200     0.0440  0.0400  0.0460  0.0460  0.0460    0.0940  0.0860  0.0940  0.0900  0.0940
  500     0.0500  0.0480  0.0440  0.0420  0.0420    0.0840  0.0880  0.0820  0.0820  0.0820
 1000     0.0660  0.0620  0.0540  0.0520  0.0560    0.1220  0.1200  0.1180  0.1140  0.1100
 2000     0.0360  0.0380  0.0400  0.0340  0.0360    0.0880  0.0920  0.1000  0.0980  0.0960

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.1800  0.1800  0.2040  0.2000  0.1960    0.3000  0.2900  0.3020  0.3000  0.3060
  500     0.4620  0.4580  0.4680  0.4460  0.4640    0.5920  0.5940  0.6020  0.6060  0.6100
 1000     0.7700  0.7800  0.7780  0.7820  0.7820    0.8620  0.8560  0.8640  0.8580  0.8600
 2000     0.9780  0.9820  0.9800  0.9800  0.9780    0.9860  0.9880  0.9880  0.9880  0.9880
C = 8
  200     0.1440  0.1380  0.1540  0.1440  0.1460    0.2340  0.2320  0.2540  0.2420  0.2440
  500     0.3240  0.3200  0.3320  0.3320  0.3320    0.4620  0.4680  0.4680  0.4700  0.4700
 1000     0.6400  0.6360  0.6400  0.6240  0.6260    0.7520  0.7480  0.7460  0.7500  0.7460
 2000     0.9620  0.9580  0.9620  0.9600  0.9640    0.9880  0.9900  0.9860  0.9900  0.9840

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (12), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 5. Observed size and power of the TE-based test for the AR(2)-GARCH process in Equation (13).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0600  0.0540  0.0840  0.0780  0.0740    0.1280  0.1280  0.1520  0.1500  0.1520
  500     0.0580  0.0560  0.0740  0.0720  0.0700    0.1100  0.1060  0.1380  0.1320  0.1300
 1000     0.0420  0.0480  0.0440  0.0540  0.0540    0.0900  0.0840  0.0920  0.0920  0.0900
 2000     0.0480  0.0440  0.0320  0.0340  0.0320    0.0800  0.0800  0.0660  0.0680  0.0660
C = 8
  200     0.0680  0.0620  0.0920  0.0880  0.0860    0.1280  0.1200  0.1440  0.1400  0.1340
  500     0.0880  0.0880  0.1120  0.1060  0.1040    0.1400  0.1400  0.1480  0.1520  0.1520
 1000     0.0600  0.0620  0.0660  0.0680  0.0680    0.1120  0.1140  0.1080  0.1160  0.1140
 2000     0.0820  0.0800  0.0800  0.0840  0.0760    0.1380  0.1380  0.1360  0.1300  0.1300

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0860  0.0800  0.0860  0.0840  0.0840    0.1420  0.1460  0.1500  0.1440  0.1400
  500     0.0980  0.0920  0.0980  0.0840  0.0920    0.1700  0.1700  0.1500  0.1480  0.1580
 1000     0.0760  0.0800  0.0520  0.0540  0.0620    0.1500  0.1480  0.1220  0.1160  0.1080
 2000     0.0540  0.0480  0.0280  0.0260  0.0260    0.1080  0.1200  0.0460  0.0460  0.0500
C = 8
  200     0.1120  0.1020  0.1320  0.1300  0.1360    0.1900  0.1860  0.2200  0.2120  0.2140
  500     0.1540  0.1560  0.1740  0.1660  0.1680    0.2560  0.2480  0.2680  0.2640  0.2560
 1000     0.2180  0.2100  0.2080  0.2000  0.2020    0.3080  0.3140  0.3020  0.2960  0.3020
 2000     0.2740  0.2820  0.2540  0.2500  0.2560    0.3900  0.3900  0.3420  0.3460  0.3540

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (13), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 6. Observed size and power of the TE-based test for the ARMA-GARCH process in Equation (14).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0580  0.0480  0.0600  0.0560  0.0600    0.1080  0.0980  0.1160  0.1160  0.1080
  500     0.0640  0.0640  0.0660  0.0640  0.0700    0.1060  0.0960  0.1120  0.1100  0.1080
 1000     0.0540  0.0500  0.0460  0.0420  0.0440    0.0940  0.0980  0.0760  0.0800  0.0820
 2000     0.0240  0.0260  0.0160  0.0180  0.0200    0.0660  0.0720  0.0280  0.0300  0.0280
C = 8
  200     0.0540  0.0380  0.0780  0.0760  0.0700    0.1160  0.1000  0.1520  0.1480  0.1520
  500     0.0900  0.0820  0.1020  0.1060  0.0940    0.1480  0.1240  0.1600  0.1520  0.1520
 1000     0.0540  0.0580  0.0600  0.0620  0.0660    0.1220  0.1140  0.1220  0.1220  0.1260
 2000     0.0620  0.0640  0.0640  0.0660  0.0680    0.1220  0.1220  0.1180  0.1140  0.1080

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.2680  0.2620  0.2560  0.2600  0.2640    0.3840  0.3820  0.3800  0.3720  0.3760
  500     0.3540  0.3460  0.3540  0.3600  0.3580    0.4580  0.4500  0.4600  0.4640  0.4520
 1000     0.3380  0.3420  0.3280  0.3240  0.3260    0.4200  0.4260  0.3880  0.3860  0.3900
 2000     0.2740  0.2800  0.2300  0.2320  0.2320    0.3360  0.3380  0.2820  0.2820  0.2880
C = 8
  200     0.3860  0.3800  0.4140  0.4060  0.4100    0.5180  0.5000  0.5320  0.5260  0.5180
  500     0.7260  0.7280  0.7220  0.7200  0.7180    0.8060  0.8020  0.8040  0.8000  0.7980
 1000     0.8500  0.8440  0.8400  0.8380  0.8320    0.8780  0.8740  0.8700  0.8700  0.8700
 2000     0.8900  0.8960  0.8860  0.8880  0.8900    0.9240  0.9200  0.9160  0.9160  0.9140

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (14), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 7. Observed size and power of the TE-based test for the AR(1)-EGARCH process in Equation (15).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0420  0.0420  0.0740  0.0740  0.0740    0.1000  0.0940  0.1560  0.1500  0.1480
  500     0.0640  0.0520  0.0820  0.0860  0.0860    0.1040  0.0940  0.1460  0.1460  0.1460
 1000     0.0400  0.0360  0.0460  0.0440  0.0480    0.0760  0.0740  0.0880  0.0880  0.0900
 2000     0.0360  0.0400  0.0320  0.0380  0.0360    0.0680  0.0680  0.0520  0.0520  0.0580
C = 8
  200     0.0540  0.0480  0.0860  0.0880  0.0860    0.1140  0.1060  0.1460  0.1440  0.1460
  500     0.0900  0.0740  0.1020  0.1020  0.1020    0.1440  0.1340  0.1760  0.1660  0.1580
 1000     0.0720  0.0640  0.0800  0.0800  0.0800    0.1240  0.1200  0.1380  0.1320  0.1340
 2000     0.0660  0.0640  0.0680  0.0700  0.0720    0.1020  0.1080  0.1100  0.1100  0.1160

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.1660  0.1460  0.2200  0.2220  0.2280    0.2580  0.2520  0.3280  0.3280  0.3260
  500     0.2500  0.2420  0.2940  0.2960  0.2960    0.3740  0.3720  0.4140  0.4080  0.4140
 1000     0.2860  0.2860  0.3040  0.3100  0.3000    0.3840  0.3820  0.4000  0.4020  0.3960
 2000     0.3180  0.3020  0.2900  0.2900  0.2900    0.3780  0.3840  0.3520  0.3440  0.3600
C = 8
  200     0.2160  0.2000  0.2740  0.2660  0.2640    0.3000  0.2880  0.3560  0.3580  0.3600
  500     0.3660  0.3520  0.4040  0.4080  0.4020    0.4900  0.4720  0.5720  0.5600  0.5580
 1000     0.6200  0.6020  0.6600  0.6520  0.6520    0.7280  0.7140  0.7420  0.7440  0.7460
 2000     0.8300  0.8360  0.8460  0.8420  0.8380    0.8840  0.8820  0.8880  0.8860  0.8920

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (15), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 8. Observed size and power of the TE-based test for the VECM process in Equation (16).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0060  0.0060  0.0040  0.0040  0.0040    0.0220  0.0200  0.0220  0.0220  0.0200
  500     0.0220  0.0220  0.0240  0.0240  0.0220    0.0460  0.0480  0.0500  0.0480  0.0560
 1000     0.0400  0.0400  0.0400  0.0400  0.0420    0.0840  0.0860  0.0900  0.0840  0.0860
 2000     0.0440  0.0420  0.0380  0.0360  0.0360    0.0800  0.0780  0.0820  0.0740  0.0700
C = 8
  200     0.0080  0.0100  0.0100  0.0080  0.0100    0.0340  0.0300  0.0300  0.0280  0.0340
  500     0.0240  0.0220  0.0220  0.0200  0.0240    0.0680  0.0660  0.0680  0.0680  0.0660
 1000     0.0480  0.0420  0.0420  0.0440  0.0440    0.0860  0.0840  0.0840  0.0820  0.0880
 2000     0.0200  0.0220  0.0200  0.0200  0.0220    0.0560  0.0540  0.0520  0.0560  0.0540

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.2540  0.2520  0.2600  0.2600  0.2640    0.3260  0.3300  0.3400  0.3380  0.3440
  500     0.5140  0.5120  0.5220  0.5280  0.5120    0.6120  0.5920  0.5980  0.6060  0.6080
 1000     0.7840  0.7840  0.7800  0.7840  0.7800    0.8340  0.8320  0.8420  0.8440  0.8420
 2000     0.9240  0.9260  0.9280  0.9260  0.9280    0.9560  0.9560  0.9520  0.9540  0.9520
C = 8
  200     0.2500  0.2440  0.2500  0.2560  0.2480    0.3100  0.3000  0.3140  0.3160  0.3140
  500     0.4600  0.4640  0.4680  0.4700  0.4740    0.5600  0.5560  0.5540  0.5600  0.5580
 1000     0.7620  0.7660  0.7780  0.7680  0.7760    0.8220  0.8180  0.8240  0.8240  0.8280
 2000     0.9520  0.9560  0.9560  0.9540  0.9520    0.9780  0.9700  0.9780  0.9780  0.9780

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (16), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 9. Observed size and power of the TE-based test for the threshold AR(1) process in Equation (17).
Size

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.0420  0.0360  0.0420  0.0360  0.0360    0.0780  0.0680  0.0720  0.0700  0.0720
  500     0.0460  0.0420  0.0440  0.0460  0.0480    0.0880  0.0880  0.0860  0.0900  0.0940
 1000     0.0380  0.0420  0.0420  0.0400  0.0360    0.0980  0.0900  0.0760  0.0840  0.0800
 2000     0.0440  0.0480  0.0420  0.0400  0.0400    0.0860  0.0900  0.0780  0.0800  0.0780
C = 8
  200     0.0400  0.0400  0.0380  0.0400  0.0400    0.0800  0.0800  0.0920  0.0800  0.0860
  500     0.0420  0.0380  0.0440  0.0560  0.0560    0.1080  0.1020  0.1020  0.1040  0.1060
 1000     0.0480  0.0440  0.0440  0.0440  0.0500    0.1020  0.0980  0.0960  0.1080  0.1000
 2000     0.0440  0.0460  0.0480  0.0440  0.0460    0.0940  0.0860  0.0840  0.0880  0.0920

Power

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.8520  0.8460  0.8200  0.8280  0.8120    0.9080  0.9120  0.8940  0.8960  0.8920
  500     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
 1000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
 2000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
C = 8
  200     0.3900  0.3740  0.3120  0.3160  0.3180    0.4980  0.5000  0.4420  0.4460  0.4440
  500     0.9460  0.9440  0.9220  0.9240  0.9200    0.9760  0.9760  0.9600  0.9640  0.9660
 1000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
 2000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000

Note: Empirical size and power of the TE-based test at the 5% and 10% significance levels for the process in Equation (17), for each resampling method. Values are observed rejection rates over 500 realizations at nominal sizes 0.05 and 0.10; sample sizes range from 200 to 2000. The control parameter is a = 0.4 for the size evaluation and a = 0.1 for the power evaluation. For this simulation study we consider C = 4.8 and C = 8.
Table 10. Observed power of the TE-based test for the two-way VAR process in Equation (18).
X → Y

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.2780  0.2600  0.3540  0.3520  0.3560    0.3980  0.3760  0.4560  0.4560  0.4660
  500     0.5640  0.5360  0.6260  0.6240  0.6260    0.6780  0.6680  0.7420  0.7400  0.7460
 1000     0.8380  0.8320  0.8800  0.8740  0.8780    0.9040  0.8980  0.9200  0.9240  0.9200
 2000     0.9900  0.9900  0.9980  0.9960  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
C = 8
  200     0.2480  0.2240  0.3080  0.3180  0.3100    0.3540  0.3360  0.3900  0.3940  0.3920
  500     0.4180  0.4020  0.4740  0.4700  0.4700    0.5400  0.5320  0.5880  0.5920  0.5800
 1000     0.7960  0.7840  0.8140  0.8140  0.8160    0.8560  0.8540  0.8740  0.8720  0.8720
 2000     0.9800  0.9780  0.9820  0.9800  0.9800    0.9900  0.9880  0.9900  0.9900  0.9920

Y → X

                   α = 0.05                                  α = 0.10
    n     TS.a    TS.b    SMB.a   SMB.b   STB       TS.a    TS.b    SMB.a   SMB.b   STB
C = 4.8
  200     0.5140  0.5060  0.5480  0.5560  0.5480    0.6120  0.5960  0.6920  0.6920  0.6940
  500     0.9400  0.9340  0.9480  0.9440  0.9460    0.9620  0.9580  0.9720  0.9700  0.9740
 1000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
 2000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000
C = 8
  200     0.4040  0.3700  0.4420  0.4400  0.4280    0.5060  0.4900  0.5400  0.5400  0.5320
  500     0.8260  0.8220  0.8220  0.8200  0.8160    0.8740  0.8740  0.8700  0.8720  0.8680
 1000     0.9820  0.9840  0.9820  0.9800  0.9800    0.9940  0.9940  0.9940  0.9920  0.9940
 2000     1.0000  1.0000  1.0000  1.0000  1.0000    1.0000  1.0000  1.0000  1.0000  1.0000

Note: Empirical power of the TE-based test at the 5% and 10% significance levels for the process in Equation (18), for each resampling method. Values are observed rejection rates over 500 realizations; sample sizes range from 200 to 2000. The control parameters are b = 0.2 and c = 0.1. For this simulation study we consider C = 4.8 and C = 8.
Table 11. Descriptive statistics for global stock market return and volatility.
Return

               DJIA        Nikkei      Hangseng    FTSE        DAX         CAC
Mean           0.1433      −0.0126     0.1294      0.0823      0.1535      0.0790
Median         0.2911      0.1472      0.2627      0.2121      0.4029      0.1984
Maximum        10.6977     11.4496     13.9169     12.5845     14.9421     12.4321
Minimum        −20.0298    −27.8844    −19.9212    −23.6317    −24.3470    −25.0504
Std. Dev.      2.2321      3.0521      3.3819      2.3367      3.0972      2.9376
Skewness       −0.8851     −0.6978     −0.3773     −0.8643     −0.6398     −0.6803
Kurtosis       10.8430     8.9250      5.9522      13.2777     7.9186      8.0780
LB Test        49.9368**   15.4577     28.7922     61.0916**   28.0474     43.5004**
ADF Test       −38.9512**  −37.1989**  −35.4160**  −38.8850**  −37.2015**  −38.7114**

Volatility

               DJIA        Nikkei      Hangseng    FTSE        DAX         CAC
Mean           4.5983      7.7167      9.6164      5.5306      8.5698      8.2629
Median         2.2155      4.6208      4.5827      2.7161      4.0596      4.7122
Maximum        208.2227    265.9300    379.4385    149.1572    175.0968    179.8414
Minimum        0.0636      0.1882      0.1554      0.1154      0.1263      0.2904
Std. Dev.      9.9961      13.5154     21.3838     10.1167     15.2845     12.7872
Skewness       10.9980     9.6361      10.2868     6.7179      5.3602      6.0357
Kurtosis       180.0844    140.7606    152.4263    67.0128     42.0810     58.3263
LB Test        1924.0870** 933.3972**  1198.6872** 1970.7366** 2770.5973** 1982.4141**
ADF Test       −16.0378**  −17.9044**  −17.4896**  −14.0329**  −14.1928**  −13.6136**

Note: Descriptive statistics for six globally leading indexes. The sample size is 1313 for both returns and volatilities. Nominal returns are measured as the weekly Friday-to-Friday log price difference multiplied by 100, and the Monday-to-Friday volatilities are calculated following [49]. For the LB test and the ADF test statistics, asterisks indicate significance of the corresponding p-value at the 1% (**) level.
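The weekly series can be constructed along these lines. The sketch below uses simulated daily prices, and a plain sum of squared daily log returns within the week as a volatility proxy; the paper's volatility measure follows [49] and may differ in detail from this construction.

```python
import numpy as np

# Hypothetical daily closing prices (the paper uses actual index closes);
# 30 days = 6 Fridays = 5 complete trading weeks.
rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(30)))

# Weekly Friday-to-Friday log return, in percent (Fridays at indices 4, 9, ..., 29).
friday_close = prices[4::5]
weekly_return = 100 * np.diff(np.log(friday_close))

# Daily log returns in percent; daily_ret[i] is the return from day i to day i+1.
daily_ret = 100 * np.diff(np.log(prices))

# Volatility proxy for each week: sum of squared daily returns between
# consecutive Fridays (an assumption, standing in for the measure of [49]).
realized_vol = np.array([np.sum(daily_ret[f:f + 5] ** 2) for f in range(4, 29, 5)])
```

A built-in consistency check: since log returns are additive, each weekly return equals the sum of its five daily returns.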
Table 12. Detection of Conditional Dependence in Global Stock Returns.
        To:      DJIA              Nikkei            Hangseng          FTSE              DAX               CAC
From          C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8

TS.a
DJIA          -        -        0.367    0.002**  0.972    0.273    0.609    0.009**  0.533    0.012*   0.277    0.001**
Nikkei        0.231    0.081    -        -        0.997    0.951    0.883    0.059    0.868    0.242    0.004**  0.001**
Hangseng      0.898    0.197    0.963    0.407    -        -        0.701    0.035*   0.969    0.174    0.640    0.005**
FTSE          0.483    0.004**  0.004**  0.001**  0.419    0.001**  -        -        0.839    0.185    0.917    0.615
DAX           0.977    0.149    0.027*   0.001**  0.009**  0.001**  0.004**  0.001**  -        -        0.695    0.025*
CAC           0.995    0.713    0.001**  0.001**  0.294    0.001**  0.918    0.741    0.639    0.203    -        -

TS.b
DJIA          -        -        0.402    0.004**  0.967    0.341    0.595    0.020*   0.569    0.049*   0.288    0.012*
Nikkei        0.219    0.110    -        -        0.999    0.956    0.853    0.103    0.855    0.255    0.009**  0.006**
Hangseng      0.899    0.269    0.975    0.485    -        -        0.696    0.088    0.971    0.231    0.664    0.023*
FTSE          0.477    0.016*   0.006**  0.003**  0.404    0.018*   -        -        0.847    0.245    0.896    0.642
DAX           0.976    0.221    0.027*   0.002**  0.009**  0.005**  0.011*   0.010**  -        -        0.692    0.065
CAC           0.993    0.729    0.002**  0.002**  0.346    0.016*   0.907    0.763    0.650    0.244    -        -

SMB.a
DJIA          -        -        0.564    0.001**  0.999    0.349    0.796    0.003**  0.817    0.014*   0.425    0.001**
Nikkei        0.321    0.147    -        -        0.988    0.946    0.957    0.085    0.944    0.321    0.006**  0.002**
Hangseng      0.946    0.273    0.967    0.483    -        -        0.860    0.016*   0.994    0.188    0.793    0.004**
FTSE          0.579    0.005**  0.021*   0.001**  0.701    0.003**  -        -        0.947    0.297    0.943    0.739
DAX           0.988    0.240    0.044*   0.001**  0.012*   0.001**  0.006**  0.001**  -        -        0.788    0.020*
CAC           0.993    0.762    0.006**  0.001**  0.594    0.001**  0.946    0.861    0.842    0.270    -        -

SMB.b
DJIA          -        -        0.583    0.002**  0.994    0.334    0.797    0.001**  0.855    0.014*   0.413    0.001**
Nikkei        0.351    0.155    -        -        0.992    0.940    0.952    0.088    0.965    0.322    0.002**  0.003**
Hangseng      0.945    0.276    0.970    0.506    -        -        0.866    0.015*   0.997    0.215    0.829    0.004**
FTSE          0.637    0.006**  0.008**  0.001**  0.714    0.003**  -        -        0.954    0.345    0.953    0.789
DAX           0.980    0.256    0.034*   0.001**  0.011*   0.001**  0.009**  0.001**  -        -        0.831    0.042*
CAC           0.993    0.825    0.003**  0.001**  0.591    0.002**  0.980    0.898    0.862    0.304    -        -

STB
DJIA          -        -        0.645    0.001**  0.996    0.334    0.787    0.002**  0.823    0.015*   0.430    0.001**
Nikkei        0.363    0.114    -        -        0.989    0.944    0.946    0.079    0.955    0.296    0.003**  0.002**
Hangseng      0.964    0.272    0.984    0.491    -        -        0.859    0.013*   0.987    0.197    0.786    0.004**
FTSE          0.652    0.005**  0.016*   0.001**  0.688    0.003**  -        -        0.940    0.262    0.965    0.761
DAX           0.982    0.234    0.048*   0.001**  0.017*   0.001**  0.006**  0.001**  -        -        0.828    0.029*
CAC           0.996    0.814    0.007**  0.001**  0.578    0.001**  0.967    0.835    0.799    0.265    -        -

Note: p-values of the pairwise TE-based test on returns of the global stock indexes for one-week-ahead conditional non-independence. Results are shown for the five resampling methods of Section 2.3. The constant C takes the values 4.8 and 8 as a robustness check. The asterisks indicate the significance of the corresponding p-value at the 5% (*) and 1% (**) levels.
Table 13. Detection of Conditional Dependence in Global Stock Volatilities.
        To:      DJIA              Nikkei            Hangseng          FTSE              DAX               CAC
From          C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8    C = 4.8  C = 8

TS.a
DJIA          -        -        0.997    0.998    0.998    0.994    0.974    0.853    0.828    0.001**  0.005**  0.001**
Nikkei        0.998    0.995    -        -        0.996    1.000    0.997    0.994    0.971    0.950    1.000    0.999
Hangseng      0.943    0.003**  0.989    0.992    -        -        0.822    0.001**  0.001**  0.001**  0.973    0.953
FTSE          0.010**  0.001**  0.955    0.944    0.997    1.000    -        -        0.806    0.001**  0.985    0.946
DAX           0.975    0.931    0.934    0.898    0.999    0.995    0.001**  0.001**  -        -        0.072    0.001**
CAC           0.996    0.944    0.988    0.987    0.999    0.997    0.003**  0.001**  0.054    0.001**  -        -

TS.b
DJIA          -        -        0.993    0.994    0.996    0.992    0.958    0.857    0.806    0.010**  0.018*   0.005**
Nikkei        0.986    0.994    -        -        0.996    0.997    0.988    0.988    0.942    0.926    0.998    0.996
Hangseng      0.919    0.011*   0.976    0.966    -        -        0.819    0.002**  0.001**  0.001**  0.965    0.941
FTSE          0.019*   0.003**  0.950    0.927    0.994    0.997    -        -        0.785    0.002**  0.982    0.932
DAX           0.976    0.904    0.943    0.890    0.997    0.999    0.001**  0.001**  -        -        0.113    0.009**
CAC           0.998    0.951    0.979    0.983    0.999    0.998    0.009**  0.002**  0.077    0.003**  -        -

SMB.a
DJIA          -        -        0.823    0.786    0.984    0.968    0.468    0.189    0.210    0.004**  0.027*   0.002**
Nikkei        0.957    0.941    -        -        1.000    1.000    0.843    0.800    0.743    0.614    0.998    0.993
Hangseng      0.458    0.030*   0.888    0.808    -        -        0.165    0.007**  0.001**  0.001**  0.839    0.736
FTSE          0.034*   0.001**  0.557    0.417    1.000    1.000    -        -        0.358    0.001**  0.967    0.793
DAX           0.702    0.255    0.448    0.327    1.000    1.000    0.001**  0.001**  -        -        0.030*   0.001**
CAC           0.887    0.241    0.673    0.643    1.000    1.000    0.005**  0.001**  0.012*   0.001**  -        -

SMB.b
DJIA          -        -        0.851    0.806    0.973    0.969    0.563    0.269    0.226    0.004**  0.032*   0.001**
Nikkei        0.950    0.940    -        -        1.000    1.000    0.888    0.849    0.792    0.675    0.999    0.993
Hangseng      0.451    0.022*   0.919    0.869    -        -        0.209    0.005**  0.002**  0.001**  0.868    0.793
FTSE          0.023*   0.001**  0.561    0.457    1.000    1.000    -        -        0.434    0.001**  0.974    0.840
DAX           0.819    0.442    0.491    0.356    1.000    1.000    0.001**  0.001**  -        -        0.030*   0.001**
CAC           0.958    0.554    0.781    0.696    1.000    1.000    0.009**  0.001**  0.014*   0.001**  -        -

STB
DJIA          -        -        0.847    0.813    0.972    0.964    0.602    0.278    0.221    0.009**  0.045*   0.004**
Nikkei        0.967    0.956    -        -        1.000    1.000    0.835    0.789    0.777    0.624    0.998    0.997
Hangseng      0.527    0.033*   0.933    0.887    -        -        0.222    0.010**  0.003**  0.001**  0.860    0.762
FTSE          0.024*   0.002**  0.635    0.497    1.000    1.000    -        -        0.405    0.001**  0.977    0.826
DAX           0.795    0.392    0.561    0.410    1.000    1.000    0.004**  0.001**  -        -        0.027*   0.001**
CAC           0.903    0.561    0.809    0.748    1.000    1.000    0.020*   0.001**  0.010**  0.001**  -        -

Note: p-values of the pairwise TE-based test on volatilities of the global stock indexes for one-week-ahead conditional non-independence. Results are shown for the five resampling methods of Section 2.3. The constant C takes the values 4.8 and 8 as a robustness check. The asterisks indicate the significance of the corresponding p-values at the 5% (*) and 1% (**) levels.
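Pairwise tables of this form can be generated by looping a one-directional non-causality test over all ordered index pairs, leaving the diagonal empty. A minimal sketch: the `pairwise_pvalues` helper is ours, and `lag1_perm_pvalue` is a simple permutation test standing in for the actual TE-based test with its surrogate or bootstrap resampling.

```python
import numpy as np
from itertools import permutations

def pairwise_pvalues(series, test_pvalue):
    """From x To matrix of p-values over all ordered pairs; diagonal left as NaN."""
    names = list(series)
    p = {a: {b: np.nan for b in names} for a in names}
    for a, b in permutations(names, 2):  # all ordered pairs with a != b
        p[a][b] = test_pvalue(series[a], series[b])
    return p

def lag1_perm_pvalue(x, y, n_perm=199, seed=0):
    """Permutation p-value for |corr(x_{t-1}, y_t)|: does x help predict y one step ahead?"""
    rng = np.random.default_rng(seed)
    stat = abs(np.corrcoef(x[:-1], y[1:])[0, 1])
    null = [abs(np.corrcoef(rng.permutation(x[:-1]), y[1:])[0, 1]) for _ in range(n_perm)]
    return (1 + sum(s >= stat for s in null)) / (n_perm + 1)

# Two simulated "index" series in which DJIA leads CAC by one period.
rng = np.random.default_rng(1)
n = 500
dj = rng.standard_normal(n)
cac = np.empty(n)
cac[0] = 0.0
for t in range(1, n):
    cac[t] = 0.3 * dj[t - 1] + rng.standard_normal()

pvals = pairwise_pvalues({"DJIA": dj, "CAC": cac}, lag1_perm_pvalue)
```

Small p-values in the `pvals["DJIA"]["CAC"]` cell flag the built-in DJIA-to-CAC linkage, mirroring how the significant cells in the tables above are read.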

Diks, C.; Fang, H. Transfer Entropy for Nonparametric Granger Causality Detection: An Evaluation of Different Resampling Methods. Entropy 2017, 19, 372. https://doi.org/10.3390/e19070372