1. Introduction
Describing ‘connectivity’ has become of paramount interest in many areas of investigation that involve interacting systems. Physiology, climatology, and economics are three good examples where modelling dynamical evolution is often hindered because system manipulation may be difficult or unethical. Consequently, interaction inference is frequently constrained to using time observations alone.
A number of investigation approaches have been put forward [1,2,3,4,5]. However, the most popular and traditional one is still the nonparametric computation of cross-correlation (CC) between pairs of time series, and variants thereof, like coherence analysis [6], despite their many shortcomings [7].
When it comes to connectivity analysis, recent times have seen the rise of Granger Causality (GC) as a unifying concept. This is mostly due to GC’s non-reciprocal character [8] (as opposed to CC), which allows establishing the direction of information flow between component subsystems.
Most GC approaches rest on fitting parametric models to time series data and, again as opposed to CC, GC under appropriate conceptualization also holds for more than just pairs of time series, giving rise to the ideas of (a) Granger connectivity and (b) Granger influentiability [9].
GC inferential methodology is dominated by the use of linear multivariate time series models [10]. This is because linear models have statistical properties (and shortcomings) that are well understood, besides having the advantage of sufficing when the data are Gaussian. As an added advantage, linear GC characterization allows immediate frequency domain connectivity characterization via concepts like ‘directed coherence’ (DC) and ‘partial directed coherence’ (PDC) [11].
It is often the case, however, that data Gaussianity does not hold. Whereas nonparametric approaches do exist [1,4,5], parametric nonlinear modelling offers little relief from the need for long observation data sets for reliable estimation, in sharp contrast to linear models, which perform well under the typical scenario of fairly short datasets over which natural phenomena can be considered stable. A case in point is neural data, where animal behaviour changes are associated with relatively short-lived episodic signal modifications.
The motivation for the present development is that reproducing kernel transformations applied to data, as in the support vector machine learning classification case [12], can effectively produce estimates that inherit many of the good convergence properties of linear methods. Because these properties carry over under proper kernelization, it is possible to show that nonlinear links between subsystems can be rigorously detected.
Before proceeding, it is important to keep in mind that the present development focuses exclusively on the connectivity detection issue, for which we clearly show that solving the so-called pre-image reconstruction problem, until now assumed essential, is in fact unnecessary. This leads to a considerably simpler approach.
In Section 2, we formulate the problem and review some background about reproducing kernel theory together with the main results, which are backed up by extensive numerical Monte Carlo illustrations in Section 3. Conclusions and current problem status and challenges end the paper (Section 4).
2. Problem Formulation
The most popular approach to investigating GC connectivity is through modelling multivariate time series via linear vector autoregressive models [10], where the central idea is to compare prediction effectiveness for a time series when the past of other time series is taken into account in addition to its own past. Namely,

$$\mathbf{x}(n) = \sum_{k=1}^{p} \mathbf{A}_k\,\mathbf{x}(n-k) + \mathbf{w}(n), \qquad (1)$$

where $\mathbf{x}(n) = [x_1(n)\;\cdots\;x_N(n)]^{\top}$ collects the $N$ jointly observed time series, the $\mathbf{A}_k$ are coefficient matrices, and $\mathbf{w}(n)$ is a zero mean innovations process.
Under mild conditions, Equation (1) constitutes a valid representation of a linear stationary stochastic process where the evolution of $\mathbf{x}(n)$ is obtained by filtering suitable purely stochastic innovation processes, i.e., where $\mathbf{w}(n)$ and $\mathbf{w}(m)$ are independent provided $n \neq m$ [13]. If the $\mathbf{w}(n)$ are jointly Gaussian, so are the $\mathbf{x}(n)$, and the problem of characterizing connectivity reduces to well known procedures to estimate the $\mathbf{A}_k$ parameters in Equation (1) via least squares, which is the applicable maximum likelihood procedure. Nongaussian $\mathbf{w}(n)$ translate into nongaussian $\mathbf{x}(n)$ even if an actual linear generation mechanism of the form (1) holds. Linearity among nongaussian time series may be tested with the help of cross-polyspectra [14,15], which, if unrejected, still allows for a representation like (1) whose optimal estimation requires a suitable likelihood function to accommodate the observed non-Gaussianity.
If linearity is rejected, non-Gaussianity is a sign of nonlinear mechanisms of generation modelled by

$$\mathbf{x}(n) = \mathbf{g}\big(\boldsymbol{\mathcal{X}}_{n-1}\big) + \mathbf{w}(n), \qquad (2)$$

which generalizes (1), where $\boldsymbol{\mathcal{X}}_{n-1}$ stands for $\mathbf{x}(n)$’s past under some suitable dynamical law $\mathbf{g}(\cdot)$.
The distinction between (a) nonlinear $\mathbf{x}(n)$ that are nonetheless linearly coupled as in (1) under nongaussian $\mathbf{w}(n)$ and (b) fully nonlinearly coupled processes is often overlooked. In the former case, linear methods suffice for connectivity detection [16], but they fail in the latter case [17], calling for the adoption of alternative approaches; indeed, linear approximations can be inadequate insofar as to preclude connectivity detection altogether [17].
In the present context, the solution to the connectivity problem entails a suitable data driven approximation of $\mathbf{g}$ whilst singling out the $x_i$ and $x_j$ of interest. To do so, we examine the employment of kernel methods [18], where functional characterization is carried out with the help of a high dimensional space representation

$$x \mapsto \Phi(x) \qquad (3)$$

for $x$ in the input space $\mathcal{X}$, where $\Phi$ is a mapping from $\mathcal{X}$ into the feature space $\mathcal{H}$ whose role is to properly unwrap the data and yet ensure that the inner product $\langle \Phi(x), \Phi(y)\rangle$ can be written as a simple function $k(x,y)$ of $x$ and $y$, dispensing with the need for computations in $\mathcal{H}$. This possibility is granted by choosing $k(\cdot,\cdot)$ to satisfy the so-called Mercer condition [19].
A simple example of (3) is the mapping $\Phi(x) = x^2$, written $|x^2\rangle$ for $x \in \mathbb{R}$ using Dirac’s bra-ket notation. In this case, the Mercer kernel is given by

$$k(x,y) = \langle \Phi(x)\,|\,\Phi(y)\rangle = (xy)^2,$$

which is the simplest example of a polynomial kernel [18].
In the multivariate time series case, we consider $\Phi(\mathbf{x}(n))$ where, for simplicity, we adopt the same transformation $\Phi$ for each time series component so that $\mathbf{K}(n,m)$ is a matrix whose elements are given by $K_{ij}(n,m) = \mathrm{E}\big[k\big(x_i(n), x_j(m)\big)\big]$. In the development below, we follow the standard practice of denoting the $\mathbf{K}(n, n-l)$ quantities as $\mathbf{K}(l)$ in view of the assumed stationarity of the processes under study.
Rather than go straight into the statement of the general theory, a simple example is more enlightening. In this sense, consider a bivariate stationary time series $\mathbf{x}(n) = [x_1(n)\;\;x_2(n)]^{\top}$ generated by

$$x_1(n) = f\big(x_1(n-1)\big) + w_1(n), \qquad (8)$$
$$x_2(n) = g\big(x_1(n-1), x_2(n-1)\big) + w_2(n), \qquad (9)$$

where $f$ and $g$ are nonlinear functions and only the previous instant is relevant in producing the present behaviour. An additional feature, through (9), is that $x_1$ is connected to (Granger causes) $x_2$ but not conversely. Application of the kernel transformation maps (8) and (9) into corresponding relations, (10) and (11), between the transformed series $\Phi(x_1(\cdot))$ and $\Phi(x_2(\cdot))$ in $\mathcal{H}$. However, if one assumes the possibility of a linear approximation in $\mathcal{H}$, one may write

$$|\Phi(\mathbf{x}(n))\rangle = \mathbf{A}\,|\Phi(\mathbf{x}(n-1))\rangle + |\boldsymbol{\varepsilon}(n)\rangle, \qquad (12)$$

where $|\boldsymbol{\varepsilon}(n)\rangle$ stands for approximation errors in the form of innovations. Mercer kernel theory allows for taking the external product with respect to $\langle \Phi(\mathbf{x}(n-1))|$ on both sides of (12), leading to

$$\mathbf{K}(1) = \mathbf{A}\,\mathbf{K}(0) \qquad (13)$$

after taking expectations on both sides, where $K_{ij}(1) = \mathrm{E}\big[k\big(x_i(n), x_j(n-1)\big)\big]$ and $K_{ij}(0) = \mathrm{E}\big[k\big(x_i(n), x_j(n)\big)\big]$, since $\mathrm{E}\big[\,|\boldsymbol{\varepsilon}(n)\rangle\langle\Phi(\mathbf{x}(n-1))|\,\big] = \mathbf{0}$ given that $|\boldsymbol{\varepsilon}(n)\rangle$ plays a zero mean innovations role.
$\mathbf{K}(l)$ is easy to obtain from sample kernel estimates. Furthermore, it is clear that the lack of reverse coupling in (8) holds if and only if the corresponding entry of $\mathbf{A}$ is null.
To (13), which plays the role of the Yule–Walker equations and which can be written more simply as $\mathbf{K}(1) = \mathbf{A}\,\mathbf{K}(0)$, one may add the following equation to compute the innovations covariance,

$$\boldsymbol{\Sigma}_{\varepsilon} = \mathrm{E}\big[\,|\boldsymbol{\varepsilon}(n)\rangle\langle\boldsymbol{\varepsilon}(n)|\,\big], \qquad (17)$$

where only reference to the time lag difference is explicitly denoted assuming signal stationarity, so that (17) simplifies to

$$\boldsymbol{\Sigma}_{\varepsilon} = \mathbf{K}(0) - \mathbf{A}\,\mathbf{K}(1)^{\top}. \qquad (18)$$
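For this bivariate, first order case, the kernel Yule–Walker relation (13) can be put to work directly. The following minimal sketch (hypothetical nonlinear maps standing in for $f$ and $g$, quadratic kernel features, and sample averages in place of expectations) recovers the feature space coefficient matrix $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(1)
ns = 2000

# Hypothetical bivariate generation mechanism in the spirit of (8)-(9):
# x1 evolves autonomously; x2 is (nonlinearly) driven by x1's past.
x = np.zeros((2, ns))
for n in range(1, ns):
    w = 0.1 * rng.standard_normal(2)
    x[0, n] = 0.6 * np.cos(x[0, n - 1]) + w[0]
    x[1, n] = 0.5 * x[0, n - 1] ** 2 + 0.3 * x[1, n - 1] + w[1]

phi = x**2  # quadratic kernel feature map, Phi(x) = x^2

# Sample lagged kernel matrices K(0) and K(1) (centred features, a
# practical detail so that zero mean innovations are a fair assumption).
phi_c = phi - phi.mean(axis=1, keepdims=True)
K0 = phi_c[:, 1:] @ phi_c[:, 1:].T / (ns - 1)
K1 = phi_c[:, 1:] @ phi_c[:, :-1].T / (ns - 1)

# Solve K(1) = A K(0) for the feature space coefficient matrix A.
A = K1 @ np.linalg.inv(K0)
print(A)  # A[1, 0] should be clearly nonzero; A[0, 1] comparatively small
```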
This formulation is easy to generalize to model orders $p > 1$ and to more time series via

$$|\Phi(\mathbf{x}(n))\rangle = \sum_{k=1}^{p} \mathbf{A}_k\,|\Phi(\mathbf{x}(n-k))\rangle + |\boldsymbol{\varepsilon}(n)\rangle, \qquad (19)$$

which is assumed as due to filtering appropriately modelled innovations $|\boldsymbol{\varepsilon}(n)\rangle$. For the present formulation, one must also consider the associated ‘ket’-vector $|\Phi(\mathbf{x}(n-l))\rangle$ that, when applied to (19) for $l = 1, \dots, p$ after taking expectations under the zero mean innovations nature of $|\boldsymbol{\varepsilon}(n)\rangle$, leads to

$$\mathbf{K}(l) = \sum_{k=1}^{p} \mathbf{A}_k\,\mathbf{K}(l-k), \qquad l = 1, \dots, p, \qquad (22)$$

where $\mathbf{K}(l-k)$’s elements are given by $K_{ij}(l-k) = \mathrm{E}\big[k\big(x_i(n-k), x_j(n-l)\big)\big]$, so that (22) constitutes a generalization of the Yule–Walker equations. By making $\boldsymbol{\mathcal{A}} = [\mathbf{A}_1\;\cdots\;\mathbf{A}_p]$, one may reframe (22) in matrix form as

$$[\mathbf{K}(1)\;\cdots\;\mathbf{K}(p)] = \boldsymbol{\mathcal{A}}\,\boldsymbol{\mathcal{K}}_p, \qquad (23)$$

where $\boldsymbol{\mathcal{K}}_p$ is a block Toeplitz matrix containing the $p \times p$ blocks $\mathbf{K}(l-k)$. Equation (23) provides $p\,N^2$ equations for the same number of unknown parameters in $\boldsymbol{\mathcal{A}}$.
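A minimal sketch of assembling and solving (23), assuming a user-supplied routine that returns the sample estimates $\hat{\mathbf{K}}(l)$ (names illustrative):

```python
import numpy as np

def solve_kernel_yule_walker(K, p):
    """Solve [K(1) ... K(p)] = A_cal @ K_p for A_cal = [A_1 ... A_p].

    K: callable, K(l) returns the N x N lagged kernel matrix; it must
       also handle negative lags, where K(-l) equals K(l).T by
       stationarity and kernel symmetry.
    """
    N = K(0).shape[0]
    # Block Toeplitz matrix with (k, l) block K(l - k), k, l = 1..p.
    Kp = np.block([[K(l - k) for l in range(1, p + 1)]
                   for k in range(1, p + 1)])
    rhs = np.hstack([K(l) for l in range(1, p + 1)])  # [K(1) ... K(p)]
    # A_cal @ Kp = rhs  <=>  Kp.T @ A_cal.T = rhs.T
    A_cal = np.linalg.solve(Kp.T, rhs.T).T
    return [A_cal[:, k * N:(k + 1) * N] for k in range(p)]
```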
The high model order counterpart to (17) is given by

$$\boldsymbol{\Sigma}_{\varepsilon} = \mathbf{K}(0) - \sum_{k=1}^{p} \mathbf{A}_k\,\mathbf{K}(k)^{\top}. \qquad (24)$$

It is not difficult to see that the more usual Yule–Walker complete equation form becomes

$$\mathbf{K}(l) = \sum_{k=1}^{p} \mathbf{A}_k\,\mathbf{K}(l-k) + \delta_{l0}\,\boldsymbol{\Sigma}_{\varepsilon}, \qquad l = 0, 1, \dots, p. \qquad (25)$$

There are a variety of ways of solving for the parameters. A simple one is to define $\bar{\boldsymbol{\mathcal{A}}} = [\,\mathbf{I}_N\;\;{-\mathbf{A}_1}\;\cdots\;{-\mathbf{A}_p}\,]$, leading to

$$\bar{\boldsymbol{\mathcal{A}}}\,\bar{\boldsymbol{\mathcal{K}}}_p = [\,\boldsymbol{\Sigma}_{\varepsilon}\;\;\mathbf{0}\;\cdots\;\mathbf{0}\,], \qquad (26)$$

where $\bar{\boldsymbol{\mathcal{K}}}_p$ gathers the $\mathbf{K}(l-k)$ blocks for $k, l = 0, \dots, p$.
Even though one may employ least-squares methods to solve either (26) or (23), a Total-Least-Squares (TLS) approach [20] has proven a better solution, since both members of the equations are affected by estimation inaccuracies that are better dealt with by TLS.
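For reference, the classical SVD construction of the TLS solution [20] can be sketched as follows (a generic routine, not the paper’s specific implementation):

```python
import numpy as np

def tls(X, Y):
    """Total-least-squares solution B of X @ B ~ Y (errors in X and Y).

    Classical SVD construction: stack [X Y], take the right singular
    vectors associated with the smallest singular values, and solve.
    """
    n = X.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([X, Y]), full_matrices=False)
    V = Vt.T
    V12 = V[:n, n:]   # top-right block of V
    V22 = V[n:, n:]   # bottom-right block of V
    return -V12 @ np.linalg.inv(V22)

# Kernel Yule-Walker use (names illustrative): Kp.T @ A_cal.T ~ rhs.T,
# so A_cal = tls(Kp.T, rhs.T).T
```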
Likewise, (24) can be used in conjunction with generalizations of model order criteria of Akaike’s AIC type,

$$\mathrm{C}(p) = \ln \det \hat{\boldsymbol{\Sigma}}_{\varepsilon}(p) + \frac{c_{n_s}\,p\,N^2}{n_s}, \qquad (27)$$

where $n_s$ stands for the number of available time observations. In generalizing Akaike’s criterion to the multivariate case, $c_{n_s} = 2$, whereas $c_{n_s} = 2\ln\ln n_s$ for the Hannan–Quinn criterion, our choice in this paper.
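A sketch of (27) under the standard multivariate penalty forms assumed above:

```python
import numpy as np

def order_criterion(sigma_eps, p, N, ns, kind="hq"):
    """Generalized order criterion (27): log-det residual covariance
    plus a complexity penalty ("hq" = Hannan-Quinn, "aic" = Akaike)."""
    c = 2.0 * np.log(np.log(ns)) if kind == "hq" else 2.0
    _, logdet = np.linalg.slogdet(sigma_eps)
    return logdet + c * p * N**2 / ns
```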
Thus far, we have described procedures for choosing model order in the $\mathcal{H}$ space. In ordinary time series analysis, in addition to model order identification, one must also perform proper model diagnostics. This entails checking for residual whiteness, among other things, usually by checking the residual auto/crosscorrelation functions for their conformity to a white noise hypothesis.
In the present formulation, because we do not explicitly compute the $\mathcal{H}$ space series, we must resort to means other than computing the latter correlation functions from the residual data as usual. However, using the same ideas as for computing (24), one may obtain estimates of the innovation cross-correlation in the feature space at various lags as

$$\mathbf{K}_{\varepsilon}(l) = \mathbf{K}(l) - \sum_{m=1}^{p} \mathbf{K}(l+m)\,\mathbf{A}_m^{\top} - \sum_{k=1}^{p} \mathbf{A}_k\,\mathbf{K}(l-k) + \sum_{k=1}^{p}\sum_{m=1}^{p} \mathbf{A}_k\,\mathbf{K}(l-k+m)\,\mathbf{A}_m^{\top} \qquad (28)$$

by replacing the $\mathbf{K}(\cdot)$ by their estimates and using the $\hat{\mathbf{A}}_k$ obtained by solving (22), for $l$ ranging from a minimum to a maximum lag. The usefulness of (28) is to provide means to test model accuracy and quality as a function of the kernel choice under the best model order provided by the model order criterion.
One may then define a suitable normalized estimated lagged kernel correlation function (KCF),

$$\hat{\kappa}_{ij}(l) = \frac{\hat{K}_{ij}(l)}{\sqrt{\hat{K}_{ii}(0)\,\hat{K}_{jj}(0)}}, \qquad (29)$$

which, given the inner product nature of the kernel definition, satisfies the condition

$$|\hat{\kappa}_{ij}(l)| \le 1 \qquad (30)$$

as is easily proved using the Cauchy–Schwarz inequality.
The notion of KCF applies not only to the original kernels but also in connection with the residual kernel values given by (28), where, for explicitness, we write it as

$$\hat{\kappa}^{\varepsilon}_{ij}(l) = \frac{\big[\hat{\mathbf{K}}_{\varepsilon}(l)\big]_{ij}}{\sqrt{\big[\hat{\mathbf{K}}_{\varepsilon}(0)\big]_{ii}\,\big[\hat{\mathbf{K}}_{\varepsilon}(0)\big]_{jj}}}, \qquad (31)$$

where $\big[\hat{\mathbf{K}}_{\varepsilon}(l)\big]_{ij}$ are the matrix entries in (28).
In the numerical illustrations that follow, we have assumed that, asymptotically under the white residual hypothesis,

$$\hat{\kappa}^{\varepsilon}_{ij}(l) \sim \mathcal{N}(0, 1/n_s). \qquad (32)$$

This choice turned out to be reasonably consistent in practice. Along the same line of reasoning, other familiar tests over residuals, such as the Portmanteau test [10], were also carried out and consistently allowed detecting residual nonwhiteness whenever present.
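A minimal sketch of the resulting whiteness check, testing each residual KCF value (31) against the $\pm z_{\alpha/2}/\sqrt{n_s}$ band implied by (32) (names illustrative):

```python
import numpy as np
from scipy.stats import norm

def kcf_whiteness_check(K_eps, ns, max_lag, alpha=0.05):
    """Check residual KCFs (31) against the asymptotic N(0, 1/ns) bound (32).

    K_eps: callable, K_eps(l) returns the residual kernel matrix (28) at lag l.
    Returns True when all normalized residual kernel correlations stay
    within the band, i.e., when the residuals look white.
    """
    z = norm.ppf(1 - alpha / 2)      # e.g., 1.96 for alpha = 0.05
    d = np.sqrt(np.diag(K_eps(0)))   # normalization terms from (31)
    for l in range(1, max_lag + 1):
        kappa = K_eps(l) / np.outer(d, d)
        if np.any(np.abs(kappa) > z / np.sqrt(ns)):
            return False
    return True
```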
One may say that the present theory follows closely the developments of ordinary second order moment theory with the added advantage that now nonlinear connections can be effectively captured by replacing second order moments by their respective lagged kernel estimates.
2.1. Estimation and Asymptotic Considerations
The essential problem then becomes that of estimating the $\mathbf{K}(l)$ entries. They can be obtained by averaging kernel values computed over the available data,

$$\hat{K}_{ij}(l) = \frac{1}{n_s - l} \sum_{n = l + 1}^{n_s} k\big(x_i(n), x_j(n-l)\big), \qquad (33)$$

for lags $l$ in the required range.
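A direct transcription of (33) as a sketch (the quadratic polynomial kernel appears only as an illustrative default):

```python
import numpy as np

def lagged_kernel_matrix(x, l, kernel=lambda a, b: (a * b) ** 2):
    """Sample estimate (33) of the lagged kernel matrix K(l).

    x: (N, ns) array of N jointly observed time series.
    kernel: scalar Mercer kernel; the quadratic polynomial kernel is
    used here purely as an illustrative default.
    """
    N, ns = x.shape
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            # Average k(x_i(n), x_j(n - l)) over the usable sample range.
            K[i, j] = np.mean(kernel(x[i, l:], x[j, :ns - l]))
    return K
```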
Under these conditions, for an appropriately defined kernel function, the feature space becomes linearized and, following [21], it is fair to assume that the estimated vector stacked representation $\hat{\boldsymbol{\alpha}} = \mathrm{vec}\big([\hat{\mathbf{A}}_1\;\cdots\;\hat{\mathbf{A}}_p]\big)$ of the model coefficient matrices is asymptotically Gaussian, i.e.,

$$\sqrt{n_s}\,\big(\hat{\boldsymbol{\alpha}} - \boldsymbol{\alpha}\big) \;\xrightarrow{d}\; \mathcal{N}\big(\mathbf{0},\; \boldsymbol{\Gamma}^{-1} \otimes \boldsymbol{\Sigma}_{\varepsilon}\big), \qquad (35)$$

where $\boldsymbol{\Sigma}_{\varepsilon}$ is the feature space residual covariance matrix given by (28) at zero lag and where

$$\boldsymbol{\Gamma} = \mathrm{E}\big[\,|\bar{\Phi}(n)\rangle\langle\bar{\Phi}(n)|\,\big] \qquad (36)$$

for the ‘bra’-vector $\langle\bar{\Phi}(n)| = \big[\langle\Phi(\mathbf{x}(n-1))|\;\cdots\;\langle\Phi(\mathbf{x}(n-p))|\big]$ and the corresponding ‘ket’-vector $|\bar{\Phi}(n)\rangle$, which are used to construct the kernel scalar products. It is immediate to note that (36) is a Toeplitz matrix composed of suitably displaced $\mathbf{K}(\cdot)$ blocks.
An immediate consequence of (35) is that one may test for model coefficient nullity and thereby provide a kernel Granger Causality test. This is equivalent to testing for the joint nullity of selected entries of $\boldsymbol{\alpha}$, so that the statistic

$$W = n_s\,\hat{\boldsymbol{\alpha}}^{\top}\mathbf{C}^{\top}\big(\mathbf{C}\,(\hat{\boldsymbol{\Gamma}}^{-1}\otimes\hat{\boldsymbol{\Sigma}}_{\varepsilon})\,\mathbf{C}^{\top}\big)^{-1}\mathbf{C}\,\hat{\boldsymbol{\alpha}} \qquad (38)$$

may be computed, where $\mathbf{C}$ is a contrast matrix (or structure selection matrix) so that the null hypothesis becomes $H_0\colon \mathbf{C}\,\boldsymbol{\alpha} = \mathbf{0}$. Hence, under (35),

$$W \;\xrightarrow{d}\; \chi^2_q, \qquad (39)$$

where $q$ corresponds to the number of explicitly imposed constraints on $\boldsymbol{\alpha}$.
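A sketch of the resulting Wald-type test (illustrative names; the contrast matrix $\mathbf{C}$ selects the coefficients whose nullity encodes the absence of a Granger causal link):

```python
import numpy as np
from scipy.stats import chi2

def kernel_gc_wald(alpha_hat, Gamma_hat, Sigma_hat, C, ns):
    """Wald-type kernel Granger causality test, a sketch of (38)-(39).

    alpha_hat: vec-stacked estimated coefficients.
    Gamma_hat, Sigma_hat: estimates entering the asymptotic covariance (35).
    C: contrast (structure selection) matrix picking the tested coefficients.
    Returns the statistic and its asymptotic chi-square p-value.
    """
    cov = np.kron(np.linalg.inv(Gamma_hat), Sigma_hat)  # asymptotic covariance
    Ca = C @ alpha_hat
    W = ns * Ca @ np.linalg.solve(C @ cov @ C.T, Ca)
    q = C.shape[0]                                      # number of constraints
    return W, chi2.sf(W, df=q)
```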
Data Workflow
Given the observed data $\mathbf{x}(n)$, $n = 1, \dots, n_s$, analysis proceeds by:
1. Computing the kernel values (33) to obtain the kernel Yule–Walker equations (25), or equivalently (26), for a given value of $p$ (starting from $p = 1$);
2. After solving the latter for the parameters in $\boldsymbol{\mathcal{A}}$ via Total-Least-Squares (TLS), one computes (18), wherefrom the generalized model order choice criterion (27) can be computed;
3. With the help of the computed (33) values, one can obtain the residual KCF functions in (31), which can be used to check model adequacy via (32). Additionally, Portmanteau tests may also be used;
4. If analysis does not suggest feature space model residual whiteness, $p$ is increased by 1 and the procedure from step 1 is repeated until feature space model residual whiteness is obtained and the order criterion (27) attains its first local minimum, meaning that the ideal model order has been reached;
5. Once the best model is attained, one employs the statistic (39) to infer connectivity.
These steps closely mirror those of ordinary time series model fitting and analysis.
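To make the loop concrete, the following end-to-end sketch strings the previous snippets together (illustrative names throughout; the order search is shown as a plain minimum search, with the whiteness check of step 4 left as indicated):

```python
import numpy as np

# Assumes lagged_kernel_matrix, solve_kernel_yule_walker, order_criterion,
# and kcf_whiteness_check from the earlier sketches; x is an (N, ns) array.
def fit_kernel_var(x, p_max=10):
    N, ns = x.shape
    # Lagged kernel estimates, with K(-l) = K(l).T by kernel symmetry.
    K = lambda l: (lagged_kernel_matrix(x, l) if l >= 0
                   else lagged_kernel_matrix(x, -l).T)
    best = None
    for p in range(1, p_max + 1):
        A = solve_kernel_yule_walker(K, p)
        # Innovations covariance (24): Sigma = K(0) - sum_k A_k K(k)^T.
        sigma = K(0) - sum(A[k] @ K(k + 1).T for k in range(p))
        crit = order_criterion(sigma, p, N, ns, kind="hq")
        if best is None or crit < best[0]:
            best = (crit, p, A, sigma)
    return best  # residual checks via kcf_whiteness_check would go here
```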
4. Conclusions and Future Work
After a brief theoretical presentation (Section 2), we have shown that canonical model fitting procedures that involve (a) model specification with order determination and (b) explicit model diagnostic testing can be successfully carried out in the feature space $\mathcal{H}$ to detect connectivity via reproducing kernels. This stands in sharp contrast to previous treatments of Granger causality detection using kernels, as in [29,30], since the latter depend on solving the reconstruction/pre-image problem to provide prediction error estimates in the original data space $\mathcal{X}$. In fact, part of the challenge in pre-image determination lies in its frequently associated numerical ill-conditioning [31].
The key result behind doing model diagnostics and inference in $\mathcal{H}$ is (28), together with the realization that kernel quantities may be normalized much as correlation coefficients. It should be noted that (28) holds even in the case of (nonkernel) linear modelling upon replacing the $\mathbf{K}$ matrices by auto/crosscorrelation matrices, something that, in practice, is never adopted in classical linear time series modelling because the necessary auto/crosscorrelations are more efficiently computed from model residuals, which are easy to obtain there since no pre-imaging problem is involved.
Thus, what importantly sets the present approach apart from previous work is that there is no need to return to the original input space to gauge model quality, as the reconstruction/pre-image problem can be fully circumvented, bypassing its attendant uncertainties.
As such, we showed that, because model adequacy testing can be performed directly in the feature space $\mathcal{H}$, directional Granger type connectivity can be detected for a variety of multivariate nonlinear coupling scenarios, thereby totally dispensing with the need for detailed ‘a priori’ model knowledge.
We observed that successful connectivity detection is achievable with relatively short time series. A systematic comparison with other approaches [4,5,32,33,34,35] is planned for future work but, at least for the cases we tested so far, savings of at least one order of magnitude in record length are feasible.
One of the basic tenets of the present work is that model coefficients in the feature space are asymptotically normal, something whose consistency was successfully illustrated, though the need for a more formal proof remains, especially in connection with explicit kernel estimates under the total-least-squares solution to (23). Our choice of TLS was dictated by its apparent superiority when compared to the ‘kernel trick’ [32], whose multivariate version we employed in [26,36,37].
In this context, it is important to note that, contrary to other methods that require time-consuming resampling procedures for adequate inference, the present approach relies on asymptotic statistics and is thus less susceptible to the potential problems associated with data shuffling.
One of the advantages of the present development is that the procedure allows for determining how far in the past to look via the model order criteria we employed (27).
Even though order estimation and model testing were successful upon borrowing from the usual linear modelling practices, further systematic examination is still needed and is under way.
One may rightfully argue that the kernels we chose for illustrating the present work are equivalent to modelling the original time series after the application of a suitable transformation to the data and that they look for causality evidence present in higher order moments. This, in fact, explains why quadratic kernels converge much faster than quartic ones in Example 1. The merit of framing the time series transformation discussion for connectivity detection in terms of kernels is that it produces a simple workflow and paves the way to developing future data-driven criteria towards the optimum data transformation choice for a given problem. Other kernel choices are being investigated.
The signal model used in the present development does not contemplate additive measurement noise whose impact on connectivity detection we also leave for future examination.
One thing the present type of analysis cannot do is expose details of how the nonlinearity takes place. For example, coupling may be quadratic or involve higher exponential powers or some other function. What the present approach can do, however, is expose existing connections, so that modelling efforts can be concentrated on them, thereby avoiding wasting modelling parameters on irrelevant links.
Finally, the present systematic empirical investigation sets the proposal of using feature space-frequency domain descriptions of connectivity, like the kernel partial directed coherence [26,36] and the kernel directed transfer function [37], on sound footing, especially with respect to their asymptotic connectivity behaviour.