Article

Linearized Transfer Entropy for Continuous Second Order Systems

by Jonathan M. Nichols 1,*, Frank Bucholtz 1 and Joe V. Michalowicz 2,†
1 Naval Research Laboratory, Optical Sciences Division, Washington DC 20375, USA
2 Naval Research Laboratory, Optical Sciences Division, Washington DC 20375, USA
* Author to whom correspondence should be addressed.
† Permanent Address: Global Strategies Group, Crofton, MD 21114, USA
Entropy 2013, 15(8), 3186-3204; https://doi.org/10.3390/e15083276
Submission received: 22 May 2013 / Revised: 5 July 2013 / Accepted: 18 July 2013 / Published: 7 August 2013
(This article belongs to the Special Issue Transfer Entropy)

Abstract

The transfer entropy has proven a useful measure of coupling among components of a dynamical system. This measure effectively captures the influence of one system component on the transition probabilities (dynamics) of another. The original motivation for the measure was to quantify such relationships among signals collected from a nonlinear system. However, we have found the transfer entropy to also be a useful concept in describing linear coupling among system components. In this work we derive the analytical transfer entropy for the response of coupled, second order linear systems driven with a Gaussian random process. The resulting expression is a function of the auto- and cross-correlation functions associated with the system response for different degrees-of-freedom. We show clearly that the interpretation of the transfer entropy as a measure of “information flow” is not always valid. In fact, in certain instances the “flow” can appear to switch directions simply by altering the degree of linear coupling. A safer way to view the transfer entropy is as a measure of the ability of a given system component to predict the dynamics of another.

1. Introduction

One of the biggest challenges in the modeling and analysis of dynamical systems is understanding coupling mechanisms among different system components. Whether one is studying coupling on a small scale (e.g., neurons in a biological system) or a large scale (e.g., coupling among widely separated geographical locations due to climate), understanding the functional form, strength, and/or direction of the coupling between two or more system components is a non-trivial task. However, this understanding is necessary if we are to build accurate models of the coupled system and make predictions (our ultimate goal). Accurately assessing the functional form of the coupling is beyond the scope of this work; to do so would require positing various models for a particular coupled system and then testing the predictive power of those models against observed data. Rather, the focus here is on understanding the strength and direction of the coupling between two system components. This task can be accomplished by forming a general hypothesis about what it means for two system components to be coupled, and then testing that hypothesis against observation. It is in this framework that the transfer entropy operates.
The transfer entropy (TE) is a scalar measure designed to capture both the magnitude and direction of coupling between two components of a dynamical system. This measure was posed initially for data described by discrete probability distributions [1] and was later extended to continuous random variables [2]. By construction, this measure quantifies a general definition of coupling that is appropriate for both linear and nonlinear systems. Moreover, the TE is defined in such a way as to provide insight into the direction of the coupling (is component A driving component B or vice-versa?). Since its introduction, the TE has been applied to a diverse set of systems, including biological [1,3], chemical [4], economic [5], structural [6,7], and climate [8] systems. A number of papers in the neurosciences have also focused on the TE as a useful way to draw inferences about coupling [9,10,11]. In each case the TE provided information about the system that traditional linear measures of coupling (e.g., cross-correlation) could not.
The TE has also been linked to other concepts of coupling such as “Wiener-Granger causality”. In fact, for the class of systems studied in this work the TE can be shown to be entirely equivalent to measures of Granger causality [12]. Linkages to other models and concepts of dynamical coupling, such as conditional mutual information [13] and Dynamic Causal Modeling (DCM) [14], are also possible for certain special cases. The connectivity model assumed by DCM is fundamentally nonlinear (specifically bilinear); however, as the degree of nonlinearity decreases, the form of the DCM model approaches that of the model studied here.
Although the TE was designed as a way to gain insight into nonlinear system coupling, we have found the TE to be quite useful in the study of linear systems as well. In this special case, analytical expressions for the TE are possible and can be used to provide useful insight into the behavior of the TE. Furthermore, unlike in the general case, the linearized TE can be easily estimated from observed data. This work is therefore devoted to the understanding of the TE as applied to coupled, driven linear systems. Specifically, we consider coupling among components of a general, second order linear structural system driven by a Gaussian random process. The particular model studied is used to describe numerous phenomena, including structural dynamics, electrical circuits, heat transfer, etc. [15]. As such, it presents an opportunity to better understand the properties of the TE for a broad class of dynamical systems. Section 2 develops the general analytical expression for the TE in terms of the covariance matrices associated with different combinations of system response data. Section 3 specifies the general model under study and derives the TE for the model response data. Section 4 and Section 5 present results and concluding remarks.

2. Mathematical Development

In what follows we assume that we have observed the signals $x_i(t_n),\; i = 1\ldots M$, as the output of a dynamical system and that we have sampled these signals at times $t_n,\; n = 1\ldots N$. The system is assumed to be appropriately modeled as a mixture of deterministic and stochastic components, hence we choose to model each sampled value $x_i(t_n)$ as a random variable $X_{in}$. That is to say, for any particular observation time $t_n$ we can define a function $P_{X_{in}}(x_i(t_n))$ that assigns a probability to the event $X_{in} < x_i(t_n)$. We further assume that these are continuous random variables and that we may also define the probability density function (PDF) $p_{X_{in}}(x_i(t_n)) = dP_{X_{in}}/dx_i(t_n)$.
The vector of random variables $\mathbf{X}_i \equiv (X_{i1}, X_{i2}, \ldots, X_{iN})$ defines a random process and will be used to model the $i$th signal $\mathbf{x}_i \equiv x_i(t_n),\; n = 1\ldots N$. Using this notation, we can also define the joint PDF $p_{\mathbf{X}_i}(\mathbf{x}_i)$, which specifies the probability of observing such a sequence. In this work we further assume that the random processes are strictly stationary, that is to say the joint PDF obeys $p_{\mathbf{X}_i}(x_i(t_1), x_i(t_2), \ldots, x_i(t_N)) = p_{\mathbf{X}_i}(x_i(t_1+\tau), x_i(t_2+\tau), \ldots, x_i(t_N+\tau))$, i.e., the joint PDF is invariant to a fixed temporal shift $\tau$.
The joint probability density functions are models that predict the likelihood of observing a particular sequence of values. These same models can be extended to include dynamical effects by including the conditional probability, $p_{X_{in}}(x_i(t_n)\,|\,x_i(t_{n-1}))$, which can be used to specify the probability of observing the value $x_i(t_n)$ given that we have already observed $x_i(t_{n-1})$. The idea that knowledge of past observations changes the likelihood of future events is certainly common in dynamical systems. A dynamical system whose output is the repeating sequence $010101\ldots$ is equally likely to be in state 0 or state 1 (probability 0.5) if the system is observed at a randomly chosen time. However, if we know the value at $t_1$ is 0, the value at $t_2$ is known to be 1 with probability 1. This concept lies at the heart of the $P$th order Markov model, which by definition obeys
$$p_{X_i}\big(x_i(t_{n+1})\,\big|\,x_i(t_n), x_i(t_{n-1}), \ldots, x_i(t_{n-P})\big) = p_{X_i}\big(x_i(t_{n+1})\,\big|\,x_i(t_n), x_i(t_{n-1}), \ldots, x_i(t_{n-P}), x_i(t_{n-P-1}), \ldots\big) \equiv p_{X_i}\big(x_i(t_n)^{(1)}\,\big|\,\mathbf{x}_i(t_n)^{(P)}\big) \quad (1)$$
That is to say, the probability of the random variable attaining the value $x_i(t_{n+1})$ is conditional on the previous $P$ values only. The shorthand notation used here specifies relative lags/advances as a superscript.
Armed with this notation, we consider the work of Kaiser and Schreiber [2] and define the continuous transfer entropy between processes $X_j$ and $X_i$ as
$$TE_{j\to i}(t_n) = \int_{\mathbb{R}^{P+Q+1}} p_{X_i^{(1)} X_i X_j}\big(x_i(t_n)^{(1)}, \mathbf{x}_i(t_n)^{(P)}, \mathbf{x}_j(t_n)^{(Q)}\big)\,\log_2\!\left[\frac{p_{X_i}\big(x_i(t_n)^{(1)}\,\big|\,\mathbf{x}_i(t_n)^{(P)}, \mathbf{x}_j(t_n)^{(Q)}\big)}{p_{X_i}\big(x_i(t_n)^{(1)}\,\big|\,\mathbf{x}_i(t_n)^{(P)}\big)}\right] dx_i(t_n)^{(1)}\, d\mathbf{x}_i(t_n)^{(P)}\, d\mathbf{x}_j(t_n)^{(Q)} \quad (2)$$
where $\int_{\mathbb{R}^N}$ is used to denote the $N$-dimensional integral over the support of the random variables. By definition, this measure quantifies the ability of the random process $X_j$ to predict the dynamics of the random process $X_i$. To see why, we can examine the argument of the logarithm. In the event that the two random processes are not coupled, the dynamics will obey the Markov model in the denominator of Equation (2). However, should $X_j$ carry added information about the transition probabilities of $X_i$, the numerator is the better model. The transfer entropy is effectively mapping the difference between these hypotheses to the scalar $TE_{j\to i}(t_n)$. In short, the transfer entropy measures deviations from the hypothesis that the dynamics of $X_i$ can be described entirely by its own past history and that no new information is gained by considering the dynamics of system $X_j$.
Two simplifications are possible which will aid in the evaluation of Equation (2). First, recall that we assumed the processes were stationary, such that the joint probability distributions are invariant to the particular temporal location $t_n$ at which they are evaluated (only relative lags between observations matter). Hence, in what follows we may drop this index from the notation, i.e., $TE_{j\to i}(t_n) \equiv TE_{j\to i}$. Secondly, we may use the law of conditional probability and expand Equation (2) as
$$\begin{aligned}
TE_{j\to i} &= \int_{\mathbb{R}^{P+Q+1}} p_{X_i^{(1)} X_i X_j}\big(x_i^{(1)}, \mathbf{x}_i^{(P)}, \mathbf{x}_j^{(Q)}\big)\,\log_2 p_{X_i^{(1)} X_i X_j}\big(x_i^{(1)}, \mathbf{x}_i^{(P)}, \mathbf{x}_j^{(Q)}\big)\, dx_i^{(1)}\, d\mathbf{x}_i^{(P)}\, d\mathbf{x}_j^{(Q)} \\
&\quad - \int_{\mathbb{R}^{P+Q}} p_{X_i X_j}\big(\mathbf{x}_i^{(P)}, \mathbf{x}_j^{(Q)}\big)\,\log_2 p_{X_i X_j}\big(\mathbf{x}_i^{(P)}, \mathbf{x}_j^{(Q)}\big)\, d\mathbf{x}_i^{(P)}\, d\mathbf{x}_j^{(Q)} \\
&\quad - \int_{\mathbb{R}^{P+1}} p_{X_i^{(1)} X_i}\big(x_i^{(1)}, \mathbf{x}_i^{(P)}\big)\,\log_2 p_{X_i^{(1)} X_i}\big(x_i^{(1)}, \mathbf{x}_i^{(P)}\big)\, dx_i^{(1)}\, d\mathbf{x}_i^{(P)} \\
&\quad + \int_{\mathbb{R}^{P}} p_{X_i}\big(\mathbf{x}_i^{(P)}\big)\,\log_2 p_{X_i}\big(\mathbf{x}_i^{(P)}\big)\, d\mathbf{x}_i^{(P)} \\
&= -h_{X_i^{(1)} X_i^{(P)} X_j^{(Q)}} + h_{X_i^{(P)} X_j^{(Q)}} + h_{X_i^{(1)} X_i^{(P)}} - h_{X_i^{(P)}} \quad (3)
\end{aligned}$$
where the terms $h_{\mathbf{X}} = -\int_{\mathbb{R}^M} p_{\mathbf{X}}(\mathbf{x})\,\log_2 p_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}$ are the joint differential entropies associated with the $M$-dimensional random variable $\mathbf{X}$. In the next section we evaluate Equation (3) among the outputs of a second-order linear system driven with a jointly Gaussian random process.

3. Transfer Entropy (TE) for Second Order Linear Systems

3.1. Time-Delayed TE

The only multivariate probability distribution that readily admits an analytical solution for the differential entropies is the jointly Gaussian distribution. Consider the general case of the two data vectors $\mathbf{x} \in \mathbb{R}^N$ and $\mathbf{y} \in \mathbb{R}^M$. The jointly Gaussian model for these (zero-mean) data vectors is
$$p_{\mathbf{XY}}(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^{(N+M)/2}\,|\mathbf{C}_{\mathbf{XY}}|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,\mathbf{z}^T\,\mathbf{C}_{\mathbf{XY}}^{-1}\,\mathbf{z}\right), \qquad \mathbf{z} \equiv \begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix} \quad (4)$$
where $\mathbf{C}_{\mathbf{XY}}$ is the $(N+M)\times(N+M)$ covariance matrix of the concatenated vector $\mathbf{z}$ and $|\cdot|$ denotes the determinant. Substituting Equation (4) into the expression for the corresponding differential entropy yields
$$h_{\mathbf{XY}} = -\int_{\mathbb{R}^{N+M}} p_{\mathbf{XY}}(\mathbf{x},\mathbf{y})\,\log_2 p_{\mathbf{XY}}(\mathbf{x},\mathbf{y})\, d\mathbf{x}\, d\mathbf{y} = \frac{1}{2}\log_2\!\big[(2\pi e)^{N+M}\,|\mathbf{C}_{\mathbf{XY}}|\big]. \quad (5)$$
Therefore, assuming that both random processes $X_i$ and $X_j$ are jointly Gaussian distributed, we may substitute Equation (5) for each of the differential entropies in Equation (3) (the factors of $2\pi e$ cancel), yielding
$$TE_{j\to i} = \frac{1}{2}\log_2\!\left[\frac{|\mathbf{C}_{X_i^{(P)} X_j^{(Q)}}|\;|\mathbf{C}_{X_i^{(1)} X_i^{(P)}}|}{|\mathbf{C}_{X_i^{(1)} X_i^{(P)} X_j^{(Q)}}|\;|\mathbf{C}_{X_i^{(P)}}|}\right]. \quad (6)$$
For $P, Q$ large the needed determinants become difficult to compute. We therefore employ a simplification to the model that retains the spirit of the transfer entropy, but that makes an analytical solution more tractable. In our approach, we set $P = Q = 1$, i.e., both random processes are assumed to follow a first order Markov model. However, we allow the time interval between the random processes to vary, just as is typically done for the mutual information and/or linear cross-correlation functions [6]. Specifically, we model $X_i(t)$ as the first order Markov model $p_{X_i}(x_i(t_n+\Delta t)\,|\,x_i(t_n))$ and use the TE to consider the alternative $p_{X_i}(x_i(t_n+\Delta t)\,|\,x_i(t_n), x_j(t_n+\tau))$. Note that, in anticipation of dealing with measured data sampled at a constant time interval $\Delta t$, we have made the replacement $t_{n+1} = t_n + \Delta t$. Although we are only using first order Markov models, by varying the time delay $\tau$ we can explore whether or not the random variable $X_j(t_n+\tau)$ carries information about the transition probability $p_{X_i}(x_i(t_n+\Delta t)\,|\,x_i(t_n))$. Should consideration of $x_j(t_n+\tau)$ provide no additional knowledge about the dynamics of $x_i(t_n)$, the transfer entropy will be zero, rising to some positive value should $x_j(t_n+\tau)$ carry information not already possessed in $x_i(t_n)$.
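As a concrete numerical illustration of Equation (6) under the first order ($P = Q = 1$) simplification just described, the following sketch estimates the determinant-based TE directly from two sampled signals using NumPy sample covariances. The function name, the sample-based delay convention, and the use of `numpy.cov` are our own choices for illustration, not part of the original derivation; the estimate is only meaningful if the data are well modeled as jointly Gaussian and stationary.

```python
import numpy as np

def gaussian_te(xi, xj, tau=0):
    """Estimate TE_{j->i} via Equation (6) with P = Q = 1.

    xi, xj : 1-D arrays sampled at a fixed interval.
    tau    : delay (in samples) applied to x_j, i.e., we condition on x_j(t_n + tau).
    """
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    n0 = max(0, -tau)                         # first admissible index n
    n1 = min(len(xi) - 1, len(xj) - tau)      # one past the last admissible index
    x_fut = xi[n0 + 1 : n1 + 1]               # x_i(t_n + dt)
    x_now = xi[n0     : n1]                   # x_i(t_n)
    y_tau = xj[n0 + tau : n1 + tau]           # x_j(t_n + tau)

    def logdet(*rows):
        # log2 of the determinant of the sample covariance of the stacked variables
        C = np.atleast_2d(np.cov(np.vstack(rows)))
        return np.log2(np.linalg.det(C))

    # Equation (6) with the four covariance matrices built from the aligned samples
    return 0.5 * (logdet(x_now, y_tau) + logdet(x_fut, x_now)
                  - logdet(x_fut, x_now, y_tau) - logdet(x_now))
```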
In what follows we refer to this particular form of the TE as the time-delayed transfer entropy, or TDTE. In this simplified situation the needed covariance matrices are
$$\mathbf{C}_{X_i X_j}(\tau) = \begin{bmatrix} E[(x_i(t_n)-\bar{x}_i)^2] & E[(x_i(t_n)-\bar{x}_i)(x_j(t_n+\tau)-\bar{x}_j)] \\ E[(x_j(t_n+\tau)-\bar{x}_j)(x_i(t_n)-\bar{x}_i)] & E[(x_j(t_n+\tau)-\bar{x}_j)^2] \end{bmatrix}$$

$$\mathbf{C}_{X_i^{(1)} X_i X_j}(\tau) = \begin{bmatrix} E[(x_i(t_n+\Delta t)-\bar{x}_i)^2] & E[(x_i(t_n+\Delta t)-\bar{x}_i)(x_i(t_n)-\bar{x}_i)] & E[(x_i(t_n+\Delta t)-\bar{x}_i)(x_j(t_n+\tau)-\bar{x}_j)] \\ E[(x_i(t_n)-\bar{x}_i)(x_i(t_n+\Delta t)-\bar{x}_i)] & E[(x_i(t_n)-\bar{x}_i)^2] & E[(x_i(t_n)-\bar{x}_i)(x_j(t_n+\tau)-\bar{x}_j)] \\ E[(x_j(t_n+\tau)-\bar{x}_j)(x_i(t_n+\Delta t)-\bar{x}_i)] & E[(x_j(t_n+\tau)-\bar{x}_j)(x_i(t_n)-\bar{x}_i)] & E[(x_j(t_n+\tau)-\bar{x}_j)^2] \end{bmatrix}$$

$$\mathbf{C}_{X_i^{(1)} X_i} = \begin{bmatrix} E[(x_i(t_n+\Delta t)-\bar{x}_i)^2] & E[(x_i(t_n+\Delta t)-\bar{x}_i)(x_i(t_n)-\bar{x}_i)] \\ E[(x_i(t_n)-\bar{x}_i)(x_i(t_n+\Delta t)-\bar{x}_i)] & E[(x_i(t_n)-\bar{x}_i)^2] \end{bmatrix} \quad (7)$$

and $\mathbf{C}_{X_i X_i} = E[(x_i(t_n)-\bar{x}_i)^2] \equiv \sigma_i^2$ is simply the variance of the random process $X_i$, with $\bar{x}_i$ its mean. The assumption of stationarity also allows us to write $E[(x_i(t_n+\Delta t)-\bar{x}_i)^2] = \sigma_i^2$ and $E[(x_j(t_n+\tau)-\bar{x}_j)^2] = \sigma_j^2$. Making these substitutions into Equation (6) yields the expression
$$TE_{j\to i}(\tau) = \frac{1}{2}\log_2\!\left[\frac{\big(1-\rho_{ii}^2(\Delta t)\big)\big(1-\rho_{ij}^2(\tau)\big)}{1-\rho_{ij}^2(\tau)-\rho_{ij}^2(\tau-\Delta t)-\rho_{ii}^2(\Delta t)+2\,\rho_{ii}(\Delta t)\,\rho_{ij}(\tau)\,\rho_{ij}(\tau-\Delta t)}\right] \quad (8)$$
where we have defined particular expectations in the covariance matrices using the shorthand $\rho_{ij}(\tau) \equiv E[(x_i(t_n)-\bar{x}_i)(x_j(t_n+\tau)-\bar{x}_j)]/(\sigma_i\sigma_j)$. This particular quantity is referred to in the literature as the cross-correlation function [16]. Note that the covariance matrices are positive-definite and that the determinant of a positive-definite matrix is positive [17]; thus the quantity inside the logarithm will always be positive and the logarithm will exist.
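Equation (8) is simple enough to transcribe directly. The following helper (ours; the argument names are simply the three correlation values appearing in the equation) evaluates the TDTE given $\rho_{ii}(\Delta t)$, $\rho_{ij}(\tau)$, and $\rho_{ij}(\tau-\Delta t)$; those three numbers may come either from the analytical correlation functions derived in Section 3.2 or from the estimator described in Section 4.

```python
from math import log2

def tdte(rho_ii_dt, rho_ij_tau, rho_ij_tau_m_dt):
    """Time-delayed transfer entropy TE_{j->i}(tau), Equation (8)."""
    num = (1.0 - rho_ii_dt**2) * (1.0 - rho_ij_tau**2)
    den = (1.0 - rho_ij_tau**2 - rho_ij_tau_m_dt**2 - rho_ii_dt**2
           + 2.0 * rho_ii_dt * rho_ij_tau * rho_ij_tau_m_dt)
    return 0.5 * log2(num / den)
```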
Now, the hypothesis that the TE was designed to test is whether or not past values of the process $X_j$ carry information about the transition probabilities of the second process $X_i$. Thus, to keep with the original intent of the measure we would only consider $\tau < 0$. However, this restriction is necessary only if one implicitly assumes that a non-zero TE means $X_j$ is influencing the transition $p_{X_i}(x_i(t_n+\Delta t)\,|\,x_i(t_n))$, as opposed to simply carrying additional information about the transition. Again, this latter statement is a more accurate depiction of what the TE is really quantifying, and we have found it useful to consider both negative and positive delays $\tau$ in trying to understand coupling among system components.
It is also interesting to note the bounds of this function. Certainly for constant signals (i.e., $x_i(t_n), x_j(t_n)$ are single-valued for all time) we have $\rho_{ii}(\Delta t) = \rho_{ij}(\tau) = 0\;\forall\,\tau$ and the transfer entropy is zero for any choice of the time scales $\tau$ defining the Markov processes. Knowledge of $X_j$ does not aid in forecasting $X_i$ simply because the transition probability in going from $x_i(t_n)$ to $x_i(t_n+\Delta t)$ is always unity. Likewise, if there is no coupling between system components we have $\rho_{ij}(\tau) = 0$ and the TDTE becomes $TE_{j\to i}(\tau) = \frac{1}{2}\log_2\frac{1-\rho_{ii}^2(\Delta t)}{1-\rho_{ii}^2(\Delta t)} = 0$. At the other extreme, for perfectly coupled systems, i.e., $X_i = X_j$, consider $\tau \to 0$. In this case we have $\rho_{ij}^2(\tau) \to 1$ and $\rho_{ij}(\tau-\Delta t) \to \rho_{ii}(-\Delta t) = \rho_{ii}(\Delta t)$ (in this last expression we have noted the symmetry of the function $\rho_{ii}(\tau)$ with respect to the time delay). The transfer entropy then becomes
$$TE_{j\to i}(0) = \frac{1}{2}\log_2\left[\frac{0}{0}\right] \to 0 \quad (9)$$
and, taking the limit (in which the indeterminate ratio inside the logarithm tends to unity), the random process $X_j$ at $\tau = 0$ is seen to carry no additional information about the dynamics of $X_i$, simply due to the fact that in this special case we have $p_{X_i}(x_i(t_n+\Delta t)\,|\,x_i(t_n)) = p_{X_i}(x_i(t_n+\Delta t)\,|\,x_i(t_n), x_j(t_n))$ with $x_j(t_n) = x_i(t_n)$. These extremes highlight the care that must be taken in interpreting the transfer entropy. Because the TDTE is zero for both the perfectly coupled and the uncoupled case, we must not interpret the measure as quantifying the coupling strength between two random processes. Rather, the TDTE measures the additional information provided by one random process about the dynamics of another.
We should point out that the average mutual information function can resolve the ambiguity in the TDTE as a measure of coupling strength. For two Gaussian random processes the time-delayed mutual information is known to be $I_{X_iX_j}(\tau) = -\frac{1}{2}\log_2\big(1-\rho_{ij}^2(\tau)\big)$. Hence, for perfect coupling $I_{X_iX_j}(0) \to \infty$, whereas for uncoupled systems $I_{X_iX_j}(0) = 0$. Estimating both time-delayed mutual information and transfer entropies can therefore permit stronger inference about dynamical coupling.
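For completeness, the Gaussian time-delayed mutual information quoted above is a one-line companion to the TDTE helper (again our own wrapper; it diverges as $|\rho_{ij}(\tau)| \to 1$, which is exactly the behavior that distinguishes it from the TDTE at perfect coupling):

```python
from math import log2

def tdmi(rho_ij_tau):
    """Time-delayed mutual information I_{X_i X_j}(tau) for jointly Gaussian processes."""
    return -0.5 * log2(1.0 - rho_ij_tau**2)
```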

3.2. Analytical Cross-Correlation Function

To fully define the TDTE, the auto- and cross-correlation functions $\rho_{ii}(\tau)$ and $\rho_{ij}(\tau)$ are required. They are derived here for a general class of linear systems found frequently in the modeling and analysis of physical processes. Consider the system
$$\mathbf{M}\ddot{\mathbf{x}}(t) + \mathbf{C}\dot{\mathbf{x}}(t) + \mathbf{K}\mathbf{x}(t) = \mathbf{f}(t) \quad (10)$$
where $\mathbf{x}(t) \equiv (x_1(t), x_2(t), \ldots, x_M(t))^T$ is the system's response to the forcing function(s) $\mathbf{f}(t) \equiv (f_1(t), f_2(t), \ldots, f_M(t))^T$ and $\mathbf{M}, \mathbf{C}, \mathbf{K}$ are $M \times M$ constant coefficient matrices that capture the system's physical properties. Thus, we are considering a second-order, constant-coefficient, $M$-degree-of-freedom (DOF) linear system. It is assumed that we may measure the response of this system at any of the DOFs and/or the forcing functions.
One physical embodiment of this system is shown schematically in Figure 1. Five masses are coupled together via restoring elements $k_i$ (springs) and dissipative elements $c_i$ (dash-pots). The first mass is fixed to a boundary while the driving force is applied at the end mass. If the response data $\mathbf{x}(t)$ are each modeled as a stationary random process, we may use the analytical TDTE to answer questions about shared information between any two masses. We can explore this relationship as a function of coupling strength and also of which particular mass response data we choose to analyze.
Figure 1. Physical system modeled by Equation (10). Here, an M = 5 DOF structure is represented by masses coupled together via both restoring and dissipative elements. Forcing is applied at the end mass.
However, before proceeding we require a general expression for the cross-correlation between any two DOFs $i, j \in [1, M]$; in other words, we require the expectation $E[x_i(t)x_j(t+\tau)]$ for any combination of $i, j$. Such an expression can be obtained by first transforming coordinates. Let $\mathbf{x}(t) = \mathbf{u}\boldsymbol{\eta}(t)$, where the matrix $\mathbf{u}$ contains the non-trivial solutions $\mathbf{u}_i$ of the eigenvalue problem $(\mathbf{M}^{-1}\mathbf{K} - \omega_i^2\mathbf{I})\mathbf{u}_i = \mathbf{0}$ as its columns [18]. Here the eigenvalues $\omega_i^2$ define the natural frequencies $\omega_i,\; i = 1\ldots M$, of the system. Making the above coordinate transformation, substituting into Equation (10), and then pre-multiplying both sides by $\mathbf{u}^T$ allows the equations of motion to be uncoupled and written separately as
$$\ddot{\eta}_i(t) + 2\zeta_i\omega_i\dot{\eta}_i(t) + \omega_i^2\eta_i(t) = \mathbf{u}_i^T\mathbf{f}(t) \equiv q_i(t) \quad (11)$$
where the eigenvectors have been normalized such that $\mathbf{u}^T\mathbf{M}\mathbf{u} = \mathbf{I}$ (the identity matrix). In the above formulation we have also made the assumption that $\mathbf{C} = \alpha\mathbf{K}$, i.e., the dissipative term $\mathbf{C}\dot{\mathbf{x}}(t)$ is of the same form as the restoring term, albeit scaled by the constant $\alpha \ll 1$ (i.e., a lightly damped system). To obtain the form shown in Equation (11) we introduce the dimensionless damping coefficient $\zeta_i = \frac{\alpha\omega_i}{2}$.
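The modal quantities entering Equation (11) are straightforward to compute numerically from $\mathbf{M}$ and $\mathbf{K}$. A minimal sketch using SciPy's generalized symmetric eigensolver follows; it relies on the fact that `scipy.linalg.eigh(K, M)` returns eigenvectors normalized so that $\mathbf{u}^T\mathbf{M}\mathbf{u} = \mathbf{I}$, matching the normalization used here. The function name is our own.

```python
import numpy as np
from scipy.linalg import eigh

def modal_properties(M, K, alpha):
    """Natural frequencies, damping ratios, and mass-normalized mode shapes.

    Solves K u = w^2 M u (equivalent to the eigenvalue problem for M^{-1} K)
    and assumes proportional damping C = alpha * K, so zeta_i = alpha * omega_i / 2.
    """
    w2, U = eigh(K, M)              # U.T @ M @ U = I by construction
    omega = np.sqrt(w2)             # natural frequencies omega_i (rad/s)
    zeta = 0.5 * alpha * omega      # dimensionless damping coefficients zeta_i
    return omega, zeta, U
```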
The general solution to these uncoupled, linear equations is well known [18] and can be written as the convolution

$$\eta_i(t) = \int_0^{\infty} h_i(\theta)\, q_i(t-\theta)\, d\theta \quad (12)$$

where $h_i(\theta)$ is the impulse response function

$$h_i(\theta) = \frac{1}{\omega_{d_i}}\, e^{-\zeta_i\omega_i\theta}\,\sin(\omega_{d_i}\theta) \quad (13)$$

and $\omega_{d_i} \equiv \omega_i\sqrt{1-\zeta_i^2}$. In terms of the original coordinates, we therefore have

$$x_i(t) = \sum_{l=1}^{M} u_{il}\,\eta_l(t) = \int_0^{\infty}\sum_{l=1}^{M} u_{il}\, h_l(\theta)\, q_l(t-\theta)\, d\theta \quad (14)$$
If we further consider the excitation $\mathbf{f}(t)$ to be a zero-mean random process, then so too is $q_l(t)$. Using this model, we may construct the covariance
$$E[x_i(t)x_j(t+\tau)] = E\!\left[\int_0^{\infty}\!\!\int_0^{\infty}\sum_{l=1}^{M}\sum_{m=1}^{M} u_{il}u_{jm}\, h_l(\theta_1)h_m(\theta_2)\, q_l(t-\theta_1)\, q_m(t+\tau-\theta_2)\, d\theta_1\, d\theta_2\right] = \int_0^{\infty}\!\!\int_0^{\infty}\sum_{l=1}^{M}\sum_{m=1}^{M} u_{il}u_{jm}\, h_l(\theta_1)h_m(\theta_2)\, E[q_l(t-\theta_1)q_m(t+\tau-\theta_2)]\, d\theta_1\, d\theta_2 \quad (15)$$
which is a function of the eigenvectors $\mathbf{u}_i$, the impulse response functions $h_l(\cdot)$, and the covariance of the modal forcing. Knowledge of this covariance can be obtained from the forcing covariance matrix $R_{F_lF_m}(\tau) \equiv E[f_l(t)f_m(t+\tau)]$. Recalling that
$$q_l(t) = \sum_{p=1}^{M} u_{lp}\, f_p(t) \quad (16)$$
we write
$$E[q_l(t-\theta_1)q_m(t+\tau-\theta_2)] = \sum_{p=1}^{M}\sum_{q=1}^{M} u_{lq}u_{mp}\, E[f_q(t-\theta_1)f_p(t+\tau-\theta_2)] \quad (17)$$
It is assumed that the random vibration inputs are uncorrelated, i.e., $E[f_q(t)f_p(t)] = 0\;\forall\, q\neq p$, with variance $\sigma_{F_p}^2 = E[f_p(t)f_p(t)]$. The above can therefore be simplified as
$$E[q_l(t-\theta_1)q_m(t+\tau-\theta_2)] = \sum_{p=1}^{M} u_{lp}u_{mp}\, E[f_p(t-\theta_1)f_p(t+\tau-\theta_2)] \quad (18)$$
The most common linear models assume the input is applied at a single DOF, i.e., $f_p(t)$ is non-zero only for $p = P$. For a load applied at DOF $P$, the covariance becomes
$$E[x_i(t)x_j(t+\tau)] = \int_0^{\infty}\!\!\int_0^{\infty}\sum_{l=1}^{M}\sum_{m=1}^{M} u_{il}u_{jm}u_{lP}u_{mP}\, h_l(\theta_1)h_m(\theta_2)\, E[f_P(t-\theta_1)f_P(t+\tau-\theta_2)]\, d\theta_1\, d\theta_2 = \sum_{l=1}^{M}\sum_{m=1}^{M} u_{il}u_{jm}u_{lP}u_{mP}\int_0^{\infty} h_l(\theta_1)\int_0^{\infty} h_m(\theta_2)\, E[f_P(t-\theta_1)f_P(t+\tau-\theta_2)]\, d\theta_2\, d\theta_1. \quad (19)$$
The inner integral can be further evaluated as
$$\int_0^{\infty} h_m(\theta_2)\, E[f_P(t-\theta_1)f_P(t+\tau-\theta_2)]\, d\theta_2 = \int_0^{\infty} h_m(\theta_2)\int_{-\infty}^{\infty} S_{FF}(\omega)\, e^{i\omega(\tau-\theta_2+\theta_1)}\, d\omega\, d\theta_2. \quad (20)$$
Note that we have re-written the forcing auto-covariance as the inverse Fourier transform of the associated power spectral density function, denoted $S_{FF}(\omega)$, via the well-known Wiener-Khinchine relation [16]. We have already assumed the forcing is comprised of independent, identically distributed values, in which case the forcing power spectral density is constant, $S_{FF}(\omega) = \mathrm{const.}\;\forall\,\omega$. Denoting this constant $S_{FF}(0)$, we note that the Fourier transform of a constant is simply $\int_{-\infty}^{\infty} S_{FF}(0)\, e^{i\omega t}\, dt = S_{FF}(0)\,\delta(t)$, hence our integral becomes
$$\int_0^{\infty} h_m(\theta_2)\, E[f_P(t-\theta_1)f_P(t+\tau-\theta_2)]\, d\theta_2 = \int_0^{\infty} h_m(\theta_2)\, S_{FF}(0)\,\delta(\tau-\theta_2+\theta_1)\, d\theta_2 = S_{FF}(0)\, h_m(\tau+\theta_1). \quad (21)$$
Returning to Equation (19) we have
$$E[x_i(t)x_j(t+\tau)] = \int_0^{\infty}\sum_{l=1}^{M}\sum_{m=1}^{M} u_{lP}u_{mP}u_{il}u_{jm}\, h_l(\theta_1)\, h_m(\theta_1+\tau)\, S_{FF}(0)\, d\theta_1. \quad (22)$$
At this point we can simplify the expression by carrying out the integral.
Substituting the expression for the impulse response in Equation (13), the needed expectation in Equation (22) becomes [19,20]
$$R_{X_iX_j}(\tau) = \frac{S_{FF}(0)}{4}\sum_{l=1}^{M}\sum_{m=1}^{M} u_{lP}u_{mP}u_{il}u_{jm}\Big[A_{lm}\, e^{-\zeta_m\omega_m\tau}\cos(\omega_{d_m}\tau) + B_{lm}\, e^{-\zeta_m\omega_m\tau}\sin(\omega_{d_m}\tau)\Big] \quad (23)$$
where
$$A_{lm} = \frac{8\,(\omega_l\zeta_l + \omega_m\zeta_m)}{\omega_l^4 + \omega_m^4 + 4\omega_l^3\omega_m\zeta_l\zeta_m + 4\omega_m^3\omega_l\zeta_l\zeta_m + 2\omega_m^2\omega_l^2\,(-1 + 2\zeta_l^2 + 2\zeta_m^2)} \quad (24)$$

$$B_{lm} = \frac{4\,\big(\omega_l^2 + 2\omega_l\omega_m\zeta_l\zeta_m + \omega_m^2(-1 + 2\zeta_m^2)\big)}{\omega_{d_m}\big[\omega_l^4 + \omega_m^4 + 4\omega_l^3\omega_m\zeta_l\zeta_m + 4\omega_m^3\omega_l\zeta_l\zeta_m + 2\omega_m^2\omega_l^2\,(-1 + 2\zeta_l^2 + 2\zeta_m^2)\big]} \quad (25)$$
We can further normalize this function to give
$$\rho_{ij}(\tau) = R_{X_iX_j}(\tau)\Big/\sqrt{R_{X_iX_i}(0)\,R_{X_jX_j}(0)} \quad (26)$$
for the normalized auto- and cross-correlation functions.
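Equations (23)-(26) translate directly into code. The sketch below is our own transcription (zero-based DOF indices, with the forcing DOF `P` also zero-based) and evaluates $R_{X_iX_j}(\tau)$ for $\tau \ge 0$; negative delays follow from the stationarity relation $R_{X_iX_j}(-\tau) = R_{X_jX_i}(\tau)$. We read the products $u_{lP}u_{mP}u_{il}u_{jm}$ in Equation (23) with `U[p, l]` denoting the $p$-th entry of mode shape $l$ (i.e., mode shapes stored as columns, as returned by the modal sketch above).

```python
import numpy as np

def analytic_R(tau, i, j, omega, zeta, U, P, Sff0=1.0):
    """Cross-covariance R_{X_i X_j}(tau), Equation (23), for tau >= 0."""
    wd = omega * np.sqrt(1.0 - zeta**2)
    total = 0.0
    M = len(omega)
    for l in range(M):
        for m in range(M):
            wl, wm, zl, zm = omega[l], omega[m], zeta[l], zeta[m]
            den = (wl**4 + wm**4 + 4*wl**3*wm*zl*zm + 4*wm**3*wl*zl*zm
                   + 2*wl**2*wm**2*(-1 + 2*zl**2 + 2*zm**2))
            A = 8.0 * (wl*zl + wm*zm) / den                          # Equation (24)
            B = 4.0 * (wl**2 + 2*wl*wm*zl*zm
                       + wm**2*(-1 + 2*zm**2)) / (wd[m] * den)       # Equation (25)
            total += (U[P, l] * U[P, m] * U[i, l] * U[j, m]
                      * np.exp(-zm*wm*tau)
                      * (A*np.cos(wd[m]*tau) + B*np.sin(wd[m]*tau)))
    return 0.25 * Sff0 * total

def analytic_rho(tau, i, j, omega, zeta, U, P):
    """Normalized correlation rho_{ij}(tau), Equation (26)."""
    return analytic_R(tau, i, j, omega, zeta, U, P) / np.sqrt(
        analytic_R(0.0, i, i, omega, zeta, U, P)
        * analytic_R(0.0, j, j, omega, zeta, U, P))
```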
It will also prove instructive to study the TDTE between the drive and the response. This requires $R_{X_iF_P}(\tau) \equiv E[x_i(t)f_P(t+\tau)]$. Following the same procedure as above results in the expression
$$R_{X_iF_P}(\tau) = \begin{cases} S_{FF}(0)\,\displaystyle\sum_{m=1}^{M} u_{im}u_{mP}\, h_m(-\tau) & : \tau \le 0 \\[4pt] 0 & : \tau > 0 \end{cases} \quad (27)$$
Normalizing by the variance of the random process $X_i$ and assuming $\sigma_{F_P}^2 = 1$ yields the needed correlation function $\rho_{if}(\tau)$. This expression may be substituted into the expression for the transfer entropy to yield the TDTE between drive and response. At this point we have completely defined the analytical TDTE for a broad class of second order linear systems. The behavior of this function is described next. Before concluding this section we note that it may also be possible to derive expressions for the TDTE for different types of forcing functions. Impulse excitation and also non-Gaussian inputs, where the marginal PDF can be described as a polynomial transformation of a Gaussian random variable (see e.g., [21]), are two such possibilities.
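A short numerical transcription of Equation (27), in the same conventions as the sketches above (our own function name and indexing), illustrates the one-sided, causal character of the drive-response covariance:

```python
import numpy as np

def analytic_R_drive(tau, i, omega, zeta, U, P, Sff0=1.0):
    """Drive-response covariance R_{X_i F_P}(tau), Equation (27); nonzero only for tau <= 0."""
    if tau > 0:
        return 0.0
    wd = omega * np.sqrt(1.0 - zeta**2)
    h = np.exp(-zeta * omega * (-tau)) * np.sin(wd * (-tau)) / wd   # h_m(-tau), Equation (13)
    return Sff0 * np.sum(U[i, :] * U[P, :] * h)
```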

4. Behavior of the TDTE

Before proceeding with an example, we first require a means of estimating $TE_{j\to i}(\tau)$ from observed data. Assume we have recorded the signals $x_i(n\Delta t), x_j(n\Delta t),\; n = 1\ldots N$, with a fixed sampling interval $\Delta t$. In order to estimate the TDTE we require estimates of the normalized correlation functions $\rho_{ij}(\tau)$, which can be substituted into Equation (8). While different estimators of correlation functions exist (see e.g., [16]), we use a frequency domain estimator. This estimator relies on the assumption that the observed data are the output of an ergodic (therefore stationary) random process. If we further assume that the correlation functions are absolutely integrable, i.e., $\int|R_{ij}(\tau)|\,d\tau < \infty$, the Wiener-Khinchin theorem tells us that the cross-spectral density and cross-covariance functions are related via the Fourier transform as [16]
$$\int_{-\infty}^{\infty} E[x_i(t)x_j(t+\tau)]\, e^{-i2\pi f\tau}\, d\tau = S_{X_jX_i}(f) \equiv \lim_{T\to\infty}\frac{E\big[X_i^*(f)X_j(f)\big]}{2T} \quad (28)$$
where $X_i(f)$ denotes the Fourier transform of the signal $x_i(t)$. One approach is therefore to estimate the spectral density $\hat{S}_{X_jX_i}(f)$ and then inverse Fourier transform to give $\hat{R}_{X_iX_j}(\tau)$. We further rely on the ergodic theorem of Birkhoff [22], which (when applied to probability) allows expectations defined over multiple realizations to be well approximated by temporal averages over a finite number of samples. More specifically, we divide the temporal sequences $x_i(n), x_j(n),\; n = 1\ldots N$, into $S$ segments of length $N_s$, (possibly) overlapping by $L$ points. Taking the discrete Fourier transform of each segment, e.g., $X_i^{(s)}(k) = \sum_{n=0}^{N_s-1} x_i(n + sN_s - L)\, e^{-i2\pi kn/N_s},\; s = 0\ldots S-1$, and averaging gives the estimator
$$\hat{S}_{X_jX_i}(k) = \frac{\Delta t}{N_s S}\sum_{s=0}^{S-1}\hat{X}_i^{(s)*}(k)\,\hat{X}_j^{(s)}(k) \quad (29)$$
at discrete frequency k. This quantity is then inverse discrete Fourier transformed to give
$$\hat{R}_{X_iX_j}(n) = \sum_{k=0}^{N_s-1}\hat{S}_{X_jX_i}(k)\, e^{i2\pi kn/N_s}. \quad (30)$$
Finally, we may normalize the estimate to give the cross-correlation coefficient
$$\hat{\rho}_{X_iX_j}(n) = \hat{R}_{X_iX_j}(n)\Big/\sqrt{\hat{R}_{X_iX_i}(0)\,\hat{R}_{X_jX_j}(0)}. \quad (31)$$
This estimator is asymptotically consistent and unbiased and can therefore be substituted into Equation (8) to produce very accurate estimates of the TE (see the examples to follow). In the general (nonlinear) case, kernel density estimators are typically used but are known to perform poorly in many cases, particularly when data are scarce (see e.g., [6,23]). We also point out that for this study stationarity (and ergodicity) is required only up to second order (covariance). In general the TDTE is a function of all joint moments, hence higher-order ergodicity must be assumed.
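A minimal implementation of the estimator in Equations (29)-(31) is sketched below. The constant $\Delta t/(N_s S)$ in Equation (29) cancels in the normalization of Equation (31), so it is omitted; segment means are removed so that the result estimates the covariance. The function names, the non-overlapping default ($L = 0$), and the mean removal are our choices, and the sketch assumes $L < N_s$.

```python
import numpy as np

def cross_spectrum(xi, xj, Ns=4096, L=0):
    """Averaged cross-spectral density S_{X_j X_i}(k), Equation (29), up to a constant factor."""
    xi, xj = np.asarray(xi, float), np.asarray(xj, float)
    step = Ns - L
    S = (min(len(xi), len(xj)) - Ns) // step + 1        # number of segments
    Sji = np.zeros(Ns, dtype=complex)
    for s in range(S):
        a = xi[s*step : s*step + Ns]
        b = xj[s*step : s*step + Ns]
        Xi = np.fft.fft(a - a.mean())
        Xj = np.fft.fft(b - b.mean())
        Sji += np.conj(Xi) * Xj
    return Sji / S

def rho_hat(xi, xj, Ns=4096, L=0):
    """Normalized cross-correlation estimate, Equations (30)-(31).

    Returns delays n = 0..Ns-1; negative delays appear wrapped at the end
    of the array (circular convention of the inverse DFT)."""
    Rij = np.real(np.fft.ifft(cross_spectrum(xi, xj, Ns, L)))
    Rii = np.real(np.fft.ifft(cross_spectrum(xi, xi, Ns, L)))
    Rjj = np.real(np.fft.ifft(cross_spectrum(xj, xj, Ns, L)))
    return Rij / np.sqrt(Rii[0] * Rjj[0])
```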
As an example, consider a five-DOF system governed by Equation (10), where:
$$\mathbf{M} = \begin{bmatrix} m_1 & 0 & 0 & 0 & 0 \\ 0 & m_2 & 0 & 0 & 0 \\ 0 & 0 & m_3 & 0 & 0 \\ 0 & 0 & 0 & m_4 & 0 \\ 0 & 0 & 0 & 0 & m_5 \end{bmatrix}$$

$$\mathbf{C} = \begin{bmatrix} c_1+c_2 & -c_2 & 0 & 0 & 0 \\ -c_2 & c_2+c_3 & -c_3 & 0 & 0 \\ 0 & -c_3 & c_3+c_4 & -c_4 & 0 \\ 0 & 0 & -c_4 & c_4+c_5 & -c_5 \\ 0 & 0 & 0 & -c_5 & c_5 \end{bmatrix}$$

$$\mathbf{K} = \begin{bmatrix} k_1+k_2 & -k_2 & 0 & 0 & 0 \\ -k_2 & k_2+k_3 & -k_3 & 0 & 0 \\ 0 & -k_3 & k_3+k_4 & -k_4 & 0 \\ 0 & 0 & -k_4 & k_4+k_5 & -k_5 \\ 0 & 0 & 0 & -k_5 & k_5 \end{bmatrix}$$
are constant coefficient matrices commonly used to describe structural systems. In this case, these particular matrices describe the motion of a cantilevered structure where we assume a jointly normally distributed random process applied at the end mass, i.e., $\mathbf{f}(t) = (0, 0, 0, 0, \mathcal{N}(0,1))^T$. In this first example we examine the TDTE between response data collected from two different points on the structure. We fix $m_i = 0.01\,\mathrm{kg}$, $c_i = 0.1\,\mathrm{N\cdot s/m}$, and $k_i = 10\,\mathrm{N/m}$ for each of the $i = 1\ldots 5$ degrees of freedom (thus we are using $\alpha = 0.01$ in the modal damping model $\mathbf{C} = \alpha\mathbf{K}$). The system response data $x_i(n\Delta t),\; n = 1\ldots 2^{15}$, to the stochastic forcing are then generated via numerical integration. For simulation purposes we used a time step of $\Delta t = 0.01\,\mathrm{s}$, which is sufficient to capture all five of the system natural frequencies (the lowest of which is $\omega_1 = 9.00$ rad/s). Based on these parameters, we generated the analytical expressions $TE_{3\to 2}(\tau)$ and $TE_{2\to 3}(\tau)$ and also $TE_{5\to 1}(\tau)$ and $TE_{1\to 5}(\tau)$ for illustrative purposes. These are shown in Figure 2 along with the estimates formed using the Fourier transform-based procedure. In forming the estimates we used $L = 0$, $S = 2^3$, $N_s = 2^{12}$, resulting in low bias and variance and providing curves that are in very close agreement with theory.
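For readers wishing to reproduce this example, the following sketch builds the matrices above and integrates the system with a white-noise drive at the end mass using SciPy's linear state-space simulator. The helper name and the state-space formulation are our own; the parameter values simply match those quoted in the text, and the integration scheme is not necessarily the one used by the authors.

```python
import numpy as np
from scipy.signal import StateSpace, lsim

def chain_matrices(m, c, k):
    """M, C, K for the fixed-free chain of Figure 1 (Equation (10))."""
    n = len(m)
    M = np.diag(m)
    C = np.zeros((n, n))
    K = np.zeros((n, n))
    for i in range(n):
        C[i, i] += c[i]
        K[i, i] += k[i]
        if i + 1 < n:                               # coupling to the next mass
            C[i, i] += c[i+1]; C[i, i+1] = C[i+1, i] = -c[i+1]
            K[i, i] += k[i+1]; K[i, i+1] = K[i+1, i] = -k[i+1]
    return M, C, K

m = np.full(5, 0.01)     # kg
c = np.full(5, 0.1)      # N*s/m  (so alpha = c_i/k_i = 0.01)
k = np.full(5, 10.0)     # N/m
M, C, K = chain_matrices(m, c, k)

# First-order (state-space) form with the force applied at mass 5
Minv = np.linalg.inv(M)
A = np.block([[np.zeros((5, 5)), np.eye(5)], [-Minv @ K, -Minv @ C]])
B = np.vstack([np.zeros((5, 1)), Minv[:, [4]]])     # unit force enters at DOF 5
Cout = np.hstack([np.eye(5), np.zeros((5, 5))])     # observe the five displacements
sys = StateSpace(A, B, Cout, np.zeros((5, 1)))

dt, N = 0.01, 2**15
t = np.arange(N) * dt
f = np.random.randn(N)                              # N(0,1) drive at the end mass
_, x, _ = lsim(sys, f, t)                           # x[:, i] = displacement of mass i+1
```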
With Figure 2 in mind, first consider negative delays, $\tau < 0$. Clearly, the further the random variable $X_j(t_n+\tau)$ is from $X_i(t_n)$, the less information it carries about the probability of $X_i$ transitioning to a new state $\Delta t$ seconds into the future. This is to be expected from a stochastically driven system and accounts for the decay of the transfer entropy to zero for large $|\tau|$. However, we also see periodic returns to the point $TE_{j\to i}(\tau) = 0$ for even small temporal separation. Clearly this is a reflection of the periodicity observed in second order linear systems. In fact, for this system the dominant period of oscillation is $2\pi/\omega_1 = 0.698$ seconds. It can be seen that the argument of the logarithm in Equation (8) periodically reaches a minimum value of unity at precisely half this period; thus we observe zeros of the TDTE at delays $(i-1)\times\pi/\omega_1,\; i = 1, 2, \ldots$. In this case the TDTE goes to zero not because the random variables $X_j(t_n+\tau)$ and $X_i(t_n)$ are unrelated, but because knowledge of one allows us to exactly predict the position of the other (no additional information is present). We believe this is likely to be a feature of most systems possessing an underlying periodicity and is one reason why using the TE as a measure of coupling must be done with care.
Figure 2. Time delay transfer entropy between masses two and three (top row) and one and five (bottom row) of a 5 DOF system driven at mass P = 5.
One possible way to eliminate this feature is to condition the measure on more of the signal's past history. In fact, several papers (see e.g., [9,13]) mention the importance of conditioning on the full state vector $X_j(t_n-\tau_1), X_j(t_n-\tau_2), \ldots, X_j(t_n-\tau_d)$, where $d$ is the dimensionality (in a loose sense, the number of dynamical degrees of freedom) of the random process $X_j$. Building in more past history would almost certainly remove the oscillations, as some of the past observations would always be providing additional predictive power. However, building in more history significantly complicates the ability to derive closed-form expressions. Moreover, for this simple linear system the basic envelope of the TDTE curves would not likely be affected by altering the model in this way.
We also point out that values of the TDTE are non-zero for positive delays as well. Again, so long as we interpret the TE as a measure of predictive power, this makes sense. That is to say, future values of $X_j$ can aid in predicting the current dynamics of $X_i$. Interestingly, the asymmetry in the TE peaks near $\tau = 0$ may provide the largest clue as to the location of the forcing signal. Consistently we have found that the TE is larger for negative delays when the mass closest to the driven end plays the role of $X_j$; conversely, it is larger for positive delays when the mass furthest from the driven end plays this role. So long as the coupling is bi-directional, results such as those shown in Figure 2 can be expected in general.
However, the situation is quite different if we consider the case of uni-directional coupling. For example, we may consider $TE_{f\to i}(\tau)$, i.e., the TDTE between the forcing signal and response variable $i$. This is a particularly interesting case as, unlike in the previous examples, there is no feedback from DOF $i$ to the driving signal. Figure 3 shows the TDTE between drive and response and clearly highlights the directional nature of the coupling. Past values of the forcing function clearly help in predicting the dynamics of the response. Conversely, future values of the forcing say nothing about transition probabilities for the mass response, simply because the mass has not “seen” that information yet. Thus, for uni-directional coupling, the TDTE can easily diagnose whether $X_j$ is driving $X_i$ or vice-versa. It can also be noticed from these plots that the drive signal is not that much help in predicting the response, as the TDTE is much smaller in magnitude than when computed between masses. We interpret this to mean that the response data are dominated by the physics of the structure (e.g., the structural modes), which is information not carried in the drive signal. Hence, the drive signal offers little in the way of additional predictive power. While the drive signal puts energy into the system, it is not very good at predicting the response. It should also be pointed out that kernel density estimation techniques are not able to capture these small values of the TDTE; the error in such estimates is larger than these subtle fluctuations. Only the “linearized” estimator is able to capture the fluctuations in the TDTE for small ($O(10^{-2})$) values.
Figure 3. Time delay transfer entropy between the forcing (denoted as DOF “0”) and mass three for the 5 DOF system driven at mass, P = 5 . The plot is consistent with the interpretation of information moving from the forcing to mass three.
It has been suggested that the main utility of the TE is, given a sequence of observations, to assess the direction of information flow in a coupled system. More specifically, one computes the difference $TE_{i\to j} - TE_{j\to i}$, with a positive difference suggesting information flow from $i$ to $j$ (a negative difference indicating the opposite) [2,4]. In the system modeled by Equation (10) one would heuristically understand the information as flowing from the drive signal to the response. This is certainly reinforced by Figure 3. However, by extension it might seem probable that information would similarly flow from the mass closest to the drive signal to the mass closest to the boundary (e.g., from DOF 5 to DOF 1).
Figure 4. Difference in time delay transfer entropy between the driven mass five and each other DOF as a function of $k_3$. A positive difference indicates $TE_{i\to j} > TE_{j\to i}$ and is commonly used to indicate that information is moving from mass $i$ to mass $j$. Based on this interpretation, negative values indicate information moving from the driven end to the base; positive values indicate the opposite. Even for this linear system, choosing different masses in the analysis can produce very different results. In fact, $TE_{2\to 5} - TE_{5\to 2}$ implies a different direction of information transfer, depending on the strength of the coupling $k_3$.
We test this hypothesis as a function of the coupling strength between masses. Fixing each stiffness and damping coefficient to the previously used values, we vary $k_3$ from 1 N/m to 40 N/m and examine the quantity $TE_{i\to j} - TE_{j\to i}$ evaluated at $\tau^*$, taken as the delay at which the TDTE reaches its maximum. Varying $k_3$ slightly alters the dominant period of the response; by accounting for this shift we eliminate the possibility of capturing the TE at one of its nulls (see Figure 2). For example, in Figure 2 we see that $\tau^* = -0.15$ in the plot of $TE_{3\to 2}(\tau)$. Figure 4 shows the difference in TDTE as a function of the coupling strength. The result is non-intuitive if one assumes information would move from the driven end toward the non-driven end of the system. For certain DOFs this interpretation holds; for others, it does not. Herein lies the difficulty in interpreting the TE when bi-directional coupling exists. This was also pointed out by Schreiber [1], who noted “Reducing the analysis to the identification of a ‘drive’ and a ‘response’ may not be useful and could even be misleading”. The above results certainly reinforce this statement.
Figure 5. Difference in time-delayed transfer entropy (TDTE) among different combinations of masses. By the traditional interpretation of TE, negative values indicate information moving from the driven end to the base; positive values indicate the opposite.
Rather than being viewed as a measure of information flow, we find it more useful to interpret the difference measure as simply one of predictive power. That is to say, does knowledge of system $j$ help predict system $i$ more than $i$ helps predict $j$? This is a slightly different question. Our analysis suggests that if $X_i$ and $X_j$ are both near the driven end, but with DOF $i$ the closer of the two, then knowledge of $X_j$ is of more use in predicting $X_i$ than vice-versa. This interpretation also happens to be consistent with the notion of information moving from the driven end toward the base. However, as $i$ and $j$ become de-coupled (physically separated) the reverse appears to be true: the random process $X_i$ is better at predicting $X_j$ than $X_j$ is at predicting $X_i$. Thus, for certain pairs of masses information seems to be traveling from the base toward the drive. One possible explanation is that, because the mass at DOF $i$ is further removed from the drive signal, it is strongly influenced by the vibration of each of the other masses. By contrast, a mass near the driven end is strongly influenced only by the drive signal. Because the dynamics of $X_i$ are influenced heavily by the structure (as opposed to the drive), $X_i$ does a good job in helping to predict the dynamics everywhere. The main point of this analysis is that the difference in TE is not at all an unambiguous measure of the direction of information flow.
To further explore this question, we have repeated this numerical experiment for all possible combinations of masses. These results are displayed in Figure 5, where the same basic phenomenology is observed. If both masses being analyzed are near the driven end, the mass closest to the drive is a better predictor of the one that is further away. However, again, as $i$ and $j$ become decoupled the reverse is true. Our interpretation is that the further the process is removed from the drive signal, the more it is dominated by the other mass dynamics and the boundary conditions. Because such a process is strongly influenced by the other DOFs, it can successfully predict the motion of those other DOFs.
It is also interesting to note how the strength, and even the directionality (sign), of the difference in TDTE changes with variations in a single stiffness element. Depending on the value of $k_3$ we see changes in which of the two masses is the better predictor. In some cases we even see zero TDTE difference, implying that the dynamics of the constituent signals are equally useful in predicting one another. Again, this does not support our intuitive notion of what it means for information to travel through a structural system. Only in the case of uni-directional coupling can we unambiguously use the TE to indicate the directionality of information transport.
One of the strengths of our analysis is that these conclusions are not influenced by estimation error. In studying heart and breath rate interactions, for example, the ambiguity in information flow was assigned to difficulties in the estimation process [2]. We have shown here that even when estimation error is not a factor the ambiguity remains. We would imagine a similar result would hold for more complex systems; however, such systems are beyond our ability to treat with analytical expressions. The difference in TDTE is, however, a useful indicator of which system component carries the most predictive power about the rest of the system dynamics.
In short, the TDTE can be a very useful descriptor of system dynamics and coupling among system components. However any real understanding is only likely to be obtained in the context of a particular system model, or class of models (e.g., linear). Absent physical insight into the process that generates the observations, understanding results of a TDTE analysis can be challenging at best.
However, it is perhaps worth mentioning that the expressions derived here might permit inference about the general form of the underlying “linearized” system model. Different linear system models yield different expressions for ρ i j ( τ ) , hence different expressions for the TDTE. One could then conceivably use estimates of the TDTE as a means to select among this class of models given observed data. Whether or not the TDTE is of use in the context of model selection remains to be seen.

5. Conclusions

In this work we have derived an analytical expression for the time-delayed transfer entropy (TDTE) among components of a broad class of second order linear systems driven by a jointly Gaussian input. This solution has proven particularly useful in understanding the behavior of the TDTE as a measure of dynamical coupling. In particular, when the coupling is uni-directional, we have found the TDTE to be an unambiguous indicator of the direction of information flow in a system. However, for bi-directional coupling the situation is significantly more complicated, even for linear systems. We have found that a heuristic understanding of information flow is not always accurate. For example, one might expect information to travel from the driven end of a system toward the non-driven end. In fact, we have shown precisely the opposite to be true. Simply varying a linear stiffness element can cause the apparent direction of flow to change. It would seem a safer interpretation is that a positive difference in the transfer entropy between two system components tells the practitioner which component has the greater predictive power.

Acknowledgements

The authors would like to thank the Naval Research Laboratory for providing funding for this work.

Conflict of Interest

The authors declare no conflict of interest.

References

  1. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461–464. [Google Scholar] [CrossRef] [PubMed]
  2. Kaiser, A.; Schreiber, T. Information transfer in continuous processes. Phys. D: Nonlinear Phenom. 2002, 166, 43–62. [Google Scholar] [CrossRef]
  3. Moniz, L.J.; Cooch, E.G.; Ellner, S.P.; Nichols, J.D.; Nichols, J.M. Application of information theory methods to food web reconstruction. Ecol. Model. 2007, 208, 145–158. [Google Scholar] [CrossRef]
  4. Bauer, M.; Cox, J.W.; Caveness, M.H.; Downs, J.J.; Thornhill, N.F. Finding the direction of disturbance propagation in a chemical process using transfer entropy. IEEE Trans. Control Syst. Technol. 2007, 15, 12–21. [Google Scholar] [CrossRef]
  5. Marschinski, R.; Kantz, H. Analysing the information flow between financial time series: An improved estimator for transfer entropy. Eur. Phys. J. B 2002, 30, 275–281. [Google Scholar] [CrossRef]
  6. Nichols, J.M. Examining structural dynamics using information flow. Probab. Eng. Mech. 2006, 21, 420–433. [Google Scholar] [CrossRef]
  7. Nichols, J.M.; Seaver, M.; Trickey, S.T.; Salvino, L.W.; Pecora, D.L. Detecting impact damage in experimental composite structures: An information-theoretic approach. Smart Mater. Struct. 2006, 15, 424–434. [Google Scholar] [CrossRef]
  8. Moniz, L.J.; Nichols, J.D.; Nichols, J.M. Mapping the information landscape: Discerning peaks and valleys for ecological monitoring. J. Biol. Phys. 2007, 33, 171–181. [Google Scholar] [CrossRef] [PubMed]
  9. Vicente, R.; Wibral, M.; Lindner, M.; Pipa, G. Transfer entropy-a model-free measure of effective connectivity for the neurosciences. J. Comput. Neurosci. 2011, 30, 45–67. [Google Scholar] [CrossRef] [PubMed]
  10. Wibral, M.; Rahm, B.; Rieder, M.; Lindner, M.; Vicente, R.; Kaiser, J. Transfer entropy in magnetoencephalographic data: Quantifying information flow in cortical and cerebellar networks. Progr. Biophys. Mol. Biol. 2011, 105, 80–97. [Google Scholar] [CrossRef] [PubMed]
  11. Vakorin, V.A.; Krakovska, O.A.; McIntosh, A.R. Confounding effects of indirect connections on causality estimation. J. Neurosci. Method. 2009, 184, 152–160. [Google Scholar] [CrossRef] [PubMed]
  12. Barnett, L.; Barrett, A.B.; Seth, A.K. Granger causality and transfer entropy are equivalent for gaussian variables. Phys. Rev. Lett. 2009, 103, 238701. [Google Scholar] [CrossRef] [PubMed]
  13. Palus, M.; Vejmelka, M. Directionality of coupling from bivariate time series: How to avoid false causalities and missed connections. Phys. Rev. E 2007, 75, 056211. [Google Scholar] [CrossRef] [PubMed]
  14. Friston, K.J.; Harrison, L.; Penny, W. Dynamic causal modelling. NeuroImage 2003, 19, 1273–1302. [Google Scholar] [CrossRef]
  15. Greenberg, M.D. Advanced Engineering Mathematics; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1988. [Google Scholar]
  16. Bendat, J.S.; Piersol, A.G. Random Data Analysis and Measurement Procedures, 3rd ed.; Wiley & Sons: New York, NY, USA, 2000. [Google Scholar]
  17. Kay, S.M. Fundamentals of Statistical Signal Processing: Volume I, Estimation Theory; Prentice Hall: New Jersey, NJ, USA, 1993. [Google Scholar]
  18. Meirovitch, L. Introduction to Dynamics and Control, 1st ed.; Wiley & Sons.: New York, NY, USA, 1985. [Google Scholar]
  19. Crandall, S.H.; Mark, W.D. Random Vibration in Mechanical Systems; Academic Press: New York, NY, USA, 1963. [Google Scholar]
  20. Benaroya, H. Mechanical Vibration: Analysis, Uncertainties, and Control; Prentice Hall: New Jersey, NJ, USA, 1998. [Google Scholar]
  21. Nichols, J.M.; Olson, C.C.; Michalowicz, J.V.; Bucholtz, F. The bispectrum and bicoherence for quadratically nonlinear systems subject to non-gaussian inputs. IEEE Trans. Signal Process. 2009, 57, 3879–3890. [Google Scholar] [CrossRef]
  22. Birkhoff, G.D. Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA 1931, 17, 656–660. [Google Scholar] [CrossRef] [PubMed]
  23. Hahs, D.W.; Pethel, S.D. Distinguishing Anticipation from Causality: Anticipatory Bias in the Estimation of Information Flow. Phys. Rev. Lett. 2011, 107, 128701. [Google Scholar] [CrossRef] [PubMed]
