Information Geometry Theoretic Measures for Characterizing Neural Information Processing from Simulated EEG Signals

Hua, Jia-Chen; Kim, Eun-jin; He, Fei

doi:10.3390/e26030213

Open AccessArticle

Information Geometry Theoretic Measures for Characterizing Neural Information Processing from Simulated EEG Signals

by

Jia-Chen Hua

^1,*

,

Eun-jin Kim

¹

and

Fei He

²

¹

Centre for Fluid and Complex Systems, Coventry University, Coventry CV1 2NL, UK

²

Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry CV1 2TL, UK

^*

Author to whom correspondence should be addressed.

Entropy 2024, 26(3), 213; https://doi.org/10.3390/e26030213

Submission received: 16 December 2023 / Revised: 11 January 2024 / Accepted: 18 February 2024 / Published: 28 February 2024

(This article belongs to the Special Issue Entropy Applications in Electroencephalography)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

In this work, we explore information geometry theoretic measures for characterizing neural information processing from EEG signals simulated by stochastic nonlinear coupled oscillator models for both healthy subjects and Alzheimer’s disease (AD) patients with both eyes-closed and eyes-open conditions. In particular, we employ information rates to quantify the time evolution of probability density functions of simulated EEG signals, and employ causal information rates to quantify one signal’s instantaneous influence on another signal’s information rate. These two measures help us find significant and interesting distinctions between healthy subjects and AD patients when they open or close their eyes. These distinctions may be further related to differences in neural information processing activities of the corresponding brain regions, and to differences in connectivities among these brain regions. Our results show that information rate and causal information rate are superior to their more traditional or established information-theoretic counterparts, i.e., differential entropy and transfer entropy, respectively. Since these novel, information geometry theoretic measures can be applied to experimental EEG signals in a model-free manner, and they are capable of quantifying non-stationary time-varying effects, nonlinearity, and non-Gaussian stochasticity presented in real-world EEG signals, we believe that they can form an important and powerful tool-set for both understanding neural information processing in the brain and the diagnosis of neurological disorders, such as Alzheimer’s disease as presented in this work.

Keywords:

information geometry; information length; information rate; causal information rate; causality; stochastic oscillators; electroencephalography; stochastic simulation; signal processing; dementia; Alzheimer’s disease; information theory; neural information processing; brain networks

1. Introduction

Identifying quantitative features from neurophysiological signals such as electroencephalography (EEG) is critical for understanding neural information processing in the brain and the diagnosis of neurological disorders such as dementia. Many such features have been proposed and employed to analyze neurological signals, which not only resulted in insightful understanding of the brain neurological dynamics of patients with certain neurological disorders versus healthy control (CTL) groups, but also helped build mathematical models that replicate the neurological signal with these quantitative features [1,2,3,4,5].

An important distinction, or non-stationary time-varying effects of the neurological dynamics, is the switching between eyes-open (EO) and eyes-closed (EC) states, where numerous research studies have been conducted on this distinction between EO and EC states to quantify important features of CTL subjects and patients using different techniques on EEG data such as traditional frequency-domain analysis [6,7], transfer entropy [8], energy landscape analysis [9], and nonlinear manifold learning for functional connectivity analysis [10], while also attempting to relate these features to specific clinical conditions and/or physiological variables, including skin conductance levels [11,12], cerebral blood flow [13], brain network connectivity [14,15,16], brain activities in different regions [17], and performance on the unipedal stance test (UPST) [18]. Clinical physiological studies found that there are distinct mental states related to the EO and EC states. Specifically, there is an “exteroceptive” mental activity state characterized by attention and ocular motor activity during EO, and an “interoceptive” mental activity state characterized by imagination and multisensory activity during EC [19,20]. Ref. [21] suggested that the topological organization of human brain networks dynamically switches corresponding to the information processing modes when the brain is visually connected to or disconnected from the external environment. However, patients with Alzheimer’s disease (AD) show loss of brain responsiveness to environmental stimuli [22,23], which might be due to impaired or loss of connectivities in the brain networks. This suggests that dynamical changes between EO and EC might represent an ideal paradigm to investigate the effect of AD pathophysiology and could be developed as biomarkers for diagnosis purposes. However, sensible quantification of robust features of these dynamical changes between EO and EC of both healthy and AD subjects, solely relying on EEG signals, is nontrivial. Despite the success of many statistical and quantitative measures being applied to neurological signal analysis, the main challenges stem from the non-stationary time-varying dynamics of the human brain with nonlinearity and non-Gaussian stochasticity, which makes most, if not all, of these traditional quantitative measures inadequate, and blindly applying these traditional measures to nonlinear and nonstationary time series/signals may produce spurious results, leading to incorrect interpretation.

In this work, by using simulated EEG signals of both CTL groups and AD patients under both EC and EO conditions and based on our previous works on information geometry [24,25,26], we develop novel and powerful quantitative measures in terms of information rate and causal information rate to quantify the important features of neurological dynamics of brains. We are able to find significant and interesting distinctions between CTL subjects and AD patients when they switch between the eyes-open and eyes-closed status. These quantified distinctions may be further related to differences in neural information processing activities of the corresponding brain regions, and to differences in connectivities among these brain regions, and therefore, they can be further developed as important biomarkers to diagnose neurological disorders, including but not limited to Alzheimer’s disease. It should be noted that these novel and powerful quantitative measures in terms of information rate and causal information rate can be applied to experimental EEG signals in a model-free manner, and they are capable of quantifying non-stationary time-varying effects, nonlinearity, and non-Gaussian stochasticity presented in real-world EEG signals, and hence, they are more robust and reliable than other information-theoretic measures applied to neurological signal analysis in the literature [27,28]. Therefore, we believe that these information geometry theoretic measures can form an important and powerful tool set for the neuroscience community.

The EEG signals have been modeled using many different methodologies in the literature. An EEG model in terms of nonlinear stochastic differential equation (SDE) could be sufficiently flexible in that it usually contains many parameters, whose values can be tuned to match the model’s output with actual EEG signals for different neurophysiological conditions, such as EC and EO, of CTL subjects or AD patients. Moreover, an SDE model of EEG can be solved by a number of numerical techniques to generate simulated EEG signals superior to actual EEG signals in terms of much higher temporal resolution and much larger number of sample paths available. These are the two main reasons why we choose to work with SDE models of EEG signals. Specifically, we employed a model of stochastic coupled Duffing–van der Pol oscillators proposed by Ref. [1], which is flexible enough to represent the EC and EO conditions for both CTL and AD subjects and straightforward enough to be simulated by using typical numerical techniques for solving SDE. Moreover, the model parameters reported in Ref. [1] were fine tuned against real-world experimental EEG signals of CTL and AD patients with both EC and EO conditions, and therefore, quantitative investigations on the model’s output of simulated signals are sufficiently representative for a large population of healthy and AD subjects.

2. Methods

2.1. Stochastic Nonlinear Oscillator Models of EEG Signals

A phenomenological model of the EEG based on a coupled system of Duffing–van der Pol oscillators subject to white noise excitation has been introduced [1] with the following form:

{\begin{matrix} {\ddot{x}}_{1} + (k_{1} + k_{2}) x_{1} - k_{2} x_{2} = - b_{1} x_{1}^{3} - b_{2} {(x_{1} - x_{2})}^{3} + ϵ_{1} {\dot{x}}_{1} (1 - x_{1}^{2}), \\ {\ddot{x}}_{2} - k_{2} x_{1} + k_{2} x_{2} = b_{2} {(x_{1} - x_{2})}^{3} + ϵ_{2} {\dot{x}}_{2} (1 - x_{2}^{2}) + μ d W, \end{matrix}

(1)

where

x_{i}, {\dot{x}}_{i}, {\ddot{x}}_{i}, i = 1, 2

are positions, velocities, and accelerations of the two oscillators, respectively. Parameters

k_{i}, b_{i}, ϵ_{i}, i = 1, 2

are the linear stiffness, cubic stiffness, and van der Pol damping coefficient of the two oscillators, respectively. Parameter

μ

represents the intensity of white noise and

d W

is a Wiener process representing the additive noise in the stochastic differential system. The physical meanings of these variables and parameters were nicely explained in a schematic figure in Ref. [1].

By using actual EEG signals, Ref. [1] utilized a combination of several different statistical and optimization techniques to fine tune the parameters in the model equations for eyes-closed (EC) and eyes-open (EO) conditions of both healthy control (CTL) subjects and Alzheimer’s disease (AD) patients, and these parameter values for different conditions are summarized in Table 1 and Table 2.

The model Equation (1) can be easily rewritten in a more standard form of stochastic differential equation (SDE) as follows:

{\begin{matrix} {\dot{x}}_{1} = x_{3}, \\ {\dot{x}}_{2} = x_{4}, \\ {\dot{x}}_{3} = - (k_{1} + k_{2}) x_{1} + k_{2} x_{2} - b_{1} x_{1}^{3} - b_{2} {(x_{1} - x_{2})}^{3} + ϵ_{1} x_{3} (1 - x_{1}^{2}), \\ {\dot{x}}_{4} = k_{2} x_{1} - k_{2} x_{2} + b_{2} {(x_{1} - x_{2})}^{3} + ϵ_{2} x_{4} (1 - x_{2}^{2}) + μ d W, \end{matrix}

(2)

which is more readily suitable for stochastic simulations.

2.2. Initial Conditions (ICs) and Specifications of Stochastic Simulations

For simplicity, we employ the Euler–Maruyama scheme [29] to simulate

2 \times 10^{7}

trajectories in total of the model Equation (2); although, other more sophisticated methods for stochastic simulations exist. We simulate such a large number of trajectories, because calculations of information geometry theoretic measures rely on accurate estimation of probability density functions (PDFs) of the model’s variables

x_{i} (t)

, which requires a large number of data samples of

x_{i} (t)

at any given time t.

Since nonlinear oscillators’ solution is very sensitive to initial conditions, we start the simulation with a certain initial probability distribution (e.g., a Gaussian distribution) for all

x_{1} (0), x_{2} (0), x_{3} (0), x_{4} (0)

, which means that the 20 million

x_{i} (0)

(\forall i = 1, 2, 3, 4)

are randomly drawn from a probability density function (PDF) of the initial distribution. The time-step size

d t

is set to

10^{- 6}

to compensate for the very-high values of stiffness parameters

k_{1}

and

k_{2}

in Table 1 and Table 2. The total number of simulation time steps is

1 \times 10^{7}

, making the total time range of simulation

[0, 10]

. The

Δ t = 10^{- 4}

is the time interval when the probability density functions (PDFs)

p (x_{1}, t)

and

p (x_{2}, t)

are estimated for calculating information geometry theoretic measures such as information rates and causal information rates, as explained in Section 2.3.

For nonlinear oscillators, different initial conditions can result in dramatically different long-term time evolution. So in order to explore more diverse initial conditions, we simulated the SDE with 6 different initial Gaussian distributions with different means and standard deviations, i.e.,

x_{1} (0) \sim N (μ_{x_{1} (0)}, σ^{2})

,

x_{2} (0) \sim N (μ_{x_{2} (0)}, σ^{2})

,

x_{3} (0) \sim N (μ_{x_{3} (0)}, σ^{2})

,

x_{4} (0) \sim N (μ_{x_{4} (0)}, σ^{2})

, where the parameters are summarized alongside other specifications in Table 3.

For brevity, in this paper, we use the word “initial conditions” or its abbreviation “IC” to refer to the (set of 4) initial Gaussian distributions from which the 20 million

x_{i} (0)

(\forall i = 1, 2, 3, 4)

are randomly drawn. For example, the “IC No.6” in Table 3 (and simply “IC6” elsewhere in this paper) refers to the 6th (set of 4) Gaussian distributions with which we start the simulation, and the specifications of this stimulation are listed in the last column of Table 3.

2.3. Information Geometry Theoretic Measures: Information Rate and Causal Information Rate

When a stochastic differential equation (SDE) model exhibits non-stationary time-varying effects, nonlinearity, and/or non-Gaussian stochasticity, while we are interested in large fluctuations and extreme events in the solutions, simple statistics such as mean and variance might not suffice to compare the solutions of different SDE models (or same model with different parameters). In such cases, quantifying and comparing the time evolution of probability density functions (PDFs) of solutions will provide us with more information [30]. The time evolution of PDFs can be studied and compared through the framework of information geometry [31], wherein PDFs are considered as points on a Riemannian manifold (which is called the statistical manifold), and their time evolution can be considered as a motion on this manifold. Several different metrics can be defined on a probability space to equip it with a manifold structure, including a metric related to the Fisher Information [32], known as the Fisher Information metric [33,34], which we use in this work:

g_{μ ν} (θ) \overset{def}{=} \int_{X} \frac{\partial log p (x; {θ})}{\partial θ_{μ}} \frac{\partial log p (x; {θ})}{\partial θ_{ν}} p (x; {θ}) d x .

(3)

Here,

p (x; {θ})

denotes a continuous family of PDFs parameterized by parameters

{θ}

. If a time-dependent PDF

p (x, t)

is considered as a continuous family of PDFs parameterized by a single parameter time t, the metric tensor is then reduced to a scalar metric:

g (t) = \int d x \frac{1}{p (x, t)} {[\frac{\partial p (x, t)}{\partial t}]}^{2} .

(4)

The infinitesimal distance

d L

on the manifold is then given by

d L^{2} = g (t) d t^{2}

, where

L

is called the Information Length and defined as follows:

L (t) \overset{def}{=} \int_{0}^{t} d t_{1} \sqrt{\int d x \frac{1}{p (x, t_{1})} {[\frac{\partial p (x, t_{1})}{\partial t_{1}}]}^{2}} .

(5)

The Information Length

L

represents the dimensionless distance, which measures the total distance traveled on the statistical manifold. The time derivative of

L

then represents the speed of motion on this manifold:

Γ (t) \overset{def}{=} lim_{d t \to 0} \frac{d L (t)}{d t} = \sqrt{\int d x \frac{1}{p (x, t)} {[\frac{\partial p (x, t)}{\partial t}]}^{2}},

(6)

which is referred to as the Information Rate. If multiple variables are involved, such as

x_{i} (t)

where

i = 1, 2, 3, 4

as in the stochastic nonlinear oscillator model Equation (2), we will use subscript in

Γ (t)

, e.g.,

Γ_{x_{2}} (t)

to denote the information rate of signal

x_{2} (t)

.

The notion of Causal Information Rate was introduced in Ref. [25] to quantify how one signal instantaneously influences another signal’s information rate. As an example, the causal information rate of signal

x_{1} (t)

influencing signal

x_{2} (t)

’s information rate is denoted and defined by

Γ_{x_{1} \to x_{2}} (t) \overset{def}{=} Γ_{x_{2}}^{*} (t) - Γ_{x_{2}} (t)

, where

Γ_{x_{2}} {(t)}^{2} = \int d x_{2} p (x_{2}, t) {[\partial_{t} ln p (x_{2}, t)]}^{2},

(7)

and

\begin{matrix} Γ_{x_{2}}^{*} {(t)}^{2} & \overset{def}{=} lim_{t_{*} \to t^{+}} \int d x_{1} d x_{2} p (x_{2}, t_{*}; x_{1}, t) {[\partial_{t_{*}} ln p (x_{2}, t_{*} | x_{1}, t)]}^{2} \\ = lim_{t_{*} \to t^{+}} \int d x_{1} d x_{2} p (x_{2}, t_{*}; x_{1}, t) {[\partial_{t_{*}} ln p (x_{2}, t_{*}; x_{1}, t)]}^{2}, \end{matrix}

(8)

where the relation between conditional, joint, and marginal PDFs

p (x_{2}, t_{*} | x_{1}, t) = \frac{p (x_{2}, t_{*}; x_{1}, t)}{p (x_{1}, t)}

and the fact

\partial_{t_{*}} p (x_{1}, t) = 0

for

t_{*} \neq t

are used in the 2nd equal sign above.

Γ_{x_{2}}^{*}

denotes the (auto) contribution to the information rate from

x_{2}

itself, while

x_{1}

is given/known and frozen in time. In other words,

Γ_{x_{2}}^{*}

represents the information rate of

x_{2}

when the additional information of

x_{1}

(at the same time with

x_{2}

) becomes available or known. Subtracting

Γ_{x_{2}}

from

Γ_{x_{2}}^{*}

following the definition of

Γ_{x_{1} \to x_{2}}

then gives us the contribution of (knowing the additional information of)

x_{1}

to

Γ_{x_{2}}

, signifying how

x_{1}

instantaneously influences the information rate of

x_{2}

. One can easily verify that if signals

x_{1} (t)

and

x_{2} (t)

are statistically independent such that the equal-time joint PDF can be separated as

p (x_{1}, t; x_{2}, t) = p (x_{1}, t) \cdot p (x_{2}, t)

, then

Γ_{x_{2}}^{*} (t)

will reduce to

Γ_{x_{2}} (t)

, making the causal information rate

Γ_{x_{1} \to x_{2}} = 0

, which is consistent with the assumption that

x_{1} (t)

and

x_{2} (t)

are statistically independent at the same time t.

For numerical estimation purposes, one can derive simplified equations

Γ_{x_{2}} {(t)}^{2}

=

4 \int d x_{2} {[\partial_{t} \sqrt{p (x_{2}, t)}]}^{2}

and

Γ_{x_{2}}^{*} {(t)}^{2} = 4 lim_{t_{*} \to t^{+}} \int d x_{1} d x_{2} {[\partial_{t_{*}} \sqrt{p (x_{2}, t_{*}; x_{1}, t)}]}^{2}

to ease the numerical calculations and avoid numerical errors in PDFs (due to finite sample-size estimations using a histogram-based approach) being doubled or enlarged when approximating the integrals in the original Equations (7) and (8) by finite summation. On the other hand, the time derivatives of the square root of PDFs are approximated by using temporally adjacent PDFs with each pair of two adjacent PDFs being separated by

Δ t = 10^{- 4}

in time, as mentioned at the end of Section 2.2.

2.4. Shannon Differential Entropy and Transfer Entropy

As a comparison with more traditional and established information-theoretic measures, we also calculate differential entropy and transfer entropy using the numerically estimated PDFs and compare them with information rate and causal information rate, respectively.

The Shannon differential entropy of a signal

x (t)

is defined to extend the idea of Shannon discrete entropy as

h (x (t)) = E [- ln p (x, t)] = - \int p (x, t) ln p (x, t) d x = - \int P (d x (t)) ln \frac{P (d x (t))}{μ (d x)},

(9)

where

μ (d x) = d x

is the Lebesgue measure, and

P (d x (t)) = p (x, t) μ (d x) = p (x, t) d x

is the probability measure. In other words, differential entropy is the negative relative entropy (Kullback-Leibler divergence) from the Lebesgue measure (considered as an unnormalized probability measure) to a probability measure P (with density p). In contrast, information rate

Γ_{x} (t)

=

\sqrt{\int d x p (x, t) {[\partial_{t} ln p (x, t)]}^{2}} = \sqrt{lim_{d t \to 0} \frac{2}{d t^{2}} \int d x p (x, t + d t) ln \frac{p (x, t + d t)}{p (x, t)}}

(see Refs. [24,25,26] for detailed derivations) is related to the rate of change in relative entropy of two infinitesimally close PDFs

p (x, t)

and

p (x, t + d t)

. Therefore, although differential entropy can measure the complexity of a signal

x (t)

at time t, it neglects how the signal’s PDF

p (x, t)

changes instantaneously at that time, which is crucial to quantify how new information can be reflected from the instantaneous entropy production rate of the signal

x (t)

. This is the theoretical reason why the information rate is a much better and more appropriate measure than differential entropy for characterizing the neural information processing from EEG signals of the brain, and the practical reason for this will be illustrated in terms of numerical results and discussed at the end of Section 3.3.2 and Section 3.3.3.

The transfer entropy (TE) measures the directional flow or transfer of information between two (discrete-time) stochastic processes. The transfer entropy from a signal

x_{1} (t)

to another signal

x_{2} (t)

is the amount of uncertainty reduced in future values of

x_{2} (t)

by knowing the past values of

x_{1} (t)

given past values of

x_{2} (t)

. Specifically, if the amount of information is measured using Shannon’s (discrete) entropy

H (X_{t})

=

- \sum_{x} p (x, t) {log}_{2} p (x, t)

of a stochastic process

X_{t}

and conditional entropy

H (Y_{t_{2}} | X_{t_{1}}) = - \sum_{x, y} p (x, t_{1}; y, t_{2}) {log}_{2} p (y, t_{2} | x, t_{1})

, the transfer entropy from a process

X_{t}

to another process

Y_{t}

(for discrete-time

t \in Z

) can be written as follows:

\begin{matrix} (10) & {TE}_{X_{t} \to Y_{t}} (t) = H (Y_{t + 1} ∣ Y_{t : t - (k - 1)}) - H (Y_{t + 1} ∣ Y_{t : t - (k - 1)}, X_{t : t - (l - 1)}), \\ = - \sum_{y} p (y_{t + 1}, y_{t : t - (k - 1)}) {log}_{2} p (y_{t + 1} | y_{t : t - (k - 1)}) \\ + \sum_{x, y} p (y_{t + 1}, y_{t : t - (k - 1)}, x_{t : t - (l - 1)}) {log}_{2} p (y_{t + 1} | y_{t : t - (k - 1)}, x_{t : t - (l - 1)}), \\ = \sum_{x, y} p (y_{t + 1}, y_{t : t - (k - 1)}, x_{t : t - (l - 1)}) {log}_{2} \frac{p (y_{t + 1} | y_{t : t - (k - 1)}, x_{t : t - (l - 1)})}{p (y_{t + 1} | y_{t : t - (k - 1)})}, \\ (11) & = \sum_{x, y} p (y_{t + 1}, y_{t : t - (k - 1)}, x_{t : t - (l - 1)}) {log}_{2} \frac{p (y_{t + 1}, y_{t : t - (k - 1)}, x_{t : t - (l - 1)}) p (y_{t : t - (k - 1)})}{p (y_{t : t - (k - 1)}, x_{t : t - (l - 1)}) p (y_{t + 1}, y_{t : t - (k - 1)})}, \end{matrix}

which quantifies the amount of reduced uncertainty in future value

Y_{t + 1}

by knowing the past l values of

X_{t}

given past k values of

Y_{t}

, where

Y_{t : t - (k - 1)}

and

X_{t : t - (l - 1)}

are shorthands of past k values

Y_{t}, Y_{t - 1}, \dots, Y_{t - (k - 1)}

and past l values

X_{t}, X_{t - 1}, \dots, X_{t - (l - 1)}

, respectively.

In order to properly compare with causal information rate signifying how one signal instantaneously influences another signal’s information rate (at the same/equal-time t), we set

k = l = 1

in calculating the transfer entropy between two signals. Also, since the causal information rate involves partial time derivatives, which have to be numerically estimated using temporally adjacent PDFs separated by

Δ t = 10^{- 4}

in time (as mentioned at the end of Section 2.2), the discrete-time

t \in Z

in transfer entropy should be changed to

n Δ t

with

n \in Z

. Therefore, the transfer entropy appropriate for comparing with the causal information rate should be rewritten as follows:

\begin{matrix} (12) & {TE}_{x_{1} \to x_{2}} (t) = H (x_{2} (t + Δ t) ∣ x_{2} (t)) - H (x_{2} (t + Δ t) ∣ x_{2} (t), x_{1} (t)), \\ = \sum_{x_{1}, x_{2}} p (x_{2}, t + Δ t; x_{2}, t; x_{1}, t) {log}_{2} \frac{p (x_{2}, t + Δ t | x_{2}, t; x_{1}, t)}{p (x_{2}, t + Δ t | x_{2}, t)}, \\ (13) & = \sum_{x_{1}, x_{2}} p (x_{2}, t + Δ t; x_{2}, t; x_{1}, t) {log}_{2} \frac{p (x_{2}, t + Δ t; x_{2}, t; x_{1}, t) p (x_{2}, t)}{p (x_{2}, t; x_{1}, t) p (x_{2}, t + Δ t; x_{2}, t)} . \end{matrix}

Numerical estimations of the information rate, causal information rate, differential entropy, and transfer entropy are all based on numerical estimation of PDFs using histograms. In particular, in order to sensibly and consistently estimate the causal information rate (e.g., to avoid getting negative values), special caution is required when choosing the binning for histogram estimation of PDFs in calculating

Γ_{x_{2}} {(t)}^{2} = 4 \int d x_{2} {[\partial_{t} \sqrt{p (x_{2}, t)}]}^{2}

and

Γ_{x_{2}}^{*} {(t)}^{2} = 4 lim_{t_{*} \to t^{+}} \int d x_{1} d x_{2} {[\partial_{t_{*}} \sqrt{p (x_{2}, t_{*}; x_{1}, t)}]}^{2}

. The finer details for these numerical estimation techniques are elaborated in Appendix A.

3. Results

We performed simulations with six different Gaussian initial distributions (with different means and standard deviations summarized in Table 3). Initial Conditions No.1 (IC No.1, or simply IC1) through No.3 (IC3) are Gaussian distributions with a narrow width or smaller standard deviation, whereas IC4 through IC6 have a larger width/standard deviation, and therefore, the simulation results of IC4 through IC6 exhibit more diverse time evolution behaviors (e.g., more complex attractors, as explained next), and hence, the corresponding calculation results are more robust or insensitive to the specific mean values

μ_{x_{i} (0)}

’s of the initial Gaussian distributions (see Table 3 for more details). Therefore, in the main text here, we focus on these results from initial Gaussian distributions with wider width/larger standard deviation, and we list complete results from all six initial Gaussian distributions in the Appendix B. Specifically, we found that the results from IC4 through IC6 are qualitatively the same or very similar, and therefore, in the main text here, we illustrate and discuss the results from Initial Conditions No.4 (IC4), which is sufficiently representative for IC5 and IC6, and refer to other IC’s (by referencing the relevant sections in Appendix B or explicitly illustrating the results) if needed.

3.1. Sample Trajectories of $X_{1} (T)$ and $X_{2} (T)$

To give a basic idea of how the simulated trajectories evolve in time, we start by illustrating 50 sample trajectories of

x_{1} (t)

and

x_{2} (t)

from the total

2 \times 10^{7}

simulated trajectories for both CTL subjects and AD patients with both EC and EO conditions, which are visualized in Figure 1 and Figure 2. Notice that from Figure 1c, one can see that it takes some time for the trajectories of

x_{2} (t)

to settle down on some complex attractors for EC, which suggests a longer memory associated with EC of CTL. This is more evident as shown in the time evolution of PDF

p (x_{2}, t)

in Figure 3c below.

3.2. Time Evolution of PDF $P (X_{1}, T)$ and $P (X_{2}, T)$

The empirical PDFs

p (x_{1}, t)

and

p (x_{2}, t)

can better illustrate the overall time evolution of a large number of trajectories, and they serve as a basis for calculations of information geometry theoretic measures such as information rates and causal information rates. These empirical PDFs are estimated using a histogram-based approach with Rice’s rule [35,36], where the number of bins is

n_{bins} = 2 \sqrt[3]{n_{samples}}

, and since we simulated

2 \times 10^{7}

sample trajectories in total, the

n_{bins}

is rounded to 542. The centers of bins are plotted on the y-axis in sub-figures of Figure 3 and Figure 4, where the function values of

p (x_{1}, t)

and

p (x_{2}, t)

are color-coded following the color bars.

As mentioned in the previous section, from Figure 3c, one can see more clearly that after around

t \geq 5

, the trajectories settle down on some complex attractors, and the time evolution of

p (x_{2}, t)

undergoes only minor changes. Meanwhile, from Figure 3a, one can observe that a similar settling down of

x_{1} (t)

on some complex attractors happens after around

t \geq 7.5

. Therefore, we select only PDFs with

t \geq 7.5

for statistical analysis of information rates and causal information rates to investigate the stationary properties.

From Figure 3 and Figure 4, one can already observe some qualitative differences between healthy control (CTL) subjects and AD patients. For example, the time evolution patterns of

p (x_{1}, t)

and

p (x_{2}, t)

are significantly different when CTL subjects open their eyes from eyes-closed (EC) state, whereas for AD patients, these differences are relatively minor. One of the best ways to provide quantitative descriptions of these differences (instead of being limited to qualitative descriptions) is using information geometry theoretic measures such as information rates and causal information rates, whose results are listed in Section 3.3 and Section 3.4, respectively.

As can be seen from Appendix B.2, IC1 through IC3 exhibit much simpler attractors than IC4 through IC6. Since the width/standard deviation of the initial Gaussian distributions of IC1 through IC3 is much smaller, they are more sensitive to the specific mean values

μ_{x_{i} (0)}

’s of the initial Gaussian distribution, and one can see that IC3’s time evolution behaviors of

p (x_{i}, t)

are somewhat qualitatively different from IC1 and IC2, whereas

p (x_{i}, t)

’s time evolution behaviors of IC4 through IC6 are all qualitatively the same.

3.3. Information Rates $Γ_{x_{1}} (t)$ and $Γ_{x_{2}} (t)$

Intuitively speaking, the information rate is the (instantaneous) speed of PDF’s motion on the statistical manifold, as each given PDF corresponds to a point on that manifold, and when the time changes, a time-dependent PDF will typically move on a curve on the statistical manifold, whereas a stationary or equilibrium state PDF will remain at the same point on the manifold. Therefore, the information rate is a natural tool to investigate the time evolution of PDF.

Moreover, since the information rate is quantifying instantaneous rate of change in the infinitesimal relative entropy between two adjacent PDFs, it is hypothetically a reflection of neural information processing in the brain, and hence, it may provide important insight into the neural activities in different regions of the brain, as long as the regional EEG signals can be sufficiently collected for calculating the information rates.

3.3.1. Time Evolution

The time evolution of information rates

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

are shown in Figure 5a,b for CTL subjects and AD patients, respectively. Since

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

quantify the (infinitesimal) relative entropy production rate instantaneously at time t, they represent the information-theoretic complexities of signals

x_{1} (t)

and

x_{2} (t)

of the coupled oscillators, respectively, and are hypothetical reflections of neural information processing in the corresponding regions in the brain.

For example, in Figure 5a, there is a clear distinction between eyes-closed (EC) and eyes-open (EO) for CTL subjects: both

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

decrease significantly when healthy subjects open their eyes, which may be interpreted as the neural information processing activities of the corresponding brain regions being “suppressed” by the incoming visual information when eyes are opened from being closed.

Interestingly, when AD patients open their eyes, both

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

are increasing instead of decreasing, as shown in Figure 5b. This might be interpreted as that the incoming visual information received when eyes are opened is in fact “stimulating” the neural information processing activities of the corresponding brain regions, which might be impaired or damaged by the relevant mechanism of Alzheimer’s disease (AD).

In Figure 5a,b, we annotate the mean and standard deviation for

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

after

t \geq 7.5

in the legend, because as mentioned above, the PDFs of this time range reflect longer-term temporal characteristics, and hence, the corresponding

Γ_{x_{1}} (t \geq 7.5)

and

Γ_{x_{2}} (t \geq 7.5)

should reflect more reliable and robust features of neural information processing activities of the corresponding brain regions. Therefore, meaningful statistics will require collecting samples of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

in this time range, for which the results are shown in the section below.

3.3.2. Empirical Probability Distribution (for $T \geq 7.5$ )

The statistics of

Γ_{x_{1}} (t \geq 7.5)

and

Γ_{x_{2}} (t \geq 7.5)

can be further and better visualized using empirical probability distributions of them, as shown in Figure 6. Again, we use histogram-based density estimation with Rice’s rule, and since the time interval

Δ t

for estimating PDFs and computing

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

is

10^{- 4}

(whereas the time-step size

d t

for simulating the SDE model is

10^{- 6}

), we collected 24,999 samples of both

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

for

7.5 \leq t < 10

, and hence, the number of bins following Rice’s rule is rounded to 58. Figure 6 confirms the observation in the previous section, while it also better visualizes the sample standard deviation in the shapes of the estimated PDFs, indicating that the PDFs of both

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

are narrowed down when healthy subjects open their eyes but are widened when AD patients do so.

As a comparison, we also calculate more traditional/established information-theoretic measure, namely, the Shannon differential entropy

h (x_{1} (t))

and

h (x_{2} (t))

, and estimate their empirical probability distributions in the same manner as we do for information rates, as shown in Figure 7.

One can see that the empirical distributions of differential entropy

h (x_{1} (t))

and

h (x_{2} (t))

are not able to make clear distinction between EC and EO conditions, especially for AD patients. This may be better summarized in Table 4, comparing the mean and standard deviation values of information rate vs. differential entropy for the four cases. Therefore, the information rate is a superior measure for quantifying the non-stationary time-varying dynamical changes in EEG signals when switching between EC and EO states and is a better and more reliable reflection of neural information processing in the brain.

3.3.3. Phase Portraits (for $T \geq 7.5$ )

In addition to empirical statistics of information rates for

t \geq 7.5

in terms of estimated probability distributions, one can also visualize the temporal dynamical features of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

combined using phase portraits, as shown in Figure 8. Notice that when healthy subjects open their eyes, the fluctuation ranges of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

shrink by roughly 5-fold, whereas when AD patients open their eyes, the fluctuation ranges are enlarged.

Moreover, when plotting EC and EO of healthy subjects separately in Figure 9a to zoom into the ranges of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

for EO, one can also see that the phase portrait of EO exhibits a fractal-like pattern, whereas the phase portrait of EC exhibits more regular dynamical features, including an overall trend of fluctuating between bottom left and top right, indicating that the

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

are somewhat synchronized, which could be explained by the strong coupling coefficients in Table 1 of healthy subjects. Contrarily, for AD patients, the phase portraits of EC and EO both exhibit fractal-like patterns in Figure 9b.

Same as at the end of Section 3.3.2, as a comparison, we also visualize the phase portraits of Shannon differential entropy

h (x_{1} (t))

and

h (x_{2} (t))

in Figure A52d and Figure A56 in Appendix B.4.3, where one can see that it is hard to distinguish the phase portraits of

h (x_{1} (t))

vs.

h (x_{2} (t))

of AD EC from those of AD EO, as they are qualitatively the same or very similar. Contrarily, in Figure 8, the fluctuation ranges of phase portraits of

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

are significantly enlarged when AD patients open their eyes. Therefore, this reconfirms our claim at the end of Section 3.3.2 that the information rate is a superior measure than differential entropy in quantifying the dynamical changes in EEG signals when switching between EO and EO states and is a better and more reliable reflection of neural information processing in the brain.

3.3.4. Power Spectra (for $T \geq 7.5$ )

Another perspective to visualize the dynamical characteristics of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

is by using power spectra, i.e., the absolute values of (fast) Fourier transforms of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

, as shown in Figure 10. Frequency-based analyses will not make much sense if the signals or time series of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

have non-stationary time-varying effects, and this is why we only consider time range

t \geq 7.5

for

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

, when the time evolution patterns of

p (x_{1}, t)

and

p (x_{2}, t)

almost stop changing as shown in Figure 3 (and especially in Figure 3a,c).

The power spectra of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

also exhibit a clear distinction between EC and EO for CTL and AD subjects. Specifically, the power spectra of

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

can be fit by power law for frequencies between ∼100 Hz to ∼1000 Hz (the typical sampling frequency of experimental EEG signals is 1000 Hz, whereas most of brain wave’s/neural oscillations’ frequencies are below 100 Hz). From Figure 11a, one can see that power law fit exponents (quantifying how fast the power density decreases with increasing frequency) of

Γ_{x_{1}} (t)

’s and

Γ_{x_{2}} (t)

’s power spectra are largely reduced when healthy subjects open their eyes, which indicates that the strength of noise in

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

decreases significantly when switching from EC to EO. Contrarily, for AD patients as shown in Figure 11b, the power law fit exponents of

Γ_{x_{1}} (t)

’s and

Γ_{x_{2}} (t)

’s power spectra increase significantly and slightly, respectively, indicating that the strength of noise in

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

increases when switching from EC to EO.

3.4. Causal Information Rates $Γ_{x_{2} \to X_{1}} (T), Γ_{x_{1} \to X_{2}} (T)$ , and Net Causal Information Rates $Γ_{x_{2} \to X_{1}} (T) - Γ_{x_{1} \to X_{2}} (T)$

The notion of causal information rate was introduced in Ref. [25], which quantifies how one signal instantaneously influences another signal’s information rate. A comparable measure of causality is transfer entropy; however, as shown in Appendix B.6, our calculation results of transfer entropy are too spiky/noisy to reliably quantify causality, and hence, the results are only included in Appendix as a comparison, which we will discuss at the end of this section. Nevertheless, similar to net transfer entropy, one can calculate the net causal information rate, e.g.,

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

, signifying the net causality measure from signal

x_{2} (t)

to

x_{1} (t)

. Since

{\dot{x}}_{2} (t)

is the only variable that is directly affected by random noise in the stochastic oscillator model Equation (2), we calculate

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

for the net causal information rate of the coupled oscillator’s signal

x_{2} (t)

influencing

x_{1} (t)

.

Notice that for stochastic coupled oscillator model Equation (2), causal information rates

Γ_{x_{2} \to x_{1}} (t)

and

Γ_{x_{1} \to x_{2}} (t)

will reflect how strongly the two oscillators are directionally coupled or causally related. Since signals

x_{1} (t)

and

x_{2} (t)

are the results of neural activities in the corresponding brain regions, the causal information rates can be used to measure connectivities among different regions of the brain.

3.4.1. Time Evolution

Similar to Section 3.3.1, we also visualize the time evolution of causal information rates

Γ_{x_{2} \to x_{1}} (t)

and

Γ_{x_{1} \to x_{2}} (t)

in Figure 12a,b for CTL subjects and AD patients, respectively.

For both CTL and AD subjects,

Γ_{x_{2} \to x_{1}} (t)

and

Γ_{x_{1} \to x_{2}} (t)

both decrease when changing from EC to EO, except for AD subjects’

Γ_{x_{2} \to x_{1}} (t)

increasing on average. On the other hand, the net causal information rate

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

changes differently: when healthy subjects open their eyes, it increases and changes from significantly negative on average to slightly positive on average, whereas for AD patients, it increases from almost zero on average to significantly positive on average without net directional change. A possible interpretation might be that, when healthy subjects open their eyes, the brain region generating the signal

x_{2} (t)

becomes more sensitive to the noise, causing it to influence

x_{1} (t)

more compared to the eyes-closed state.

3.4.2. Empirical Probability Distribution (for $T \geq 7.5$ )

Similar to Section 3.3.2, we also estimate the empirical probability distributions of

Γ_{x_{2} \to x_{1}} (t)

,

Γ_{x_{1} \to x_{2}} (t)

and

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

to better visualize their statistics in Figure 13a. In particular, we plot the empirical probability distributions of

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

for both healthy and AD subjects with both EC and EO conditions together in Figure 13b, in order to better visualize and compare net causal information rates’ changes when CTL and AD subjects open their eyes. It can be seen that the estimated PDF of

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

shrinks its width in shape when healthy subjects open their eyes. Combining with the observation that the magnitude of sample mean of

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

is close to 0 for healthy subjects with eyes opened, a possible interpretation might be that the directional connectivity between brain regions generating signals

x_{1} (t)

and

x_{2} (t)

is reduced to almost zero, either by incoming visual information received by opened eyes or due to the brain region generating signal

x_{2} (t)

becoming more sensitive to noise when eyes are opened. Contrarily, the estimated PDF of

Γ_{x_{2} \to x_{1}} (t) - Γ_{x_{1} \to x_{2}} (t)

for AD patients qualitatively change in an inverse direction to become widened in shape.

As mentioned earlier, as a comparison, we also calculate more traditional/established information-theoretic measure of causality, i.e., transfer entropy (TE), and estimate their empirical probability distributions in the same manner as we do for causal information rates, as shown in Figure 14.

One can see that the empirical distributions of transfer entropy

{TE}_{x_{2} \to x_{1}} (t)

and

{TE}_{x_{1} \to x_{2}} (t)

, as well as net transfer entropy

{TE}_{x_{2} \to x_{1}} (t) - {TE}_{x_{1} \to x_{2}} (t)

are not able to make clear distinction between EC and EO conditions, especially for AD patients in terms of net transfer entropy. This may be better summarized in Table 5, comparing the mean and standard deviation values of causal information rate vs. transfer entropy for the four cases.

Moreover, the magnitude of numeric values of transfer entropy and net transfer entropy is ∼

10^{- 2}

or ∼

10^{- 3}

, which is too close to zero, making it too noise-like or unreliable to quantify causality. Therefore, the causal information rate is a much superior measure than transfer entropy in quantifying causality, and since causal information rate quantifies how one signal instantaneously influences another signal’s information rate (which is a reflection of neural information processing in corresponding brain region), it can be used to measure directional or causal connectivities among different brain regions.

4. Discussion

A major challenge for practical usage of information geometry theoretic measures on real-world experimental EEG signals is that they require a significant amount of data samples to estimate the probability density functions. For example, in this work, we simulated

2 \times 10^{7}

trajectories or sample paths of the stochastic nonlinear coupled oscillator models, such that at any time instance, we always have a sufficient amount of data samples to accurately estimate the time-dependent probability density functions with a histogram-based approach. This is usually not possible for experimental EEG signals which often contain only one trajectory for each channel, and one has to use a sliding window-based approach to collect data samples for histogram-based density estimation. This approach implicitly assumes that the EEG signals are stationary within each sliding time window, and hence, one has to balance between the sliding time window’s length and number of available data samples, in order to account for non-stationarity while still having enough data samples to accurately and meaningfully estimate the time-dependent probability densities. And therefore, this approach will not work very well if the EEG signals exhibit severely non-stationary time-varying effects, requiring a very short length of sliding windows, which will contain too few data samples.

An alternative approach to overcome this issue is using kernel density estimation to estimate the probability density functions, which usually requires a much smaller number of data samples while still being able to approximate the true probability distribution with acceptable accuracy. However, this approach typically involves a very high computational cost, limiting its practical use for many cases such as computational resource-limited scenarios. A proposed method to avoid this is using the Koopman operator theoretic framework [37,38] and its numerical techniques applicable to experimental data in a model-free manner, since the Koopman operator is the left-adjoint of the Perron–Frobenious operator evolving the probability density functions in time. This exploration will be left for our future investigation.

5. Conclusions

In this work, we explore information geometry theoretic measures to characterize neural information processing from EEG signals simulated by stochastic nonlinear coupled oscillator models. In particular, we utilize information rates to quantify the time evolution of probability density functions of simulated EEG signals and utilize causal information rates to quantify one signal’s instantaneous influence on another signal’s information rate. The parameters of the stochastic nonlinear coupled oscillator models of EEG were fine tuned for both healthy subjects and AD patients, with both eyes-closed and eyes-open conditions. By using information rates and causal information rates, we find significant and interesting distinctions between healthy subjects and AD patients when they change their eyes’ open/closed status. These distinctions may be further related to differences in neural information processing activities of the corresponding brain regions (for information rates) and to differences in connectivities among these brain regions (for causal information rates).

Compared to more traditional or established information-theoretic measures such as differential entropy and transfer entropy, our results show that information geometry theoretic measures such as information rate and causal information rate are superior to their more traditional counterparts, respectively (information rate vs. differential entropy, and causal information rate vs. transfer entropy). Since information rates and causal information rates can be applied to experimental EEG signals in a model-free manner, and they are capable of quantifying non-stationary time-varying effects, nonlinearity, and non-Gaussian stochasticity presented in real-world EEG signals, we believe that these information geometry theoretic measures can become an important and powerful tool-set for both understanding neural information processing in the brain and diagnosis of neurological disorders such as Alzheimer’s disease in this work.

Author Contributions

Conceptualization, J.-C.H., E.-j.K. and F.H.; Methodology, J.-C.H. and E.-j.K.; Software, J.-C.H.; Validation, J.-C.H.; Formal analysis, J.-C.H.; Investigation, J.-C.H., E.-j.K. and F.H.; Resources, E.-j.K. and F.H.; Writing—original draft, J.-C.H.; Writing—review & editing, J.-C.H., E.-j.K. and F.H.; Visualization, J.-C.H.; Supervision, E.-j.K. and F.H.; Project administration, J.-C.H., E.-j.K. and F.H.; Funding acquisition, E.-j.K. and F.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the EPSRC Grant (EP/W036770/1 (https://gow.epsrc.ukri.org/NGBOViewGrant.aspx?GrantRef=EP/W036770/1, accessed on 17 February 2024)).

Data Availability Statement

The stochastic simulation and calculation scripts will be made publicly available in an open repository, which is likely to be updated under https://github.com/jia-chenhua?tab=repositories or https://gitlab.com/jia-chen.hua (both accessed on 17 February 2024).

Acknowledgments

The stochastic simulations and numerical calculations in this work were performed on GPU nodes of Sulis HPC (https://sulis.ac.uk/, accessed on 17 February 2024). The authors would like to thank Alex Pedcenko (https://pureportal.coventry.ac.uk/en/persons/alex-pedcenko, accessed on 17 February 2024) for providing useful help in accessing the HPC resources in order to finish the simulations and calculations in a timely manner.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

PDF	probability density function
SDE	stochastic differential equation
IC	Initial Conditions (in terms of initial Gaussian distributions)
CTL	healthy control (subjects)
AD	Alzheimer’s disease
EC	eyes-closed
EO	eyes-open
TE	transfer entropy

Appendix A. Finer Details of Numerical Estimation Techniques

Recall from Equation (7) that the squared information rate is

Γ_{x} {(t)}^{2} = \int d x p (x, t) {[\partial_{t} ln p (x, t)]}^{2} = 4 \int d x {[\partial_{t} \sqrt{p (x, t)}]}^{2},

(A1)

where the partial time derivative and integral can be numerically approximated using discretization, i.e.,

\partial_{t} \sqrt{p (x, t)} \approx \frac{1}{(Δ t)} (\sqrt{p (x, t + Δ t)} - \sqrt{p (x, t)})

and

\int d x f (x) \approx \sum_{i} Δ x_{i} f (x_{i})

, respectively, where for brevity and if no ambiguity arises, the summation over index i is often omitted and replaced by x itself as

\sum_{x} Δ x f (x)

, and the symbol x serves as both the index of summation (e.g., the x-th interval with length

Δ x

) and the actual value x in

f (x)

.

A common technique to improve the approximation of integral by finite summation is the trapezoidal rule

\int d x f (x) \approx \sum_{i} Δ x_{i} \frac{f (x_{i - 1}) + f (x_{i})}{2}

, which will be abbreviated as

\sum_{x}^{Trapz .} Δ x f (x)

to indicate that the summation is following the trapezoidal rule imposing a

1 / 2

weight/factor on the first and last summation terms (corresponding to the lower and upper bounds of the integral). Similarly, we use

\sum_{x, y}^{Trapz . 2 D} Δ x Δ y f (x, y)

to denote a 2D trapezoidal approximation of the double integral

\int d x d y f (x, y)

, where different weights (

1 / 4

or

1 / 2

) will be applied to the “corner”/boundary terms of the summation. Meanwhile, to distinguish regular summation from trapezoidal approximation, we use the notation

\sum_{x}^{naive} Δ x f (x)

to signify a regular summation as a more naive approximation of the integral.

The PDF

p (x, t)

is numerically estimated using a histogram with Rice’s rule applied, i.e., the number of bins is

n_{bins} = ⌊2 \sqrt[3]{n_{sample}}⌋

(with uniform bin width

Δ x = \frac{range of samples ’ values}{n_{bins}}

), which is rounded towards zero to avoid overestimating the number of bins needed. And for joint PDF

p (x_{i}, t_{i}; x_{j}, t_{j})

, since the bins are distributed in a 2D plane of

(x_{i}, x_{j})

, the number of bins in each dimension is rounded to

⌊\sqrt{⌊2 \sqrt[3]{n_{sample}}⌋}⌋

(and similarly for 3D joint probability as in the transfer entropy calculation, the number of bins in each dimension is rounded to

⌊\sqrt[3]{⌊2 \sqrt[3]{n_{sample}}⌋}⌋

). Combining all of the above, the information rate’s square will be approximated by

Γ_{x} {(t)}^{2} = 4 \int d x {[\partial_{t} \sqrt{p (x, t)}]}^{2} \approx 4 \sum_{x}^{Trapz .} \frac{Δ x}{{[Δ t]}^{2}} {[\sqrt{p (x, t + Δ t)} - \sqrt{p (x, t)}]}^{2},

(A2)

where the bin width

Δ x

can be moved into the square root and multiplied with the PDF to get the probability (mass) of finding a data sample in the x-th bin, which is estimated as

\frac{n_{sample} inside x - th bin}{n_{sample}}

, i.e., the number of data samples inside that bin divided by the number of all data samples (using the relevant functions in MATLAB or Python). The trapezoidal rule imposes a

1 / 2

factor on the first and last terms of summation, corresponding to the first and last bins.

For the causal information rate

Γ_{x_{1} \to x_{2}} (t) \overset{def}{=} Γ_{x_{2}}^{*} (t) - Γ_{x_{2}} (t)

, the

Γ_{x_{2}}^{*} (t)

can be estimated by

\begin{matrix} Γ_{x_{2}}^{*} {(t)}^{2} & = lim_{t_{*} \to t^{+}} \int d x_{1} d x_{2} p (x_{2}, t_{*}; x_{1}, t) {[\partial_{t_{*}} ln p (x_{2}, t_{*}; x_{1}, t)]}^{2} \\ = 4 lim_{t_{*} \to t^{+}} \int d x_{1} d x_{2} {[\partial_{t_{*}} \sqrt{p (x_{2}, t_{*}; x_{1}, t)}]}^{2} \\ \approx 4 \sum_{x_{2}, x_{1}} \frac{Δ x_{2} Δ x_{1}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t; x_{1}, t)} - \sqrt{p (x_{2}, t; x_{1}, t)}]}^{2}, \end{matrix}

(A3)

where the number of bins in each of the

x_{1}

and

x_{2}

dimensions is rounded to

⌊\sqrt{⌊2 \sqrt[3]{n_{sample}}⌋}⌋

, and the

Γ_{x_{2}} {(t)}^{2}

can be estimated as above as

4 \int d x_{2} {[\partial_{t} \sqrt{p (x_{2}, t)}]}^{2} \approx 4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

using regular or trapezoidal summation. However, here for

Γ_{x_{2}} {(t)}^{2}

, the number of bins for

x_{2}

must not be chosen as

⌊2 \sqrt[3]{n_{sample}}⌋

following the 1D Rice’s rule, which is very critical to avoid insensible or inconsistent estimation of

Γ_{x_{1} \to x_{2}} (t)

, for which the reason is explained below.

Consider the quantity

Γ_{x_{2}}^{*} {(t)}^{2} - Γ_{x_{2}} {(t)}^{2}

; theoretically and by definition, the

d x_{2}

can be pulled outside the integral over

d x_{1}

to combine the two integrals into one integral as follows:

\begin{matrix} Γ_{x_{2}}^{*} {(t)}^{2} - Γ_{x_{2}} {(t)}^{2} = 4 lim_{t_{*} \to t^{+}} \int d x_{2} \{\int {[\partial_{t_{*}} \sqrt{p (x_{2}, t_{*}; x_{1}, t)}]}^{2} d x_{1} - {[\partial_{t} \sqrt{p (x_{2}, t)}]}^{2}\}, \end{matrix}

(A4)

and the corresponding numerical approximations of integrals should be combined as

\begin{matrix} \approx 4 \sum_{x_{2}} Δ x_{2} \{\sum_{x_{1}} {[\sqrt{p (x_{2}, t + Δ t; x_{1}, t)} - \sqrt{p (x_{2}, t; x_{1}, t)}]}^{2} \frac{Δ x_{1}}{{[Δ t]}^{2}} \\ - \frac{{[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}}{{[Δ t]}^{2}}\}, \end{matrix}

(A5)

where the sum over

x_{2}

is performed on the same bins for both of the two terms inside the large braces

\{\cdot\}

above. On the other hand, if one numerically approximates

Γ_{x_{2}}^{*} {(t)}^{2}

and

Γ_{x_{2}} {(t)}^{2}

separately as

\begin{matrix} Γ_{x_{2}}^{*} {(t)}^{2} - Γ_{x_{2}} {(t)}^{2} & \approx 4 \sum_{x_{2}, x_{1}} \frac{Δ x_{2} Δ x_{1}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t; x_{1}, t)} - \sqrt{p (x_{2}, t; x_{1}, t)}]}^{2} \\ - 4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}, \end{matrix}

(A6)

then the sum over

x_{2}

in the second term

4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

should still be performed on the same bins of

x_{2}

for the first term involving the joint PDFs estimated by 2D histograms (i.e., using the square root number of bins

⌊\sqrt{⌊2 \sqrt[3]{n_{sample}}⌋}⌋

of Rice’s rule, instead of following the 1D Rice’s rule without the square root), even though this second summation term is written as a separate and “independent” term from the first double-summation term. The definition

Γ_{x_{1} \to x_{2}} (t) \overset{def}{=} Γ_{x_{2}}^{*} (t) - Γ_{x_{2}} (t)

might result in a misimpression that one can estimate

Γ_{x_{2}} {(t)}^{2} \approx 4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

separately by using a Rice’s rule’s binning method containing

⌊2 \sqrt[3]{n_{sample}}⌋

bins, while estimating

Γ_{x_{2}}^{*} {(t)}^{2} \approx 4 \sum_{x_{2}, x_{1}} \frac{Δ x_{2} Δ x_{1}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t; x_{1}, t)} - \sqrt{p (x_{2}, t; x_{1}, t)}]}^{2}

using the square root of Rice’s rule’s number of bins

⌊\sqrt{⌊2 \sqrt[3]{n_{sample}}⌋}⌋

. Using different bins for

x_{2}

will make it invalid to combine the two summations into one summation over the same

x_{2}

’s (and hence invalid to combine the two integrals into one integral by pulling out the same

d x_{2}

).

Using

⌊2 \sqrt[3]{n_{sample}}⌋

bins for

x_{2}

will overestimate the value of

Γ_{x_{2}} {(t)}^{2} \approx 4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

, for example, if there are 1 million samples/data points to estimate the PDFs, then

⌊2 \sqrt[3]{n_{sample}}⌋ = 200

for 1D distribution and

⌊\sqrt{⌊2 \sqrt[3]{n_{sample}}⌋}⌋ \approx 14

for 2D joint distribution. Calculating

Γ_{x_{2}} {(t)}^{2} \approx 4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

using 200 bins will result in a much larger value than calculating it using 14 bins, which will result in negative values in calculating the causal information rate

Γ_{x_{1} \to x_{2}} (t) = Γ_{x_{2}}^{*} (t) - Γ_{x_{2}} (t)

. When using the same 14 bins of

x_{2}

(for estimating the 2D joint PDF of

(x_{1}, x_{2})

) to estimate the 1D PDF in

Γ_{x_{2}} {(t)}^{2}

\approx 4 \sum_{x_{2}} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

, all the unreasonable negative values disappear, except for only some isolated negative values remained, which is related to estimating

Γ_{x_{2}}^{*} {(t)}^{2}

and

Γ_{x_{2}} {(t)}^{2}

using 1D and 2D trapezoidal rules for summations approximating the integrals: if one uses 1D trapezoidal summation for

Γ_{x_{2}} {(t)}^{2}

\approx 4 \sum_{x_{2}}^{Trapz} .

\frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2}

, while on the other hand, one blindly and inconsistently uses 2D trapezoidal summation for

Γ_{x_{2}}^{*} {(t)}^{2} \approx 4 \sum_{x_{2}, x_{1}}^{Trapz} . 2 D

\frac{Δ x_{2} Δ x_{1}}{{[Δ t]}^{2}} [\sqrt{p (x_{2}, t + Δ t; x_{1}, t)}

- \sqrt{p (x_{2}, t; x_{1}, t)}]^{2}

, this will also result in some negative values in computing

Γ_{x_{1} \to x_{2}} (t)

= Γ_{x_{2}}^{*} (t) - Γ_{x_{2}} (t)

, because the 2D trapezoidal sum will under-estimate the

Γ_{x_{2}}^{*} {(t)}^{2}

as compared to the 1D trapezoidal-sum-estimated

Γ_{x_{2}} {(t)}^{2}

.

To resolve this inconsistent mixing of 1D and 2D trapezoidal rules, there are two possible methods:

Using 2D trapezoidal rule for both $Γ_{x_{2}}^{*} {(t)}^{2}$ and $Γ_{x_{2}} {(t)}^{2}$ , that is, $Γ_{x_{2}}^{*} {(t)}^{2} \approx 4 \sum_{x_{2}, x_{1}}^{Trapz . 2 D}$ $\frac{Δ x_{2} Δ x_{1}}{{[Δ t]}^{2}} [\sqrt{p (x_{2}, t + Δ t; x_{1}, t)} -$ $\sqrt{p (x_{2}, t; x_{1}, t)}]^{2}$ , and $Γ_{x_{2}} {(t)}^{2} \approx 4 \sum_{x_{2}}^{Trapz .}$ $\frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2} \approx 4 \sum_{x_{2}}^{Trapz .} \frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{\sum_{x_{1}}^{Trapz .} Δ x_{1} p (x_{2}, t + Δ t; x_{1}, t)} - \sqrt{\sum_{x_{1}}^{Trapz .} Δ x_{1} p (x_{2}, t; x_{1}, t)}]}^{2}$ . In other words, when calculating $Γ_{x_{2}} {(t)}^{2}$ , instead of estimating marginal PDF $p (x_{2}, t + Δ t)$ and $p (x_{2}, t)$ directly by 1D histograms (using the relevant functions in MATLAB or Python), one first estimates the joint PDF $p (x_{2}, t + Δ t; x_{1}, t)$ and $p (x_{2}, t; x_{1}, t)$ by 2D histograms and integrates over $x_{1}$ by trapezoidal summation on it. This will reduce the value of estimated $Γ_{x_{2}} (t)$ , and integrals over both $x_{1}$ and $x_{2}$ are both estimated by trapezoidal summation.
Using the 1D trapezoidal rule for both $Γ_{x_{2}} (t)$ and $Γ_{x_{2}}^{*} (t)$ , that is, $Γ_{x_{2}} {(t)}^{2} \approx 4 \sum_{x_{2}}^{Trapz .}$ $\frac{Δ x_{2}}{{[Δ t]}^{2}} {[\sqrt{p (x_{2}, t + Δ t)} - \sqrt{p (x_{2}, t)}]}^{2} = 4 \sum_{x_{2}}^{Trapz .} \frac{Δ x_{2}}{{[Δ t]}^{2}} [\sqrt{\sum_{x_{1}}^{naive} Δ x_{1} \cdot p (x_{2}, t + Δ t; x_{1}, t)}$ $- \sqrt{\sum_{x_{1}}^{naive} Δ x_{1} \cdot p (x_{2}, t; x_{1}, t)}]^{2}$ , and $Γ_{x_{2}}^{*} {(t)}^{2} \approx \frac{4}{{[Δ t]}^{2}} \sum_{x_{2}}^{Trapz .} Δ x_{2} \{\sum_{x_{1}}^{naive} Δ x_{1} {[\sqrt{p (x_{2}, t + Δ t; x_{1}, t)} - \sqrt{p (x_{2}, t; x_{1}, t)}]}^{2}\}$ . In this approach, the marginal PDF $p (x_{2}, t) = \int p (x_{2}, t; x_{1}, t) d x_{1}$ , where the equal sign holds exactly for the regular or naive summation $p (x_{2}, t) = \sum_{x_{1}}^{naive} Δ x_{1} \cdot p (x_{2}, t; x_{1}, t)$ . This is because the histogram estimation in MATLAB and Python is performed by counting the occurrence of data samples inside each bin, and the probability (mass) is estimated as $\frac{n_{sample} inside x - th bin}{n_{sample}}$ , and the density is estimated as $\frac{n_{sample} inside x - th bin}{n_{sample} \cdot Δ x}$ , where $Δ x$ is the width of the x-th bin (and for 2D histogram, this is replaced by bin area $A_{x_{1}, x_{2}} = Δ x_{1} \cdot Δ x_{2}$ ), and therefore, summing over $x_{1}$ is aggregating the 2D bins of $(x_{1}, x_{2})$ and combining or mixing samples with $x_{2}$ -values/coordinates in the same $x_{2}$ -bin (but with $x_{1}$ -values/coordinates in different $x_{1}$ -bins) together. In other words, it is always true that $n_{x_{2}} = \sum_{x_{1}}^{naive} n_{x_{1}, x_{2}}$ , where $n_{x_{2}}$ is the number of samples inside the $x_{2}$ -th bin and $n_{x_{1}, x_{2}}$ is number of samples inside the $(x_{1}, x_{2})$ -th bin in 2D, and hence, for estimated probability (mass), $\frac{n_{x_{2}}}{n_{sample}} = \sum_{x_{1}}^{naive} \frac{n_{x_{1}, x_{2}}}{n_{sample}}$ , and for estimated PDFs, $\frac{n_{x_{2}}}{n_{sample} \cdot Δ x_{2}} = \sum_{x_{1}}^{naive} \frac{n_{x_{1}, x_{2}}}{n_{sample} \cdot Δ x_{2} \cdot Δ x_{1}} \cdot Δ x_{1}$ , which is why $p (x_{2}, t)$ $= \sum_{x_{1}}^{naive} Δ x_{1} \cdot p (x_{2}, t; x_{1}, t)$ holds exactly for numerically estimated marginal and joint PDFs using histograms, which is consistent with the theoretical relation between marginal and joint PDFs $p (x_{2}, t) = \int p (x_{2}, t; x_{1}, t) d x_{1}$ , and this has been numerically verified using the relevant 1D and 2D histogram functions in MATLAB and Python, i.e., by (naively) summing the estimated joint PDF over $x_{1}$ , and the (naively) summed marginal is exactly the same as the one estimated directly by 1D histogram function. So in this approach, integral over $x_{1}$ is estimated by naive summation on $x_{1}$ , but integral over $x_{2}$ is estimated by trapezoidal summation on $x_{2}$ .

The 1st approach will violate the relation between joint and marginal

p (x_{2}, t)

= \int p (x_{2}, t; x_{1}, t) d x_{1}

, because as explained in the 2nd approach above, when using MATLAB’s and Python’s 1D and 2D histogram functions, one will always get exactly

p (x_{2}, t)

= \sum_{x_{1}}^{naive} Δ x_{1} \cdot p (x_{2}, t; x_{1}, t)

and

p (x_{2}, t + Δ t) = \sum_{x_{1}}^{naive} Δ x_{1} \cdot p (x_{2} + Δ t, t; x_{1}, t)

for naive summation, but not for trapezoidal summation over

x_{1}

due to the weights/factors (≠1) imposed on the “corner”/boundary/first/last summation terms, which is used in the 1st approach. However, the 2nd approach puts different importance or weights on the summation over

x_{1}

as compared to

x_{2}

, which might also be problematic, because the original definition is a double integral over

x_{1}

and

x_{2}

without different weights/factors imposed by different summation methods.

To resolve this, we use the regular or naive summations on both

x_{1}

and

x_{2}

, which avoids the issues in both the 1st and 2nd approaches, and we find that the numerical difference between the 1st and 2nd approaches and our adopted simply naive summations are really negligible, and because in this work, we are performing empirical statistics on the estimated causal information rates and illustrating the qualitative features of the empirical probability distributions of them, we use our simple naive summations over both

x_{1}

and

x_{2}

when estimating

Γ_{x_{2}}^{*} {(t)}^{2}

and

Γ_{x_{2}} {(t)}^{2}

in causal information rate

Γ_{x_{1} \to x_{2}} (t) = Γ_{x_{2}}^{*} (t) - Γ_{x_{2}} (t)

.

Appendix B. Complete Results: All Six Groups of Initial Conditions

For completeness, we list the full results of all figures for all six different initial Gaussian distributions listed in Table 3.

Appendix B.1. Sample Trajectories of $x_{1} (t)$ and $x_{2} (t)$

Appendix B.1.1. Initial Conditions No.1 (IC1)

Figure A1. Initial Conditions No.1 (IC1): 50 sample trajectories of healthy CTL subjects. Each single trajectory is labeled by a different color.

Figure A2. Initial Conditions No.1 (IC1): 50 sample trajectories of AD patients. Each single trajectory is labeled by a different color.

Appendix B.1.2. Initial Conditions No.2 (IC2)

Figure A3. Initial Conditions No.2 (IC2): 50 sample trajectories of healthy CTL subjects. Each single trajectory is labeled by a different color.

Figure A4. Initial Conditions No.2 (IC2): 50 sample trajectories of AD patients. Each single trajectory is labeled by a different color.

Appendix B.1.3. Initial Conditions No.3 (IC3)

Figure A5. Initial Conditions No.3 (IC3): 50 sample trajectories of healthy CTL subjects. Each single trajectory is labeled by a different color.

Figure A6. Initial Conditions No.3 (IC3): 50 sample trajectories of AD patients. Each single trajectory is labeled by a different color.

Appendix B.1.4. Initial Conditions No.4 (IC4)

Figure A7. Initial Conditions No.4 (IC4): 50 sample trajectories of healthy CTL subjects. Each single trajectory is labeled by a different color.

Figure A8. Initial Conditions No.4 (IC4): 50 sample trajectories of AD patients. Each single trajectory is labeled by a different color.

Appendix B.1.5. Initial Conditions No.5 (IC5)

Figure A9. Initial Conditions No.5 (IC5): 50 sample trajectories of healthy CTL subjects. Each single trajectory is labeled by a different color.

Figure A10. Initial Conditions No.5 (IC5): 50 sample trajectories of AD patients. Each single trajectory is labeled by a different color.

Appendix B.1.6. Initial Conditions No.6 (IC6)

Figure A11. Initial Conditions No.6 (IC6): 50 sample trajectories of healthy CTL subjects. Each single trajectory is labeled by a different color.

Figure A12. Initial Conditions No.6 (IC6): 50 sample trajectories of AD patients. Each single trajectory is labeled by a different color.

Appendix B.2. Time Evolution of PDF $p (x_{1}, t)$ and $p (x_{2}, t)$

Appendix B.2.1. Initial Conditions No.1 (IC1)

Figure A13. Initial Conditions No.1 (IC1): Time evolution of estimated PDFs of healthy CTL subjects.

Figure A14. Initial Conditions No.1 (IC1): Time evolution of estimated PDFs of AD patients.

Appendix B.2.2. Initial Conditions No.2 (IC2)

Figure A15. Initial Conditions No.2 (IC2): Time evolution of estimated PDFs of healthy CTL subjects.

Figure A16. Initial Conditions No.2 (IC2): Time evolution of estimated PDFs of AD patients.

Appendix B.2.3. Initial Conditions No.3 (IC3)

Figure A17. Initial Conditions No.3 (IC3): Time evolution of estimated PDFs of healthy CTL subjects.

Figure A18. Initial Conditions No.3 (IC3): Time evolution of estimated PDFs of AD patients.

Appendix B.2.4. Initial Conditions No.4 (IC4)

Figure A19. Initial Conditions No.4 (IC4): Time evolution of estimated PDFs of healthy CTL subjects.

Figure A20. Initial Conditions No.4 (IC4): Time evolution of estimated PDFs of AD patients.

Appendix B.2.5. Initial Conditions No.5 (IC5)

Figure A21. Initial Conditions No.5 (IC5): Time evolution of estimated PDFs of healthy CTL subjects.

Figure A22. Initial Conditions No.5 (IC5): Time evolution of estimated PDFs of AD patients.

Appendix B.2.6. Initial Conditions No.6 (IC6)

Figure A23. Initial Conditions No.6 (IC6): Time evolution of estimated PDFs of healthy CTL subjects.

Figure A24. Initial Conditions No.6 (IC6): Time evolution of estimated PDFs of AD patients.

Appendix B.3. Information Rates $Γ_{x_{1}} (t)$ and $Γ_{x_{2}} (t)$

Appendix B.3.1. Time Evolution: Information Rates

Initial Conditions No.1 (IC1)

Figure A25. Initial Conditions No.1 (IC1): Information rates along time of CTL and AD subjects.

Initial Conditions No.2 (IC2)

Figure A26. Initial Conditions No.2 (IC2): Information rates along time of CTL and AD subjects.

Initial Conditions No.3 (IC3)

Figure A27. Initial Conditions No.3 (IC3): Information rates along time of CTL and AD subjects.

Initial Conditions No.4 (IC4)

Figure A28. Initial Conditions No.4 (IC4): Information rates along time of CTL and AD subjects.

Initial Conditions No.5 (IC5)

Figure A29. Initial Conditions No.5 (IC5): Information rates along time of CTL and AD subjects.

Initial Conditions No.6 (IC6)

Figure A30. Initial Conditions No.6 (IC6): Information rates along time of CTL and AD subjects.

Appendix B.3.2. Empirical Probability Distribution: Information Rates (for $t \geq 7.5$ )

Figure A31. Empirical probability distributions of information rates

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

(t \geq 7.5)

.

Figure A31. Empirical probability distributions of information rates

Γ_{x_{1}} (t)

and

Γ_{x_{2}} (t)

(t \geq 7.5)

.

Appendix B.3.3. Phase Portraits: Information Rates (for $t \geq 7.5$ )

Figure A32. Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

.

Figure A32. Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

.

Initial Conditions No.1 (IC1):

Figure A33. Initial Conditions No.1 (IC1): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Figure A33. Initial Conditions No.1 (IC1): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Initial Conditions No.2 (IC2):

Figure A34. Initial Conditions No.2 (IC2): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Figure A34. Initial Conditions No.2 (IC2): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Initial Conditions No.3 (IC3):

Figure A35. Initial Conditions No.3 (IC3): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Figure A35. Initial Conditions No.3 (IC3): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Initial Conditions No.4 (IC4):

Figure A36. Initial Conditions No.4 (IC4): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Figure A36. Initial Conditions No.4 (IC4): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Initial Conditions No.5 (IC5):

Figure A37. Initial Conditions No.5 (IC5): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Figure A37. Initial Conditions No.5 (IC5): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Initial Conditions No.6 (IC6):

Figure A38. Initial Conditions No.6 (IC6): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

Figure A38. Initial Conditions No.6 (IC6): Phase portraits of information rates

Γ_{x_{1}} (t)

vs.

Γ_{x_{2}} (t)

(t \geq 7.5)

of CTL and AD subjects.

of CTL and AD subjects.

Appendix B.4. Shannon Differential Entropy of $p (x_{1}, t)$ and $p (x_{2}, t)$

Appendix B.4.1. Time Evolution: Shannon Differential Entropy