**Information Transfer in Linear Multivariate Processes Assessed through Penalized Regression Techniques: Validation and Application to Physiological Networks**

### **Yuri Antonacci <sup>1,2,\*</sup>, Laura Astolfi <sup>1,2</sup>, Giandomenico Nollo <sup>3</sup> and Luca Faes <sup>4</sup>**


Received: 16 May 2020; Accepted: 26 June 2020; Published: 1 July 2020

**Abstract:** The framework of information dynamics allows the dissection of the information processed in a network of multiple interacting dynamical systems into meaningful elements of computation that quantify the information generated in a target system, stored in it, transferred to it from one or more source systems, and modified in a synergistic or redundant way. The concepts of information transfer and modification have been recently formulated in the context of linear parametric modeling of vector stochastic processes, linking them to the notion of Granger causality and providing efficient tools for their computation based on the state–space (SS) representation of vector autoregressive (VAR) models. Despite their high computational reliability, these tools still suffer from estimation problems which emerge when the ratio between the available data points and the number of time series is low and VAR identification is performed via standard ordinary least squares (OLS). In this work we propose to replace OLS with penalized regression performed through the Least Absolute Shrinkage and Selection Operator (LASSO), prior to the computation of the measures of information transfer and information modification. First, simulating networks of several coupled Gaussian systems with complex interactions, we show that LASSO regression allows, even in conditions of data paucity, an accurate reconstruction of both the underlying network topology and the expected patterns of information transfer. Then we apply the proposed VAR-SS-LASSO approach to a challenging application context, i.e., the study of the physiological network of brain and peripheral interactions probed in humans under different conditions of rest and mental stress. Our results, which document the possibility to extract physiologically plausible patterns of interaction between the cardiovascular, respiratory and brain wave amplitudes, open the way to the use of these new analysis tools to explore the emerging field of Network Physiology in several practical applications.

**Keywords:** information dynamics; partial information decomposition; entropy; conditional transfer entropy; network physiology; multivariate time series analysis; state–space models; vector autoregressive model; penalized regression techniques; linear prediction

#### **1. Introduction**

Physiological systems such as the cerebral, cardiac, vascular and respiratory systems exhibit a dynamic activity which results from the continuous modulation of multiple control mechanisms and changes transiently across different physiological states. Accordingly, the human body can be modeled as an ensemble of complex physiological systems, each with its own regulatory mechanisms, that dynamically interact to preserve the physiological functions [1]. These interactions are commonly studied in a non-invasive way by recording physiological signals that are subsequently elaborated to extract time series of interest which reflect the dynamic state of the system under analysis [2,3]. Many studies in the literature have provided strong evidence of a relationship between the properties of the extracted time series and the physiological functions, even if most of this evidence comes from the analysis of the dynamics within a single system (i.e., variability of heart rate, activity or connectivity within brain networks [4,5]) or at most between two systems (cardiovascular, cardio-respiratory and brain–heart interactions [6,7]). Only recently, with the introduction of the concept of Network Physiology grounded on a system-wide integration approach, has it become possible to analyze the physiological interactions in a fully multivariate fashion. With this approach, the various physiological systems that compose the human organism are considered to be the nodes of a complex network [8]. Nevertheless, identifying a network comprised of different dynamic physiological systems is a non-trivial task that requires the development of methodological approaches able to take into account the intrinsically multivariate nature of the network, and to describe the different aspects of network activity and connectivity dealing with complex dynamics and intricate topological structures.

Recent studies in the context of information theory have shown how the information processing in a network of multiple interacting dynamical systems, described by multivariate stochastic processes, can be dissected into basic elements of computation defined within the so-called framework of information dynamics [9]. These elements essentially reflect the new information produced at each moment in time about a target system in the network, the information stored in the target system, the information transferred to it from other connected systems, and the modification of the information flowing from multiple sources to the target [10]. In particular, the information transfer defines the information that a group of systems designated as "sources" provide about the present state of the target [11]; information modification is strongly related to the concept of redundancy and synergy between two source systems sharing information about a target system, which refers to the existence of common information about the target that can be recovered when the sources are used separately (redundancy) or only when they are used jointly (synergy) [12]. Thus, positive values of information modification indicate net synergy, which reflects the concept of information independence of the sources. On the other hand, negative values of information modification indicate net redundancy, which reflects the fact that no additional information is conveyed about the target system when the two sources are considered together rather than in isolation [13].
Operational definitions of these concepts have been recently proposed, also showing how—for Gaussian processes modeled within a linear multivariate framework—the information transferred between two network nodes conditioning to the remaining nodes corresponds to the well-known measure of Granger causality (GC) formulated in a multivariate context [14], and the measures of redundancy and synergy can be obtained as separate measures through a so-called partial information decomposition (PID) [15].

The tools of information dynamics have contributed substantially to the development of the field of Network Physiology, with particular regard to the description of complex organ system interactions in various physiological states and conditions. In fact, measures of information transfer and information modification have proven useful for understanding the dynamic interactions that are essential to produce different physiological states, e.g., wake and sleep [7,8,16,17], rest and physiological stress [18,19], relaxed conditions and mental workload [20,21], neutral states and emotion elicitation [22,23]. However, despite its growing appeal and widespread use in physiology and in diverse branches of science [24–27], the field of information dynamics is still under development, and different aspects have to be further explored to fully exploit its potential. Recent developments have led to the formulation of a computational framework for the analysis of information dynamics which makes use of the state–space (SS) formulation of vector autoregressive (VAR) models and of the formulation of reduced linear regression models [28,29] whose prediction error variance is related to the entropies needed for the computation of GC and PID measures [30]. The framework exhibits high computational reliability when compared with classical regression approaches for the estimation of Granger-causal measures [30], and is being increasingly used to assess information dynamics in the context of Network Physiology [3,19].

Nevertheless, being based entirely on linear parametric modeling, this framework suffers from the known vulnerability to data shortage of standard VAR identification techniques such as Ordinary Least Squares (OLS) or Levinson's recursive algorithm for the solution of the Yule–Walker equations. This issue exposes the identification process to increased bias and variance of the estimated parameters [31], and may result in ill-posed regression problems when the regressor matrix approaches singularity [32]. As pointed out in the literature, the ratio between the number of data samples available and the number of regression coefficients to be estimated should be at least 10 to guarantee the accuracy of the estimation procedure [31,33,34]. This implies that the length of the time series used for VAR identification needs to increase proportionally with the number of processes jointly analyzed, which imposes a limitation on the size of the network that can be investigated if short datasets are available for the analysis. This is the case in common Network Physiology applications, where typically only short realizations of stationary multivariate physiological processes are available due to the different temporal scales and dynamics of the physiological signals involved.
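As a quick arithmetic illustration of this rule of thumb (a sketch of ours, not part of the original text), the minimum series length grows linearly with both the number of processes *M* and the model order *p*:

```python
# Illustration of the rule of thumb N / (M*p) >= 10: the number of samples
# needed per VAR equation grows linearly with network size M and model order p.
def min_samples(M, p, ratio=10):
    """Minimum series length N such that N / (M*p) >= ratio."""
    return ratio * M * p

# A 4-node network with order 2 vs. a 10-node network with the same order:
print(min_samples(4, 2))   # 80 samples
print(min_samples(10, 2))  # 200 samples
```

Doubling the number of network nodes thus doubles the recordings length required to keep the same estimation accuracy, which is rarely feasible with short physiological recordings.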

To cope with the reduction of accuracy in the estimation process when dealing with a large number of time series and/or a small number of available data samples, different strategies have been proposed in the literature, such as the so-called partial conditioning [35] or the use of time-ordered restricted VAR models that are specifically built only for the computation of GC [36]. A further, more general solution is the use of penalized regression techniques, which regularize a linear regression problem using one or more constraints [37]. Among them, the Least Absolute Shrinkage and Selection Operator (LASSO) uses a constraint based on the *l*<sup>1</sup> norm that, when applied directly to the regression problem, yields a sparse coefficient matrix and leads to a reduction of the mean square error in conditions of data paucity [38]. Penalized regression techniques implemented for GC analysis have been successfully applied in many different contexts, ranging from simulation studies [39] to the analysis of electroencephalographic signals [34,40,41], neuroimaging data [42] and macroeconomic data [43]. In the present work, the LASSO regression is embedded in the VAR-SS framework for the computation of information dynamics, and is compared with the traditional OLS regression as regards its capability to estimate conditional information transfer and PID measures, both in benchmark networks of simulated multivariate processes and in real networks of multiple physiological time series.

We show that it is possible, even in conditions of data paucity, to accurately reconstruct both the topology and the patterns of information transfer in networks of several coupled Gaussian systems exhibiting complex interactions, and to extract physiologically plausible patterns of interaction between the cardiovascular, respiratory and brain systems explored in healthy subjects during different conditions of mental stress elicited by sustained attention or mental arithmetic tasks [3,21,44].

The algorithms for the VAR-SS model identification based on the LASSO regression, with the subsequent computation of conditional information transfer and PID measures, are collected in the PID-LASSO MATLAB toolbox, which can be downloaded from http://github.com/YuriAntonacci/PID-LASSO-toolbox and http://lucafaes.net/PIDlasso.html (in Supplementary Materials).

#### **2. Materials and Methods**

#### *2.1. Vector Autoregressive Model Identification*

Let us consider a dynamical system $\mathcal{Y}$, whose activity is mapped by a discrete-time stationary vector stochastic process composed of $M$ real-valued zero-mean scalar processes, $\mathbf{Y} = [Y_1 \cdots Y_M]$. Considering the time step $n$ as the current time, the present and the past of the vector stochastic process are denoted as $\mathbf{Y}_n = [Y_{1,n} \cdots Y_{M,n}]$ and $\mathbf{Y}_n^- = [\mathbf{Y}_{n-1}\, \mathbf{Y}_{n-2} \cdots]$, respectively. Moreover, assuming that $\mathbf{Y}$ is a Markov process of order $p$, its whole past history can be truncated at $p$ time steps, i.e., approximated by the $Mp$-dimensional vector $\mathbf{Y}_n^p = [\mathbf{Y}_{n-1} \cdots \mathbf{Y}_{n-p}]$, so that $\mathbf{Y}_n^- \approx \mathbf{Y}_n^p$. Then, in the linear signal processing framework, the dynamics of $\mathbf{Y}$ can be completely described by the vector autoregressive (VAR) model:

$$\mathbf{Y}_{n} = \sum_{k=1}^{p} \mathbf{Y}_{n-k} \mathbf{A}_{k} + \mathbf{U}_{n} \tag{1}$$

where $\mathbf{A}_k$ is an $M \times M$ matrix containing the autoregressive (AR) coefficients, and $\mathbf{U} = [U_1 \cdots U_M]$ is a vector of $M$ zero-mean white processes, denoted as innovations, with $M \times M$ covariance matrix $\mathbf{\Sigma} \equiv \mathbb{E}[\mathbf{U}_n^T \mathbf{U}_n]$ ($\mathbb{E}$ denotes the expected value).

Let us now consider a realization of the process $\mathbf{Y}$ over $N$ consecutive time steps, collected in the $N \times M$ data matrix $[\mathbf{y}_1; \cdots; \mathbf{y}_N]$, where the operator ";" stands for row separation, so that the $i$-th row is a realization of $\mathbf{Y}_i$, i.e., $\mathbf{y}_i = [y_{1,i} \cdots y_{M,i}]$, $i = 1, \ldots, N$, and the $j$-th column is the time series collecting all realizations of $Y_j$, i.e., $[y_{j,1} \cdots y_{j,N}]^T$, $j = 1, \ldots, M$. Ordinary Least Squares (OLS) identification finds an optimal solution for the model (1) by solving the following linear quadratic problem:

$$\widehat{\mathbf{A}} = \arg\min_{\mathbf{A}} ||\mathbf{y} - \mathbf{y}^p \mathbf{A}||_2^2, \tag{2}$$

where $\mathbf{y} = [\mathbf{y}_{p+1}; \cdots; \mathbf{y}_N]$ is the $(N-p) \times M$ matrix of the predicted values, $\mathbf{y}^p = [\mathbf{y}_{p+1}^p; \cdots; \mathbf{y}_N^p]$ is the $(N-p) \times Mp$ matrix of the regressors, and $\mathbf{A} = [\mathbf{A}_1; \cdots; \mathbf{A}_p]$ is the $Mp \times M$ coefficient matrix. The problem has a closed-form solution $\widehat{\mathbf{A}} = ([\mathbf{y}^p]^T \mathbf{y}^p)^{-1} [\mathbf{y}^p]^T \mathbf{y}$, for which the residual sum of squares (RSS) is minimized [33,45]. When $N - p \le Mp$, OLS does not guarantee the uniqueness of the solution, since the matrix $[\mathbf{y}^p]^T \mathbf{y}^p$ becomes singular [34,45]. Even in this situation, it is possible to solve the regression problem through the Least Absolute Shrinkage and Selection Operator (LASSO), which introduces a constraint in the linear quadratic problem (2) [37]:

$$\widehat{\mathbf{A}} = \arg\min\_{\mathbf{A}} \left( ||\mathbf{y} - \mathbf{y}^p \mathbf{A}||\_2^2 + \lambda ||\mathbf{A}||\_1 \right). \tag{3}$$

In Equation (3), the additional term based on the $l_1$ norm forces a sparse solution in which some of the VAR coefficients are shrunk to zero, with the shrinkage parameter $\lambda$ controlling the trade-off between the number of non-zero coefficients selected in the matrix $\mathbf{A}$ and the residual sum of squares (RSS). Even if the problem (3) admits a solution, it is not in closed form, since the $l_1$ norm is not differentiable at zero [38]. The optimal value of $\lambda$ is determined through cross-validation. Typically, a predefined interval of values for $\lambda$ is defined such that the largest value yields an estimated AR matrix of zeros and the smallest yields a dense AR matrix [46] (in this work, 300 values of $\lambda$ were selected). Subsequently, using a hold-out approach as described in [47], 90% of the observations of the predicted values and of the regressors (rows of $\mathbf{y}$ and $\mathbf{y}^p$) are independently drawn as the training set, keeping the remaining 10% as the test set. Training and test sets are then normalized to zero mean and unit variance and, for each assigned $\lambda$, the number of non-zero coefficients is evaluated for the matrix $\mathbf{A}$ estimated from the training set, and the corresponding RSS is computed on the test set. After repeating this operation several times (10 in this work) while randomly changing the training and test sets, the optimal value of $\lambda$ is chosen as the one that minimizes the ratio between the RSS and the number of non-zero VAR coefficients [48]. The matrix of AR coefficients $\mathbf{A}$ is then estimated using this optimal value of $\lambda$.
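To make the two identification schemes concrete, here is a minimal NumPy sketch (ours, not the authors' MATLAB toolbox) of the OLS solution of problem (2) and of a LASSO solver for problem (3) based on proximal gradient descent (ISTA) with a fixed $\lambda$; the cross-validated grid search over $\lambda$ described above is omitted for brevity:

```python
import numpy as np

def build_regression(Y, p):
    """Arrange the data matrix Y (N x M, rows = time steps) into the
    predicted-value matrix y and the regressor matrix y^p of Eq. (2)."""
    N = Y.shape[0]
    yp = np.hstack([Y[p - k:N - k] for k in range(1, p + 1)])  # (N-p, M*p)
    return Y[p:], yp

def ols_var(Y, p):
    """OLS solution of Eq. (2): A = (yp^T yp)^{-1} yp^T y (via lstsq)."""
    y, yp = build_regression(Y, p)
    A, *_ = np.linalg.lstsq(yp, y, rcond=None)
    return A

def lasso_var(Y, p, lam, n_iter=3000):
    """LASSO solution of Eq. (3) via proximal gradient descent (ISTA);
    a simplified sketch with lam fixed instead of cross-validated."""
    y, yp = build_regression(Y, p)
    L = 2 * np.linalg.norm(yp, 2) ** 2            # Lipschitz constant of the RSS gradient
    A = np.zeros((yp.shape[1], y.shape[1]))
    for _ in range(n_iter):
        A -= 2 * yp.T @ (yp @ A - y) / L          # gradient step on the RSS term
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft threshold (l1 term)
    return A
```

The soft-thresholding step is what produces coefficients that are exactly zero, the sparseness property exploited later (Section 2.4) to assess the significance of the estimated cTE.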

#### *2.2. Measures of Information Transfer*

Considering the overall observed process $\mathbf{Y} = [Y_1 \cdots Y_M]$, let us assume $Y_j$ as the *target* process and $Y_i$ as the *source* process, with the remaining $M-2$ processes collected in the vector $\mathbf{Y}_s$, where $s = \{1, \ldots, M\} \setminus \{i, j\}$. Then, the transfer entropy (TE) from $Y_i$ to $Y_j$ quantifies the amount of information that the past of the source, $Y_{i,n}^p$, provides about the present of the target, $Y_{j,n}$, over and above the information already provided by the past of the target itself, $Y_{j,n}^p$; it is defined as follows [2,49]:

$$T_{i \to j} = I(Y_{j,n}; Y_{i,n}^p | Y_{j,n}^p) = H(Y_{j,n} | Y_{j,n}^p) - H(Y_{j,n} | Y_{j,n}^p, Y_{i,n}^p) \tag{4}$$

where *I*(·; ·|·) represents the conditional mutual information and *H*(·|·) represents the conditional entropy [50]. In the presence of two sources *Yi* and *Yk*, the information transferred towards the target *Yj* from the two sources taken together is quantified by the joint transfer entropy (jTE):

$$T_{ik \to j} = I(Y_{j,n}; Y_{i,n}^p, Y_{k,n}^p | Y_{j,n}^p) = H(Y_{j,n} | Y_{j,n}^p) - H(Y_{j,n} | Y_{j,n}^p, Y_{i,n}^p, Y_{k,n}^p) \tag{5}$$

where $Y_{k,n}^p$ represents the past of the source $Y_k$. A possible way to decompose the jTE is provided by the so-called partial information decomposition (PID). The PID expands the information transferred jointly from two sources to a target into four different quantities, reflecting the unique information transferred from each individual source to the target, measured by the unique TEs $U_{i \to j}$ and $U_{k \to j}$, and the redundant and synergistic information transferred from the two sources to the target, measured by the redundant TE $R_{ik \to j}$ and the synergistic TE $S_{ik \to j}$ [51]. These four measures are related to each other and to the joint and individual TEs from each source to the target by the following equations:

$$T_{ik \to j} = U_{i \to j} + U_{k \to j} + R_{ik \to j} + S_{ik \to j}, \tag{6}$$

$$T_{i \to j} = U_{i \to j} + R_{ik \to j}, \tag{7}$$

$$T_{k \to j} = U_{k \to j} + R_{ik \to j}. \tag{8}$$

In the PID defined above, the terms $U_{i \to j}$ and $U_{k \to j}$ quantify the parts of the information transferred to the target process $Y_j$ which are unique to the source processes $Y_i$ and $Y_k$, respectively, mirroring the contributions to the predictability of the target that can be obtained from one of the sources but not from the other. Each of these unique contributions sums with the redundant TE to retrieve the information transfer defined by the classical measure of the bivariate TE, thus indicating that $R_{ik \to j}$ pertains to the part of the information transferred individually, yet redundantly, from each source to the target. The term $S_{ik \to j}$ refers to the synergy between the two sources while they transfer information to the target, intended as the information that is obtained only by taking the two sources $Y_i$ and $Y_k$ together, and not by considering them alone. While several implementations of the PID exist, depending on how a fourth equation is formulated to complete the definitions (6)–(8), in the case of jointly Gaussian processes it has been shown that a unifying formulation defines the redundant transfer as the minimum of the information transferred individually by each source to the target, i.e., $R_{ik \to j} = \min(T_{i \to j}, T_{k \to j})$ [15].
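Once the joint TE and the two individual TEs are available, the PID closed with the minimum-redundancy rule reduces to simple arithmetic on Equations (6)–(8); a minimal sketch (function name ours):

```python
def pid_transfer(T_ij, T_kj, T_ikj):
    """PID of the joint TE, Eqs. (6)-(8), closed with the Gaussian
    minimum rule R = min(T_{i->j}, T_{k->j}).
    Inputs: the two individual TEs and the joint TE; returns the unique,
    redundant and synergistic transfer terms."""
    R = min(T_ij, T_kj)            # redundant transfer, Eq. R = min(.,.)
    U_i = T_ij - R                 # unique transfer from source i, from Eq. (7)
    U_k = T_kj - R                 # unique transfer from source k, from Eq. (8)
    S = T_ikj - U_i - U_k - R      # synergistic transfer, solving Eq. (6)
    return U_i, U_k, R, S
```

For example, with $T_{i \to j} = 0.3$, $T_{k \to j} = 0.2$ and $T_{ik \to j} = 0.6$, the decomposition gives $R = 0.2$, $U_{i \to j} = 0.1$, $U_{k \to j} = 0$ and $S = 0.3$ (net synergy).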

In addition to the measures defining the PID, another important information measure used to detect the topological structure of direct interactions in a network of *M* interacting processes is the conditional transfer entropy (cTE). With the notation introduced above for the overall vector process **Y**, the cTE from a driver process *Yi* to a target process *Yj* computed considering the other processes in the network collected in **Y***s*, is defined as:

$$T_{i \to j|s} = I(Y_{j,n}; Y_{i,n}^p | Y_{j,n}^p, \mathbf{Y}_{s,n}^p) = H(Y_{j,n} | Y_{j,n}^p, \mathbf{Y}_{s,n}^p) - H(Y_{j,n} | \mathbf{Y}_n^p) \tag{9}$$

The cTE quantifies the amount of information contained in the present state of the target process that can be predicted by the past states of the source process, above and beyond the information that is predicted already by the past states of the target and of all the other processes [14]. An implication of this definition is that non-zero values of the cTE $T_{i \to j|s}$ correspond to the presence of a direct causal interaction from $Y_i$ to $Y_j$, which is typically depicted, in a network representation where nodes are associated with processes and edges with significant causal interactions, with an arrow connecting the $i$-th and $j$-th nodes.

#### *2.3. Computation of the Measures of Information Transfer for Multivariate Gaussian Processes*

When the observed multivariate process $\mathbf{Y}$ has a joint Gaussian distribution, the information-theoretic measures described in Section 2.2 can be formulated exactly on the basis of the linear VAR representation provided in Section 2.1. Indeed, it has been shown that, for jointly Gaussian processes, the covariance matrices of the observed vector process and of the innovations of the formulation (1) contain all of the entropy differences needed to compute the information transfer [52]. In turn, these entropy differences are expressed by the concept of partial covariance formulated in the context of linear regression analysis. Specifically, defining $E_{j|j,n} = Y_{j,n} - \mathbb{E}[Y_{j,n} | Y_{j,n}^p]$ and $E_{j|ij,n} = Y_{j,n} - \mathbb{E}[Y_{j,n} | Y_{i,n}^p, Y_{j,n}^p]$ as the prediction errors of a linear regression of $Y_{j,n}$ performed respectively on $Y_{j,n}^p$ and on $[Y_{i,n}^p, Y_{j,n}^p]$, the conditional entropies $H(Y_{j,n} | Y_{j,n}^p)$ and $H(Y_{j,n} | Y_{j,n}^p, Y_{i,n}^p)$ can be expressed as functions of the prediction error variances $\lambda_{j|j} = \mathbb{E}[E_{j|j,n}^2]$ and $\lambda_{j|ij} = \mathbb{E}[E_{j|ij,n}^2]$ as follows [14,53]:

$$H(Y_{j,n} | Y_{j,n}^p) = \frac{1}{2} \ln(2\pi e \lambda_{j|j}), \tag{10a}$$

$$H(Y_{j,n} | Y_{j,n}^p, Y_{i,n}^p) = \frac{1}{2} \ln(2\pi e \lambda_{j|ij}), \tag{10b}$$

from which the TE from $Y_i$ to $Y_j$ can be retrieved using (4):

$$T\_{i \to j} = \frac{1}{2} \ln \frac{\lambda\_{j|j}}{\lambda\_{j|ij}}.\tag{11}$$
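Equation (11) rests on the Gaussian entropy expressions (10a) and (10b). As a quick numerical sanity check (ours, not from the paper), a Monte Carlo estimate of the differential entropy of a Gaussian prediction error with variance $\lambda$, obtained as the average negative log-density over samples, recovers $\frac{1}{2}\ln(2\pi e \lambda)$:

```python
import numpy as np

# Monte Carlo check of the Gaussian entropy formula used in Eqs. (10)-(11):
# H = -E[ln p(x)] for x ~ N(0, lam) should equal 0.5*ln(2*pi*e*lam).
rng = np.random.default_rng(42)
lam = 2.5                                        # an arbitrary error variance
x = rng.normal(0.0, np.sqrt(lam), size=500_000)  # samples of the prediction error
log_pdf = -0.5 * np.log(2 * np.pi * lam) - x ** 2 / (2 * lam)
H_mc = -log_pdf.mean()                           # Monte Carlo entropy estimate
H_formula = 0.5 * np.log(2 * np.pi * np.e * lam)
```

With half a million samples the two values agree to within about $10^{-3}$ nats.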

Following similar reasoning, the jTE from (*Yi*,*Yk*) to *Yj* can be defined as:

$$T_{ik \to j} = \frac{1}{2} \ln \frac{\lambda_{j|j}}{\lambda_{j|ijk}}, \tag{12}$$

where $\lambda_{j|ijk} = \mathbb{E}[E_{j|ijk,n}^2]$ is the variance of the prediction error of a linear regression of $Y_{j,n}$ on $(Y_{i,n}^p, Y_{j,n}^p, Y_{k,n}^p)$, with prediction error $E_{j|ijk,n} = Y_{j,n} - \mathbb{E}[Y_{j,n} | Y_{i,n}^p, Y_{j,n}^p, Y_{k,n}^p]$; the cTE from $Y_i$ to $Y_j$ given $\mathbf{Y}_s$ can be defined as:

$$T_{i \to j|s} = \frac{1}{2} \ln \frac{\lambda_{j|js}}{\lambda_{j|ijs}}, \tag{13}$$

where $\lambda_{j|js} = \mathbb{E}[E_{j|js,n}^2]$ is the variance of the prediction error of a linear regression of $Y_{j,n}$ on $(Y_{j,n}^p, \mathbf{Y}_{s,n}^p)$, with prediction error $E_{j|js,n} = Y_{j,n} - \mathbb{E}[Y_{j,n} | Y_{j,n}^p, \mathbf{Y}_{s,n}^p]$, and $\lambda_{j|ijs} = \mathbb{E}[E_{j|ijs,n}^2]$ is the variance of the prediction error of a linear regression of $Y_{j,n}$ on $\mathbf{Y}_n^p$, with prediction error $E_{j|ijs,n} = Y_{j,n} - \mathbb{E}[Y_{j,n} | \mathbf{Y}_n^p]$. Moreover, from the definitions in Section 2.2 it is then possible to obtain the redundant TE, the synergistic TE and the unique TEs in addition to the cTE. Therefore, the computation of all the information measures amounts to calculating the partial variances to be inserted in Equations (11)–(13). In the following subsection we report how to derive such partial variances exploiting the state–space formulation of the VAR model (1) [30].
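Purely for illustration (this is not the authors' procedure: the SS approach of the next subsection obtains the partial variances analytically from a single full-model identification), the partial variances entering Equation (13) can also be estimated naively by fitting full and restricted regressions to the data; a NumPy sketch with function names of ours:

```python
import numpy as np

def partial_variance(Y, j, cond, p):
    """Variance lambda_{j|cond} of the error of a linear regression of
    Y_j(n) on p lags of the processes with indices in `cond` (naive
    restricted-regression estimate, used only for illustration)."""
    N = Y.shape[0]
    X = np.hstack([Y[p - k:N - k][:, cond] for k in range(1, p + 1)])
    y = Y[p:, j]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)

def cte(Y, i, j, p):
    """Conditional TE of Eq. (13): T_{i->j|s} = 0.5*ln(lam_{j|js}/lam_{j|ijs})."""
    M = Y.shape[1]
    s = [m for m in range(M) if m not in (i, j)]
    lam_js = partial_variance(Y, j, [j] + s, p)    # restricted model (no source i)
    lam_ijs = partial_variance(Y, j, [i, j] + s, p)  # full model
    return 0.5 * np.log(lam_js / lam_ijs)
```

Refitting a reduced model for every conditioning set is exactly the step that becomes unreliable with short series, which motivates the SS computation described next.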

#### 2.3.1. Formulation of State–Space Models

A discrete-time state–space (SS) model is a linear model in which a set of input, output and state variables are related by first-order difference equations [29]. The VAR model (1) can be represented equivalently as an SS model [54] which relates the observed process **Y** to an unobserved state process **Z** through the observation equation

$$\mathbf{Y}_n = \mathbf{C}\mathbf{Z}_n + \mathbf{E}_n, \tag{14}$$

and describes the update of the state process through the state equation

$$\mathbf{Z}_{n+1} = \mathbf{A}\mathbf{Z}_n + \mathbf{K}\mathbf{E}_n. \tag{15}$$

The innovations $\mathbf{E}_n$ of Equations (14) and (15) are equivalent to the innovations $\mathbf{U}_n$ in (1), and thus have covariance matrix $\mathbf{\Phi} \equiv \mathbb{E}[\mathbf{E}_n^T \mathbf{E}_n] = \mathbf{\Sigma}$. This representation, typically denoted as the "innovations form" SS model (ISS), also evidences the Kalman gain matrix $\mathbf{K}$, the state matrix $\mathbf{A}$ and the observation matrix $\mathbf{C}$, which can all be computed from the original VAR parameters in (1) as reported in [54]. Starting from the parameters of an ISS model, it is possible to compute any partial variance $\lambda_{j|a}$, where the subscript $a$ denotes any combination of indexes in $\{1, \ldots, M\}$, by evaluating the innovations of a "submodel" obtained by removing from the observation Equation (14) the variables not included in $a$. In this formulation the state Equation (15) remains unaltered, and the observation equation of the relevant submodel becomes:

$$\mathbf{Y}\_n^{(a)} = \mathbf{C}^{(a)} \mathbf{Z}\_n + \mathbf{E}\_n^{(a)},\tag{16}$$

where the superscript $(a)$ denotes the selection of the rows with indices in $a$ of a vector or matrix. As demonstrated in [28,30], the submodel (15) and (16) is not in ISS form, but can be converted into ISS form by solving a discrete algebraic Riccati equation (DARE). Then, the covariance matrix of the innovations $\mathbf{\Phi}^{(a)} = \mathbb{E}[\mathbf{E}_n^{(a)T} \mathbf{E}_n^{(a)}]$ includes the desired error variance $\lambda_{j|a}$ as the diagonal element corresponding to the position of the target $Y_j$. Thus, it is possible to compute all the partial variances needed for the evaluation of all the information measures introduced, starting from a set of ISS parameters. In particular, these parameters can be obtained directly from the parameters of the original VAR model (i.e., $\mathbf{A}_1, \ldots, \mathbf{A}_p$, $\mathbf{\Sigma}$), which in this study are estimated by identifying the VAR model (1) using either the OLS method or the LASSO regression.
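As a concrete sketch of the VAR-to-ISS mapping (our NumPy rendering, using the column-vector convention $\mathbf{Y}_n = \sum_k \mathbf{A}_k \mathbf{Y}_{n-k} + \mathbf{U}_n$ common in the SS literature, with state $\mathbf{Z}_n = [\mathbf{Y}_{n-1}; \ldots; \mathbf{Y}_{n-p}]$; the submodel-to-ISS conversion via the DARE is not sketched here):

```python
import numpy as np

def var_to_iss(A_list):
    """Map VAR(p) coefficient matrices A_1..A_p (each M x M) to the ISS
    parameters (A, C, K) of Eqs. (14)-(15) for the state
    Z_n = [Y_{n-1}; ...; Y_{n-p}]; the innovation covariance is Phi = Sigma."""
    p = len(A_list)
    M = A_list[0].shape[0]
    C = np.hstack(A_list)                        # M x Mp observation matrix
    A = np.zeros((M * p, M * p))                 # companion-form state matrix
    A[:M, :] = C                                 # first block row: new value of Y
    A[M:, :M * (p - 1)] = np.eye(M * (p - 1))    # shift the remaining lag blocks
    K = np.zeros((M * p, M))                     # Kalman gain: the innovation
    K[:M, :] = np.eye(M)                         # enters only the newest block
    return A, C, K
```

Iterating Equations (14)–(15) with these matrices reproduces, step by step, the same trajectory as the original VAR recursion driven by the same innovations.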

#### *2.4. Testing the Significance of the Conditional Transfer Entropy*

Since the cTE $T_{i \to j|s}$ is a measure of the information transferred directly (i.e., without following indirect paths) from the source $Y_i$ to the target $Y_j$, and for Gaussian processes is equivalent to conditional Granger causality [14], it is of interest to assess its statistical significance in order to establish the existence of a direct link from the $i$-th node to the $j$-th node of the observed network of interacting processes. In this work, the significance of the cTE computed after OLS identification of the VAR model was tested by generating sets of surrogate time series which share the same power spectrum as the original time series but are otherwise uncorrelated. Specifically, 100 sets of surrogate time series were generated using the Iterative Amplitude Adjusted Fourier Transform (IAAFT) procedure [55]; then, the cTE was estimated for each surrogate set, a threshold equal to the 95th percentile of its distribution over the surrogates was determined for each directed link, and the link was detected as statistically significant when the original cTE was above the threshold. In the case of LASSO, the statistical significance of the estimated cTE values was determined by exploiting the sparseness of the identification procedure. Since LASSO identification always produces a sparse matrix with several VAR coefficients equal to zero, the cTE is exactly zero when the coefficients along the investigated direction are zero at every time lag; on the contrary, the cTE is positive, and was considered statistically significant in this study, when at least one coefficient along the considered direction is non-zero.
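A minimal sketch of an IAAFT surrogate generator follows (our simplified version with a fixed number of iterations; the published procedure [55] iterates until convergence). Each pass imposes the original spectral amplitudes and then restores the original amplitude distribution by rank ordering:

```python
import numpy as np

def iaaft(x, n_iter=100, seed=0):
    """Simplified IAAFT surrogate: a series with (approximately) the same
    power spectrum and exactly the same amplitude distribution as x,
    but randomized temporal structure."""
    rng = np.random.default_rng(seed)
    amp = np.abs(np.fft.rfft(x))          # target spectral amplitudes
    sorted_x = np.sort(x)                 # target amplitude distribution
    s = rng.permutation(x)                # start from a random shuffle
    for _ in range(n_iter):
        # impose the target spectrum while keeping the current phases
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(amp * np.exp(1j * phases), n=len(x))
        # restore the original amplitude distribution by rank ordering
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s
```

Because the final step is the rank ordering, the surrogate matches the amplitude distribution exactly and the power spectrum only approximately, which is the usual IAAFT trade-off.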

#### **3. Simulation Experiments**

This section reports two simulation studies performing a systematic evaluation of the performance of the two VAR identification methodologies (OLS and LASSO) employed for the practical computation of the measures of information transfer in known networks, assessed with different amounts of data samples available. First, we study the behavior of the measures of information transfer and information modification in a four-variate VAR process specifically configured to reproduce coexisting forms of redundant and synergistic interactions between source processes sending information towards a target [15,30]. Second, with specific focus on the estimation of the cTE and of its statistical significance, we compare the ability of OLS and LASSO to reconstruct an assigned network topology in a ten-variate VAR process exhibiting a random interaction structure with fixed density of connected nodes [34,56].

#### *3.1. Simulation Study I*

#### 3.1.1. Simulation Design and Realization

Simulated multivariate time series (*M*=4) were generated as realizations of the following VAR(2) process depicted in Figure 1 [2,30,57]:

$$Y_{1,n} = 2\rho_1 \cos\left(2\pi f_1\right) Y_{1,n-1} - \rho_1^2 Y_{1,n-2} + U_{1,n} \tag{17a}$$

$$Y_{2,n} = 2\rho_2 \cos\left(2\pi f_2\right) Y_{2,n-1} - \rho_2^2 Y_{2,n-2} + Y_{1,n-1} + U_{2,n} \tag{17b}$$

$$Y_{3,n} = 2\rho_3 \cos\left(2\pi f_3\right) Y_{3,n-1} - \rho_3^2 Y_{3,n-2} + Y_{1,n-1} + U_{3,n} \tag{17c}$$

$$Y_{4,n} = \frac{1}{2}Y_{2,n-1} + \frac{1}{2}Y_{3,n-1} + U_{4,n} \tag{17d}$$

In (17), **U** = [*U*1 ... *U*4] is a vector of zero-mean uncorrelated white noises with unit variance (i.e., with covariance **Σ** ≡ **I**). The VAR parameters are selected to allow autonomous oscillations for *Y*1, *Y*2, and *Y*3 by placing, in the *Z*-domain representation of the VAR process, complex-conjugate poles with modulus *ρi* and phase 2*π fi*, *i* = 1, 2, 3; here we set the pole modulus *ρ*1 = *ρ*2 = *ρ*3 = 0.95 and the pole frequencies *f*1 = 0.1, *f*2 = *f*3 = 0.25. Moreover, interactions between the processes were set to produce a common driver effect *Y*2 ← *Y*1 → *Y*3 and unidirectional couplings *Y*2 → *Y*4 and *Y*3 → *Y*4, with the weights indicated in Figure 1. With these settings, 100 realizations of the process were generated for different values of the parameter *K*, defined as the ratio between the number of data samples available (*N*) and the number of AR coefficients to be estimated (*Mp*); *K* was varied in the range (1, 2, 5, 10, 30), so that the length of the simulated time series was *N* = 8 when *K* = 1 and *N* = 240 when *K* = 30. For each realization and each value of *K*, all the measures appearing in the PID of the information transfer were computed by exploiting the SS approach applied to the VAR parameters estimated through OLS or LASSO identification; the PID analysis was performed considering either *Y*4 or *Y*1 as the target process, and both *Y*2 and *Y*3 as the source processes.
Then, the bias and variance of each estimated PID measure were assessed, for each *K* and separately for OLS and LASSO: the bias as the absolute difference between the mean value of the measure over the 100 realizations and its theoretical value computed from the true VAR parameters, and the variance as the sample variance over the 100 realizations.
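Generating a realization of the process in Equation (17) is straightforward; the following sketch (with a discarded transient we add to reach steady state) illustrates it:

```python
import numpy as np

def simulate_var2(N, rho=(0.95, 0.95, 0.95), f=(0.1, 0.25, 0.25),
                  n_transient=500, rng=None):
    """One realization of the four-variate VAR(2) process of
    Equation (17); an initial transient is discarded."""
    rng = np.random.default_rng(rng)
    r1, r2, r3 = rho
    f1, f2, f3 = f
    n_tot = N + n_transient
    U = rng.standard_normal((4, n_tot))   # unit-variance white noises
    Y = np.zeros((4, n_tot))
    for n in range(2, n_tot):
        Y[0, n] = 2*r1*np.cos(2*np.pi*f1)*Y[0, n-1] - r1**2*Y[0, n-2] + U[0, n]
        Y[1, n] = (2*r2*np.cos(2*np.pi*f2)*Y[1, n-1] - r2**2*Y[1, n-2]
                   + Y[0, n-1] + U[1, n])
        Y[2, n] = (2*r3*np.cos(2*np.pi*f3)*Y[2, n-1] - r3**2*Y[2, n-2]
                   + Y[0, n-1] + U[2, n])
        Y[3, n] = 0.5*Y[1, n-1] + 0.5*Y[2, n-1] + U[3, n]
    return Y[:, n_transient:]
```

For instance, `simulate_var2(240)` yields one realization of length *N* = 240, corresponding to *K* = 30 for this process.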

**Figure 1.** Graphical representation of the four-variate VAR (vector autoregressive) process realized in the first simulation according to Equation (17). Network nodes represent the four simulated processes, and arrows represent the imposed causal interactions (self-loops depict influences from the past to the present sample of a process).

#### 3.1.2. Simulation Results

Figures 2 and 3 show the trends of bias and variance associated with the estimation of TE (*T*2→*j*, *T*3→*j*), redundant TE (*R*23→*j*), synergistic TE (*S*23→*j*) and unique TEs (*U*2→*j*, *U*3→*j*) respectively when *j* = 4 (target process *Y*4) and *j* = 1 (target process *Y*1), computed after VAR model identification using OLS (blue) and LASSO (red) and depicted as a function of the ratio *K* between time series length and number of model parameters.

As a general result, both figures show that the accuracy of all estimates of the PID measures is strongly influenced by the amount of data available, with a progressive increase of both the bias and the variance of the estimates as the parameter *K* decreases. The LASSO regression exhibits a substantially better performance in the estimation of the PID measures, particularly when data samples are scarce (*K* ≤ 2). In the most challenging condition of *K* = 1 (number of AR coefficients equal to the number of data points), results are reported only for the LASSO regression, since with OLS it was impossible to evaluate the PID measures owing to the non-convergence of the solution of the discrete algebraic Riccati equation (DARE). In the other cases (*K* ∈ (5, 10, 30)) the two identification methods show comparable trends, with slightly better performance exhibited by OLS identification in the assessment of non-zero PID measures (Figure 2), and by LASSO identification in the assessment of zero PID measures (Figure 3).

**Figure 2.** Accuracy of PID (partial information decomposition) measures computed for the VAR process of Simulation I when *Y*<sup>4</sup> is taken as the target process. Panels report the bias (**a**–**c**) and the variance (**d**–**f**) relevant to the computation of the TE (transfer entropy) from *Y*<sup>2</sup> to *Y*<sup>4</sup> and from *Y*<sup>3</sup> to *Y*<sup>4</sup> (**a**,**d**), the unique TE from *Y*<sup>2</sup> to *Y*<sup>4</sup> and from *Y*<sup>3</sup> to *Y*<sup>4</sup> (**b**,**e**), and the redundant and synergistic TE from *Y*<sup>2</sup> and *Y*<sup>3</sup> to *Y*<sup>4</sup> (**c**,**f**).

In fact, when *Y*<sup>4</sup> is taken as the target process, the sources *Y*<sup>2</sup> and *Y*<sup>3</sup> send the same amount of information towards the target, and this information is entirely redundant (*T*2→<sup>4</sup> = *T*3→<sup>4</sup> = *R*23→<sup>4</sup> = 0.63, *U*2→<sup>4</sup> = *U*3→<sup>4</sup> = 0); moreover, a non-negligible amount of synergistic information transfer is present (*S*23→<sup>4</sup> = 0.56) [30]. As reported in Figure 2, the estimates of the non-zero quantities (*T*2→4, *T*3→4, *R*23→4, *S*23→4) assessed through LASSO-VAR identification exhibit higher variance than those assessed through OLS, as well as a slight negative bias which becomes relevant only in the case of the synergistic TE; in that case the underestimation of *S*23→<sup>4</sup> is present also after OLS identification when *K* = 2 (Figure 2c).

When the process *Y*<sup>1</sup> is taken as the target, all the PID measures are null (*T*2→<sup>1</sup> = *T*3→<sup>1</sup> = *U*2→<sup>1</sup> = *U*3→<sup>1</sup> = *S*23→<sup>1</sup> = *R*23→<sup>1</sup> = 0) because no causal interactions are directed towards *Y*1. As shown in Figure 3, in this case the LASSO identification outperforms the OLS method, showing lower bias and variance for all values of *K* with evident improvement in the performance when *K* ≤ 2. Interestingly, for low values of *K* the LASSO regression detected the absence of synergy with more accuracy than that of redundancy (Figure 3c,f).

**Figure 3.** Accuracy of PID measures computed for the VAR process of Simulation I when *Y*<sup>1</sup> is taken as the target process. Panels report the bias (**a**–**c**) and the variance (**d**–**f**) relevant to the computation of the TE from *Y*<sup>2</sup> to *Y*<sup>1</sup> and from *Y*<sup>3</sup> to *Y*<sup>1</sup> (**a**,**d**), the unique TE from *Y*<sup>2</sup> to *Y*<sup>1</sup> and from *Y*<sup>3</sup> to *Y*<sup>1</sup> (**b**,**e**), and the redundant and synergistic TE from *Y*<sup>2</sup> and *Y*<sup>3</sup> to *Y*<sup>1</sup> (**c**,**f**).

#### *3.2. Simulation Study II*

#### 3.2.1. Simulation Design and Realization

Simulated multivariate time series (*M* = 10) were generated as realizations of a VAR(10) model fed by white Gaussian noises with variance equal to 0.1. The simulated networks have a ground-truth structure with a density of connected nodes equal to 50%, in which the non-zero AR parameters were set by randomly assigning the lag in the range (1–10) and the coefficient value in the interval [−0.6, 0.6] [58]. A representative example of one possible generated network is shown in Figure 4, where the strength of each directed link is provided by the theoretical cTE computed between the two processes starting from the true AR parameters. Under these constraints, 100 realizations (each with its specific network structure) of the VAR(10) process were generated for different values of the parameter *K* in the range (1, 2, 5, 10, 30), so that the length of the simulated time series was *N* = 100 when *K* = 1 and *N* = 3000 when *K* = 30. For each realization and each value of *K*, the cTE between each pair of processes was computed by exploiting the SS approach applied to the VAR parameters estimated through OLS or LASSO identification. Then, the bias and variance of the cTE estimates obtained through OLS and LASSO identification were assessed separately for the connections with zero and non-zero cTE, as explained in the following subsection.
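The random ground-truth structure described above can be sketched as follows; this is a simplified illustration (self-dynamics and the stability check that a usable VAR model would require are omitted):

```python
import numpy as np

def random_var_network(M=10, p=10, density=0.5, rng=None):
    """Ground-truth coefficient array A (p x M x M) with a random
    off-diagonal structure of the prescribed density: each selected
    directed pair (i -> j) receives one coefficient drawn uniformly
    in [-0.6, 0.6] at a randomly chosen lag in 1..p."""
    rng = np.random.default_rng(rng)
    A = np.zeros((p, M, M))
    pairs = [(i, j) for i in range(M) for j in range(M) if i != j]
    n_links = int(density * len(pairs))   # 45 out of 90 for M = 10
    chosen = rng.choice(len(pairs), size=n_links, replace=False)
    for idx in chosen:
        i, j = pairs[idx]
        lag = rng.integers(p)             # index 0..p-1, i.e., lags 1..p
        A[lag, j, i] = rng.uniform(-0.6, 0.6)
    return A
```

Here `A[k, j, i]` holds the lag-(*k*+1) influence of process *i* on process *j*, following the usual VAR convention.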

**Figure 4.** Graphical representation for one of the ground-truth networks of Simulation II. Arrows represent the existence of a link, randomly assigned, between two nodes in the network. The thickness of the arrows is proportional to the strength of the connection, with a maximum value for the cTE equal to 0.15. The number of connections for each network is set to 45 out of 90.

#### 3.2.2. Performance Evaluation

The performance of LASSO and OLS was assessed both in terms of the accuracy in estimating the strength of the network links through the absolute values of the cTE measure, and in terms of the ability to reconstruct the network structure through the assessment of the statistical significance of the cTE. The first analysis was performed separately for null and non-null links, computing the bias of the cTE through the comparison between the estimated and theoretical cTE values. Specifically, for each pair of network nodes represented by the processes *Yi* and *Yj*, the theoretical cTE obtained from the true VAR parameters, *Ti*→*j*|*s*, was compared with the corresponding estimated cTE value, *T̂i*→*j*|*s*, using a measure of absolute bias (*bias*) if the theoretical link is null, and a normalized measure of bias (*biasN*) if the theoretical link is non-null [59]:

$$bias = \left| T_{i \to j|s} - \hat{T}_{i \to j|s} \right|, \tag{18a}$$

$$bias_N = \left| \frac{T_{i \to j|s} - \hat{T}_{i \to j|s}}{T_{i \to j|s}} \right|. \tag{18b}$$

Then, for each network, the values of *bias* and *biasN* were averaged respectively across the 45 null links and across the 45 non-null links to get individual measures, denoted as *BIAS* and *BIASN*. Finally, the distributions of *BIAS* and *BIASN* were assessed across the 100 simulated network structures and presented separately for OLS and LASSO.
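Given the matrices of theoretical and estimated cTE values, the two summary measures of Equation (18) can be computed as in this sketch (function name ours):

```python
import numpy as np

def bias_summaries(cte_true, cte_est, tol=1e-12):
    """BIAS: mean absolute error over the null links (true cTE = 0);
    BIAS_N: mean normalized absolute error over the non-null links,
    following Equation (18). Self-links are excluded."""
    cte_true = np.asarray(cte_true, dtype=float)
    cte_est = np.asarray(cte_est, dtype=float)
    off = ~np.eye(cte_true.shape[0], dtype=bool)
    null = off & (np.abs(cte_true) < tol)
    nonnull = off & (np.abs(cte_true) >= tol)
    BIAS = np.mean(np.abs(cte_true[null] - cte_est[null]))
    BIAS_N = np.mean(np.abs((cte_true[nonnull] - cte_est[nonnull])
                            / cte_true[nonnull]))
    return BIAS, BIAS_N
```

Normalizing only the non-null links avoids the division by zero that the theoretical value of a null link would cause in Equation (18b).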

Second, the ability of OLS and LASSO to detect the absence or presence of network links based on the statistical significance of the cTE was tested by comparing the two adjacency matrices representative of the estimated and theoretical network structures. This can be seen as a binary classification task where the existence (class 1) or absence (class 0) of each estimated connection is assessed (using surrogate data for OLS, and looking for zero/non-zero estimated coefficients for LASSO) and compared with the underlying ground-truth structure. Performance was assessed through the computation of the false positive rate (FPR, measuring the fraction of null links for which a statistically significant cTE was detected), the false negative rate (FNR, measuring the fraction of non-null links for which the cTE was detected as non-significant) and the accuracy (ACC, measuring the fraction of correct detections) [40,60]. Each of these performance measures was obtained across the network links for each individual network, and its distribution across the 100 simulated network structures was then presented separately for OLS and LASSO.
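The three classification scores can be obtained from the two binary adjacency matrices as follows (a minimal sketch; `link_detection_scores` is an illustrative name):

```python
import numpy as np

def link_detection_scores(true_adj, est_adj):
    """FPR, FNR and accuracy ACC obtained by comparing the estimated
    binary adjacency matrix with the ground truth, self-links excluded."""
    off = ~np.eye(np.asarray(true_adj).shape[0], dtype=bool)
    t = np.asarray(true_adj, dtype=bool)[off]
    e = np.asarray(est_adj, dtype=bool)[off]
    fp = np.sum(~t & e)                 # absent links detected as present
    fn = np.sum(t & ~e)                 # present links detected as absent
    fpr = fp / max(np.sum(~t), 1)
    fnr = fn / max(np.sum(t), 1)
    acc = np.mean(t == e)               # fraction of correct decisions
    return fpr, fnr, acc
```

The `max(..., 1)` guards simply avoid division by zero in the degenerate cases of a fully connected or fully disconnected ground truth.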

#### 3.2.3. Statistical Analysis

For this simulation study, five repeated measures two-way ANOVA tests, one for each performance parameter (*BIAS*, *BIASN*, *FNR*, *FPR*, *ACC*), were performed to evaluate the effects on each parameter of the value of K (varied in the range [30, 10, 5, 2]) and of the identification methodology ([*OLS*, *LASSO*]).

The Greenhouse–Geisser correction for the violation of the sphericity assumption was used in all analyses. Tukey's post-hoc test was used for testing the differences between sub-levels of the ANOVA factors. The Bonferroni–Holm correction was applied to account for the multiple ANOVAs computed on the different performance parameters.

#### 3.2.4. Results of the Simulation Study

The results of the two-way repeated measures ANOVAs, expressed in terms of F-values and computed separately on all the performance parameters considering K and TYPE (identification method used) as main factors, are reported in Table 1.



The two-way ANOVAs reveal a strong statistical influence of the main factors K and TYPE, and of their interaction, on all the performance parameters analyzed. It is worth noting that the level *K* = 1 was not considered in the statistical analysis due to the non-convergence of the DARE equation in the OLS case.

Figure 5 reports the distribution of the parameters *BIAS* and *BIASN* according to the interaction factor K x TYPE.

**Figure 5.** Distribution of the bias parameters computed for the null links (*BIAS*, **a**) and for the non-null links (*BIASN*, **b**) considering the interaction factor K x TYPE, expressed as mean value and 95% confidence interval of the parameter computed across 100 realizations of simulation II for OLS (blue line) and LASSO (red line) for different values of K.

The comparison of the two VAR identification procedures shows that the trends for LASSO (red line) and OLS (blue line) are very different. In the analysis of the error committed in the estimation of the null links (parameter *BIAS*), the error of the LASSO estimates is almost zero for all levels of K (even for *K* ≤ 2, the most challenging situations), while the OLS estimates show a sharp increase of the error as the number of data samples available for the estimation of the cTE decreases (Figure 5a). The analysis of the error committed in the estimation of the non-null links (parameter *BIASN*, Figure 5b) highlights that for both methods the error increases with decreasing K. The two identification methods exhibit different performance as a function of the number of data samples available for the estimation procedure: when such number is high (*K* = 30), OLS yields a significantly smaller bias than LASSO; when 5 ≤ *K* ≤ 10 there are no significant differences between the two methods; in the most challenging conditions with *K* < 5, OLS exhibits a drastic rise of *BIASN* towards 2 (i.e., an overestimation of up to 200%), while LASSO identification limits the bias, which remains below 1 even when *K* = 1.

Figure 6 reports the distributions of the parameters FPR, FNR and ACC according to the interaction K x TYPE. The analysis of the rate of false negatives (Figure 6a) shows that the number of links incorrectly classified as null increases as the amount of available data decreases (*K* decreasing from 10 to 2), with values of FNR rising from about 0.1 to about 0.6 using OLS, and remaining much lower (between 0 and 0.2) using LASSO identification. On the other hand, the analysis of the rate of false positives (Figure 6b) returns opposite trends: the number of absent links incorrectly classified as non-null is stable and almost negligible using OLS, while with LASSO it exhibits a slight growth that raises the FPR from 0 when K = 30 to about 0.25 when K = 1. The overall performance assessed through the ACC parameter is better using LASSO identification (Figure 6c): the rate of correctly detected links is comparable in the favorable condition *K* = 30, while for *K* ≤ 10 LASSO shows better performance (significantly higher values of ACC) than OLS and can reconstruct the network structure with very good accuracy (∼80%) even in the challenging condition of *K* = 1.

**Figure 6.** Distributions of *FNR* (**a**), *FPR* (**b**) and *ACC* (**c**) parameters considering the interaction factor K x TYPE, expressed as mean value and 95% confidence interval of the parameter computed across 100 realizations of simulation II for OLS (blue line) and LASSO (red line) for different values of K.

#### **4. Application to Physiological Time Series**

This section reports the application of the measures of information transfer, based on VAR models, estimated through OLS or LASSO identification, to a dataset of physiological time series previously collected with the aim of studying organ system interactions during different levels of mental stress [3]. The physiological time series measured for each subject were considered to be a realization of a vector stochastic process descriptive of the behavior of a composite dynamical system which forms a network of physiological interactions. Such network is composed of two distinct sub-networks, which are in turn formed by three nodes ("body" or peripheral sub-network) and four nodes (brain sub-network). The dynamic activity at each network node is quantified by a scalar process, as specifically defined in the next subsection.

#### *4.1. Data Acquisition and Pre-Processing*

Eighteen healthy participants aged between 18 and 30 years were recorded during three different tasks inducing different levels of mental stress: a resting condition induced by watching a relaxing video (R); a condition of mental stress induced by the execution of a mental arithmetic task (M) using an online tool in which the participants had to perform sums and subtractions of 3-digit numbers and type the solution in a text box using the keyboard; and a condition of sustained attention induced by playing a serious game (G), which consisted of following a point moving on the screen with the mouse while trying to avoid various obstacles. All participants provided written informed consent. The experiment was approved by the Ethics Committees of the University of Trento, and the study was conducted in accordance with the Declaration of Helsinki.

The acquired physiological signals were the electrocardiogram (ECG), the respiratory signal (RESP) measured by monitoring abdominal movements, the blood volume pulse (BVP) signal measured through a photoplethysmographic technique, and 14 electroencephalographic (EEG) signals recorded at different locations on the scalp. After a pre-processing step performed in MatLab R2016b (Mathworks, Natick, MA, USA), seven physiological time series, each consisting of 300 data points and taken as a realization of the stochastic process representing the activity of a specific physiological (sub)system, were extracted from the recorded signals as follows: (1) the R-R tachogram, represented by the sequence of the time distances between consecutive R peaks of the ECG (process *η*); (2) the series of respiratory amplitude values, sampled at the onset of each detected R-R interval (process *ρ*); (3) the pulse arrival time (process *π*), obtained by computing the time elapsed between each R peak of the ECG and the corresponding point of maximum derivative of the BVP signal; (4) the sequences of the EEG power spectral density, measured in consecutive time windows (lasting 2 s with 1 s overlap) of the EEG signal acquired at the electrode *Fz*, integrated within the bands 0.5–3 Hz (process *δ*), 3–8 Hz (process *θ*), 8–12 Hz (process *α*), and 12–25 Hz (process *β*). Before VAR modeling, the time series were reduced to zero mean and unit variance and checked for a restricted form of weak-sense stationarity using the algorithm proposed in [61], which divides each time series into a given number of randomly selected sub-windows, assessing for each of them the stationarity of mean and variance. A detailed description of signal recording, experimental protocol and time series extraction can be found in [3,21].
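The extraction of the band-power series in step (4) can be sketched as follows; this is a simplified illustration using a plain FFT periodogram per window (the actual spectral estimator used in the study may differ), with the 2 s windows and 1 s overlap stated above:

```python
import numpy as np

def band_power_series(eeg, fs, band, win_s=2.0, overlap_s=1.0):
    """Series of EEG power values in a frequency band, one sample per
    window (2 s long, 1 s overlap by default), via a raw periodogram."""
    eeg = np.asarray(eeg, dtype=float)
    win = int(win_s * fs)
    step = int((win_s - overlap_s) * fs)
    lo, hi = band
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    sel = (freqs >= lo) & (freqs < hi)    # bins inside the band
    powers = []
    for start in range(0, len(eeg) - win + 1, step):
        seg = eeg[start:start + win]
        seg = seg - seg.mean()            # remove DC offset
        psd = np.abs(np.fft.rfft(seg))**2 / win
        powers.append(psd[sel].sum())     # power integrated in the band
    return np.asarray(powers)
```

For example, `band_power_series(sig, fs, (8, 12))` would yield the *α*-band series for a signal `sig` sampled at `fs` Hz.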

#### *4.2. Information Transfer Analysis*

The seven time series obtained from each subject and each condition were interpreted as a realization of a VAR process whose parameters *A*1, ..., *Ap*, **Σ** were estimated with the two identification methods under analysis (i.e., OLS and LASSO). The model order *p* was estimated, for each experimental condition and each subject, using the Bayesian Information Criterion [62]. Then, two different analyses were performed through the application of the SS approach: the partial information decomposition of the information transferred to each target process from the body and brain sources, and the computation of the conditional transfer entropy between each pair of processes together with the assessment of its statistical significance.

The overall extent of brain–body interactions was quantified considering as link weights the percentage of subjects showing at least one statistically significant brain-to-body connection (and vice-versa). To study the involvement of each specific node in the network, the in-strength of each node was computed considering as link weights the cTE values of all network links pointing into the considered node.
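The two VAR identification schemes compared throughout this work can be illustrated with the following numpy-only sketch, where the LASSO problem is solved with a basic proximal-gradient (ISTA) loop; the regularization parameter `lam` and the solver are illustrative choices, not the specific implementation used in the study:

```python
import numpy as np

def build_regressors(Y, p):
    """Stack lagged samples: predict Y[:, n] from Y[:, n-1..n-p]."""
    M, N = Y.shape
    X = np.hstack([Y[:, p - k:N - k].T for k in range(1, p + 1)])  # (N-p, M*p)
    Z = Y[:, p:].T                                                 # (N-p, M)
    return X, Z

def var_ols(Y, p):
    """OLS identification: column j of B holds the coefficients for
    target series j, stacked by lag."""
    X, Z = build_regressors(Y, p)
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return B

def var_lasso(Y, p, lam=0.1, n_iter=2000):
    """LASSO identification via proximal gradient (ISTA) on
    0.5 * ||X B - Z||^2 + lam * ||B||_1 (separable across targets)."""
    X, Z = build_regressors(Y, p)
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    B = np.zeros((X.shape[1], Z.shape[1]))
    for _ in range(n_iter):
        G = X.T @ (X @ B - Z) / L        # gradient step
        B = np.sign(B - G) * np.maximum(np.abs(B - G) - lam / L, 0.0)
    return B
```

The soft-thresholding step is what sets entire coefficients exactly to zero, which is the sparsity property exploited in Section 2 to declare cTE values significant or null.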

#### *4.3. Statistical Analysis*

The effect of the different experimental conditions (R, M, G) on each PID measure computed for each target process (*j* = [*η*, *ρ*, *π*, *δ*, *θ*, *α*, *β*]) and for each VAR identification method (OLS, LASSO) was assessed with a Kruskal-Wallis test, followed by a Wilcoxon rank sum test to assess statistical differences between pairs of conditions. Moreover, the Wilcoxon rank sum test was also performed to assess statistical differences between the two unique TEs (*Ui*→*j*, *Uk*→*j*), or between the redundant and synergistic TEs (*Rik*→*j*, *Sik*→*j*), assessed for a given experimental condition, target process and identification method. Finally, in order to assess the effect of the experimental condition on the in-strength evaluated for each node in the network, a Kruskal-Wallis test was performed, followed by the Wilcoxon rank sum test between pairs of conditions.

#### *4.4. Results of Real Data Application*

The results of PID analysis, describing how information is transferred within the observed network of brain–body interactions, are reported respectively in Figure 7 (OLS results) and Figure 8 (LASSO results) for the targets belonging to the body sub-network (*η*, *ρ*, *π*), and in Figure 9 (OLS results) and Figure 10 (LASSO results) for the targets belonging to the brain sub-network (*δ*, *θ*, *α*, *β*). The results of cTE analysis, illustrating the topology of the detected physiological networks, are reported in Figure 11 (direct links), Figure 12 (brain–body interactions) and Figure 13 (in-strength). All analyses were performed identifying VAR models of dimension *Mp*, with *M* = 7 and *p* ∼ 4 (depending on the Bayesian Information Criterion), on time series of 300 points, yielding values of *K* ∼ 10 for the parameter relating the amount of data samples available to the model dimension.

#### 4.4.1. Partial Information Decomposition

Figures 7 and 8 report, respectively for OLS and LASSO estimation, the distributions across subjects of the joint TE (*Tik*→*j*, left panels) directed to each target *j* belonging to the body sub-network from the two other body sources (index *i*) and from the four brain sources (index *k*), as well as of its decomposition into unique TEs (*Ui*→*<sup>j</sup>* and *Uk*→*j*, middle panels) and redundant and synergistic TEs (*Rik*→*j*, *Sik*→*j*, right panels), evaluated at rest (R), during mental stress (M) and serious game (G).

Figure 7 shows that for each target in the body sub-network, the trends of the joint TE (*Tik*→*j*, Figure 7a,d,g) are mostly determined by the processes belonging to the same sub-network, as documented by the substantial values of the unique information transfer *Ui*→*<sup>j</sup>* and the negligible values of the unique transfer *Uk*→*<sup>j</sup>* (Figure 7b,e,h, with statistically significant difference between *Ui*→*<sup>j</sup>* and *Uk*→*j*) and by the low values of the information transferred to *η*, *ρ* and *π* in a synergistic or redundant way from the brain and body sub-networks (Figure 7c,f,i). While for the targets *η* and *ρ* the PID measures did not vary significantly across conditions, the information transferred jointly from the brain and body sources towards the target *π* (Figure 7g) as well as the unique information transferred to *π* internally in the body sub-network (Figure 7h) decreased significantly moving from R to M and from R to G. This result documents a reduction of the causal interactions from RR interval and respiration towards the pulse arrival time during conditions of mental stress.

**Figure 7.** Partial Information Decomposition of brain–body interactions directed to the body nodes of the physiological network, assessed using OLS VAR identification. Box plots report the distributions across subjects (median: red lines; interquartile range: box; 10*th*–90*th* percentiles: blue lines) as well as the individual values (circles or triangles) of the PID measures (**a**,**d**,**g**: joint information transfer; **b**,**e**,**h**: unique information transfer; **c**,**f**,**i**: synergistic and redundant transfer) computed at rest (R), during mental stress (M) and during serious game (G) considering the RR interval (*η*), the respiratory amplitude (*ρ*), or the pulse arrival time (*π*) as the target process *j*, and the body and brain sub-networks as source processes *i* and *k*. Statistically significant differences between pairs of distributions are marked with ∗ (R vs. M), with # (R vs. G), with § (R vs. R), with ∼ (M vs. M) and with ◦ (G vs. G).

As reported in Figure 8, the trends of the joint TEs computed after LASSO identification when the processes *η* and *π* are taken as targets (Figure 8a,g) are comparable to those obtained with OLS identification and shown in Figure 7. In particular, also in this case a significant reduction of the joint TE directed to *π* is observed during the conditions M and G compared to R (Figure 8g), which is mostly due to a decrease of the unique information transferred to *π* from the body sources (*Ui*→*j*, Figure 8h). Moreover, also in this case the unique TE directed towards *η* and *π* from the brain sub-network (*Uk*→*j*, Figure 8b,h) shows values very close to zero and significantly lower than those of the unique TE *Ui*→*j*. While the synergistic TE *Sik*→*<sup>j</sup>* is almost zero for any target, the redundant TE *Rik*→*<sup>j</sup>* is significantly higher than *Sik*→*<sup>j</sup>* when the target is the vascular process *π* (Figure 8i). A result demonstrated specifically by the LASSO identification method is the absence of joint TE directed to the respiration process *ρ* (Figure 8d), documenting the absence of interactions directed toward respiration in all physiological conditions.

**Figure 8.** Partial Information Decomposition of brain–body interactions directed to the body nodes of the physiological network, assessed using LASSO-VAR identification. Box plots report the distributions across subjects (median: red lines; interquartile range: box; 10*th*–90*th* percentiles: blue lines) as well as the individual values (circles or triangles) of the PID measures (**a**,**d**,**g**: joint information transfer; **b**,**e**,**h**: unique information transfer; **c**,**f**,**i**: synergistic and redundant transfer) computed at rest (R), during mental stress (M) and during serious game (G) considering the RR interval (*η*), the respiratory amplitude (*ρ*), or the pulse arrival time (*π*) as the target process *j*, and the body and brain sub-networks as source processes *i* and *k*. Statistically significant differences between pairs of distributions are marked with ∗ (R vs. M), with # (R vs. G), with § (R vs. R), with ∼ (M vs. M) and with ◦ (G vs. G).

Figures 9 and 10 report, respectively for OLS and LASSO estimation, the distributions across subjects of the joint TE (*Tik*→*j*, left panels) directed to each target *j* belonging to the brain sub-network from the three other brain sources (index *k*) and from the three body sources (index *i*), as well as of its decomposition into unique TEs (*Ui*→*<sup>j</sup>* and *Uk*→*j*, middle panels) and redundant and synergistic TEs (*Rik*→*j*, *Sik*→*j*, right panels), evaluated at rest (R) and during mental stress (M) and serious game (G).

Considering the joint TE exchanged toward the brain rhythms, in contrast to what was observed for the body sub-network (Figure 7a,d,g), the joint TE assessed through OLS identification shows a tendency to increase during M and especially during G compared to R (Figure 9a,d,g,j); the increase is statistically significant for the *δ* rhythm (Figure 9a), and is supported by a significant increase of the redundant and synergistic TEs *Rik*→*<sup>j</sup>* and *Sik*→*<sup>j</sup>*, which suggests an increased contribution of brain–body interactions to the rhythmic variations of the *δ* brain wave amplitude. An increase of the redundant brain–body interactions during stress states is observed also for the *θ* brain wave amplitude (Figure 9f). The analysis of the unique information transfer (Figure 9b,e,h,k) shows that the unique information provided by the brain sub-network (*Uk*→*j*) is generally larger than that provided by the body sub-network (*Ui*→*j*), with statistically significant differences during R and when the target of the unique transfer is given by the processes *θ*, *α* and *β*.

**Figure 9.** Partial Information Decomposition of brain–body interactions directed to the brain nodes of the physiological network, assessed using OLS VAR identification. Box plots report the distributions across subjects (median: red lines; interquartile range: box; 10*th*–90*th* percentiles: blue lines) as well as the individual values (circles or triangles) of the PID measures (**a**,**d**,**g**,**j**: joint information transfer; **b**,**e**,**h**,**k**: unique information transfer; **c**,**f**,**i**,**l**: synergistic and redundant transfer) computed at rest (R), during mental stress (M) and during serious game (G) considering the *δ*, *θ*, *α*, or *β* brain wave amplitude as the target process *j*, and the body and brain sub-networks as source processes *i* and *k*. Statistically significant differences between pairs of distributions are marked with ∗ (R vs. M), with # (R vs. G), with § (R vs. R), with ∼ (M vs. M) and with ◦ (G vs. G).

When PID directed towards the brain processes is computed using LASSO (Figure 10), a main result is that interactions are weak and do not vary significantly across physiological states. Notably, the joint TE and all PID terms relevant to the target *δ* are almost equal to zero in all conditions (Figure 10a,b,c). Similarly, also the values of the unique TE from the body sub-network to any brain process (*Ui*→*j*, Figure 10b,e,h,k) and of both the redundant and synergistic TE (*Rik*→*j*, *Sik*→*j*, Figure 10c,f,i,l) are zero in almost all subjects and conditions, indicating that the LASSO approach does not detect interactions directed from body to brain in this dataset.

**Figure 10.** Partial Information Decomposition of brain–body interactions directed to the brain nodes of the physiological network, assessed using LASSO-VAR identification. Box plots report the distributions across subjects (median: red lines; interquartile range: box; 10*th*–90*th* percentiles: blue lines) as well as the individual values (circles or triangles) of the PID measures (**a**,**d**,**g**,**j**: joint information transfer; **b**,**e**,**h**,**k**: unique information transfer; **c**,**f**,**i**,**l**: synergistic and redundant transfer) computed at rest (R), during mental stress (M) and during serious game (G) considering the *δ*, *θ*, *α*, or *β* brain wave amplitude as the target process *j*, and the body and brain sub-networks as source processes *i* and *k*. Statistically significant differences between pairs of distributions are marked with ∗ (R vs. M), with # (R vs. G), with § (R vs. R), with ∼ (M vs. M) and with ◦ (G vs. G).

#### 4.4.2. Conditional Information Transfer

Figure 11 reports the network of physiological interactions reconstructed through the detection of the statistically significant values of the conditional transfer entropy (*Ti*→*j*|*s*) computed for any pair of processes belonging to the brain and body sub-networks. The weighted arrows, depicting the most active connections among systems (arrows are present when at least 3 subjects show significant values of *Ti*→*j*|*s*), show a similar structure when estimated in the three analyzed conditions using OLS (Figure 11a–c) and LASSO (Figure 11d–f). The main distinctive features are the existence of a densely connected sub-network of body interactions (red arrows), of a weakly connected sub-network of brain interactions (yellow arrows), and of changing patterns of brain–body interactions (blue arrows). In general, LASSO shows, for each condition analyzed, a greater sparsity in the estimated networks, preserving only the most active links detected by OLS.

Within-body interactions are characterized mainly by cardiovascular links (interactions from *η* to *π*) and cardio-respiratory links (interactions between *η* and *ρ*), with a weaker coupling between *ρ* and *π* which exhibits a preferential direction from *ρ* to *π*; the use of LASSO highlights the unidirectional nature of cardio-respiratory interactions (from *ρ* to *η*). On the other hand, the topology of the brain sub-network is less stable across the three conditions and appears to lose consistency passing from REST to GAME; also in this case the use of LASSO leads to a greater sparsity, with nodes almost fully disconnected. As to brain–body interactions, they occur almost exclusively along the direction from brain to body; in this case the use of LASSO shows that interactions from brain to body increase during the GAME condition.

**Figure 11.** Topological structure of the networks of physiological interactions reconstructed during the three analyzed physiological states. Graphs depict significant directed interactions within the brain (yellow arrows) and body (red arrows) sub-networks as well as interactions between brain and body (blue arrows). Directed interactions were assessed counting the number of subjects for which the conditional transfer entropy (*Ti*→*j*|*s*) was detected as statistically significant using OLS (**a**–**c**) or LASSO (**d**–**f**) to perform VAR model identification. The arrow thickness is proportional to the number of subjects (n) for which the link is detected as statistically significant.

To quantify the overall extent of the brain–body interactions from the cTE networks estimated above, we computed the percentage of subjects with statistically significant values of the cTE along the direction from brain to body and along the opposite direction from body to brain. This was obtained by considering the brain sub-network and the body sub-network as single nodes, and computing the in-strength of one sub-network by considering only the connections coming from the other sub-network. The average values are shown in Figure 12.
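As an illustration, the in-strength computation described above can be sketched as follows. The layout of the significance array, the node ordering, and the hand-made example values are our own assumptions, not the authors' code:

```python
import numpy as np

# Illustrative sketch (not the authors' code): nodes 0-2 stand for the body
# sub-network (eta, rho, pi) and nodes 3-6 for the brain sub-network.
# sig[s, i, j] flags a statistically significant cTE from node i to node j
# for subject s; a small hand-made example stands in for real results.
sig = np.zeros((4, 7, 7), dtype=bool)
sig[0, 3, 0] = True        # subject 0: one brain node -> eta (brain to body)
sig[1, 3, 0] = True        # subject 1: same link
body, brain = np.arange(3), np.arange(3, 7)

def cross_strength(sig, sources, targets):
    """Percentage of subjects showing at least one significant link
    from the source sub-network to the target sub-network."""
    cross = sig[:, sources][:, :, targets]   # subjects x sources x targets
    return 100.0 * cross.any(axis=(1, 2)).mean()

print(cross_strength(sig, brain, body))      # brain -> body: 50.0
print(cross_strength(sig, body, brain))      # body -> brain: 0.0
```

Collapsing each sub-network to a single node in this way discards which specific node carries the link, which is why Figure 13 complements this index with per-node in-strengths.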

The results reported in Figure 12 show that interactions are found more consistently along the direction from brain to body than along the opposite direction. In particular, LASSO does not show any link directed from body to brain in any of the three analyzed conditions. In the resting condition (R), the percentage of active links directed from brain to body is similar for the two VAR identification methods. Then, OLS identification results in a larger number of links moving from R to M, and a decrease during G. Conversely, LASSO shows a decrease of the percentage of significant links during M and a sharp increase during G.

**Figure 12.** Bar plots reporting the in-strength index extracted from the cTE networks of Figure 11 by considering as link weights the percentage of subjects showing a brain-to-body connection (**a**) or a body-to-brain connection (**b**), computed at rest (R), during mental stress (M) and during serious game (G) for the two VAR identification methods. Please note that the in-strength computed along the direction from body to brain using LASSO is null in all conditions.

Figure 13 reports the distribution of the values of the in-strength index evaluated for each node of the network in each experimental condition. For both OLS and LASSO, the median value of the in-strength index (Figure 13a–c,h–j) is higher for the network nodes of the body sub-network than for those belonging to the brain sub-network (Figure 13d–g,k–n). An exception to this difference is the in-strength of the links directed towards the node *ρ*, which is very close to zero when assessed using LASSO identification (Figure 13i). Moreover, the estimated in-strength values are, on average, lower when assessed through LASSO than through OLS. Considering the in-strength of individual nodes, a statistically significant reduction is observed moving from R to G for the weights of the connections directed towards *π* ( Figure 13c,j), for both OLS and LASSO methods.

**Figure 13.** In-strength index computed for each node of the physiological network. Box plots report the distributions across subjects (median: red lines; interquartile range: box; 10*th*–90*th* percentiles: blue bars) as well as the individual values (circles) of the in-strength index (**a**–**g**: OLS; **h**–**p**: LASSO) computed at rest (R), during mental stress (M) and during serious game (G) for each node (*η*, *ρ*, *π*, *δ*, *θ*, *α*, *β*). Statistically significant differences between pairs of distributions are marked with # (R vs. G).

#### **5. Discussion**

#### *5.1. Simulation Study I*

The first simulation study was designed to compare the performance of the traditional OLS approach and the LASSO regression, implemented for the identification of VAR models in their state–space formulation [28], in estimating the information measures related to PID. The decomposition of the information transferred jointly from two sources to a target process allows investigation of how information is modified in a non-trivial way through redundant and synergistic interactions between the sources [64]. In particular, the model structure adopted in our simulation highlights the coexistence of synergistic and redundant contributions to the target *Y*<sup>4</sup> from the two sources *Y*<sup>2</sup> and *Y*<sup>3</sup> even if they are not directly coupled [30]. In situations such as this, the adoption of PID is fundamental to elicit how the two sources contribute to the target with both redundant and synergistic information transfer: the redundant contribution refers to the common information that both sources convey to the target; the synergistic contribution is considered extra information transferred towards the target and is ascribed to the weakest source in the system [15].
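To fix ideas, the decomposition can be sketched numerically. The snippet below assumes the minimum mutual information (MMI) redundancy R = min(T_i, T_k), one common choice for Gaussian processes; the numerical TE values are purely illustrative:

```python
def pid_from_te(t_joint, t_i, t_k):
    """Split the joint transfer entropy T_{ik->j} into unique, redundant and
    synergistic terms, assuming the minimum-mutual-information (MMI)
    redundancy R = min(T_i, T_k) as an illustrative definition."""
    r = min(t_i, t_k)              # redundant transfer (shared by the sources)
    u_i = t_i - r                  # unique transfer from source i
    u_k = t_k - r                  # unique transfer from source k
    s = t_joint - u_i - u_k - r    # synergistic transfer (what remains)
    return u_i, u_k, r, s

# Illustrative values (nats): joint TE larger than the sum of the individual
# TEs signals synergy; here the weaker source contributes no unique transfer.
print(pid_from_te(0.9, 0.5, 0.3))  # approximately (0.2, 0.0, 0.3, 0.4)
```

Note how, under the MMI definition, the weaker source never carries unique transfer, consistent with the synergistic contribution being ascribed to the weakest source in the system.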

The analysis in Figures 2 and 3 shows an evident dependence of both the bias and the variance of all partial information decomposition measures on the factor *K*. This result is expected and reflects the well-known decrease of prediction accuracy as the number of available data samples decreases. In this context, our results document that the LASSO regression performs better in challenging conditions, when the number of model parameters approaches the sample size (*K* ≤ 5). In these conditions it has been pointed out that OLS is not suitable for the solution of a regression problem and that its solution may not even exist [40,65]. On the other hand, LASSO shows high robustness to the lack of data points, which results in limited values of bias and variance [66]. We note that, despite this better performance of LASSO, in the condition *K* = 1 all the PID measures that were different from zero (*T*2→4, *T*3→4, *S*23→4, *R*23→4) exhibit a consistent negative bias (Figure 2). This severe underestimation was previously highlighted in different scenarios, in which LASSO shrinkage produces biased estimates of the large coefficients and thus in some conditions can be sub-optimal in terms of estimation risk [67,68].

When the amount of data is not scarce compared to the number of model parameters (*K* > 5), the performance of the two identification methods is comparable, with slight differences depending on the true value of the PID measures. In the case of non-zero PID measures (Figure 2), OLS showed better performance than LASSO in terms of bias and variance. This result is mainly due to the effect of the constraint based on the *l*<sup>1</sup> norm, which performs variable selection at the cost of increased bias and variance in the estimates [34,38].

On the other hand, in the scenario in which all the PID measures are equal to zero (Figure 3), LASSO performs better than OLS in all the conditions analyzed as regards both the bias and the variance of the estimates of information transfer. This can be explained by the continuous shrinkage and selection of the most relevant coefficients, which sets to zero most of the estimated AR coefficients [48].
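The shrinkage behavior discussed above can be illustrated with a toy regression. The coordinate-descent solver below is a minimal didactic implementation, not the identification procedure used in the paper, and the penalty weight is chosen arbitrarily:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Didactic coordinate-descent LASSO minimizing
    ||y - X b||^2 / (2n) + lam * ||b||_1 (not the paper's solver)."""
    n, p = X.shape
    b = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            resid = y - X @ b + X[:, j] * b[j]   # residual excluding regressor j
            rho = X[:, j] @ resid / n
            # soft-thresholding: small partial correlations are set to zero
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_scale[j]
    return b

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 10))
b_true = np.zeros(10)
b_true[0] = 1.5                                  # a single active regressor
y = X @ b_true + 0.1 * rng.standard_normal(30)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]     # dense least-squares solution
b_lasso = lasso_cd(X, y, lam=0.2)
print(np.count_nonzero(np.abs(b_ols) > 1e-8))    # OLS leaves all coefficients nonzero
print(np.count_nonzero(b_lasso))                 # LASSO zeroes the irrelevant ones
```

The same mechanism that zeroes the spurious coefficients also shrinks the surviving one below its true value of 1.5, which is the source of the negative bias of the non-zero PID measures discussed above.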

#### *5.2. Simulation Study II*

The second simulation was designed to compare the performance of OLS and LASSO identification in estimating the cTE in a network of multiple interacting processes. The tested measure is highly relevant, as it is equivalent to the multivariate (conditional) Granger causality measure estimated within the most accurate framework available, i.e., that of vector state–space models [28]. Within this framework, we assessed both the statistical significance and the accuracy of the estimated values of the cTE, thus comparing OLS and LASSO regarding their accuracy in detecting the network structure and the coupling strength.

The accuracy in the estimation of the cTE values was investigated across different *K* ratio levels by means of *BIAS* and *BIASN* used as performance parameters (Figure 5). As expected, both parameters show a tendency to increase as the *K* ratio decreases. This tendency is particularly evident for OLS estimation, as already documented testing different VAR parameter identification approaches (e.g., the Levinson recursion for the solution of the Yule-Walker equations) in the context of signal processing [31]. The situation becomes worse when approaching the condition *K* = 1, in which the matrix ([**y**<sub>*p*</sub>]<sup>T</sup>**y**<sub>*p*</sub>)<sup>−1</sup> approaches singularity. Consequently, in this case the solution of the DARE equation necessary to convert the SS model into the ISS form did not converge, thus impeding OLS-based estimation of the cTE. In such conditions it is necessary to move to the use of penalized regression techniques [34,38,40]. Here we document that the LASSO regression leads to trends of the cTE bias which are consistently very low for any value of *K* in the estimation of the null links (Figure 5a), and which rise with decreasing *K* but without exhibiting abrupt increases even for *K* = 1 in the estimation of the non-null links (Figure 5b). This good performance of LASSO identification confirms its higher tolerance to the collinearity between regressors caused by the reduction of available data samples [69].
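The ill-conditioning mentioned above can be illustrated numerically: as the number of available samples approaches the number of lagged regressors, the condition number of the Gram matrix of the regressors grows sharply. The dimensions below are illustrative, not those of the simulation study:

```python
import numpy as np

# Toy illustration of why OLS degrades near K = 1: the Gram matrix of the
# lagged regressors becomes ill-conditioned as the sample size approaches
# the regressor count (e.g., M = 7 series with order p = 4 give 28 regressors).
rng = np.random.default_rng(2)
n_regressors = 28
conds = {}
for n_samples in (280, 56, 30):
    Yp = rng.standard_normal((n_samples, n_regressors))
    gram = Yp.T @ Yp
    conds[n_samples] = np.linalg.cond(gram)
    print(n_samples, conds[n_samples])   # conditioning degrades as samples shrink
```

Inverting such a near-singular matrix amplifies noise in the OLS coefficient estimates, which in turn destabilizes the downstream SS-to-ISS conversion via the DARE.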

The reliability in the reconstruction of the network structure was investigated analyzing the performance of the two identification methods in terms of overall accuracy and rates of false negative and false positive detections. The ACC parameter appeared to be the best-suited indicator to synthesize the similarity between the estimated network and the ground-truth network [60]. Moreover, with the network structure simulated here, ACC is not affected by the class imbalance problem, a typical condition in sparse networks [70]. As expected, the ACC parameter decreased with the *K* ratio, with LASSO performing progressively better than OLS (Figure 6c). These results are in line with previous studies reporting the performance of different methods for the assessment of the statistical significance of causal interactions in different methodological contexts [33,34,56].

When the test was particularized to the rate of correct detection of null and non-null links, the performance under conditions of data paucity differed between the two identification methods, with LASSO showing a better capability to correctly detect existing links (lower FNR) and OLS showing a slightly better capability to correctly detect absent links (lower FPR). In particular, analyzing the trends of the FNR (Figure 6a), LASSO showed better performance than OLS for *K* ≤ 10, especially when the conditions for the estimation become very challenging (*K* ≤ 5). This behavior is related to the shrinkage of the VAR parameters: the selected regularization parameter *λ* tends to rise as the number of data samples decreases, and this implies a greater sparsity of the estimated network with a high probability of producing false negatives [71]. In the same conditions, the value of the FNR for OLS was around 60%. This poor performance is likely due to an inaccurate representation of the distribution of the cTE under the null hypothesis of uncoupling, estimated empirically using uncoupled surrogate time series computed from very few data samples. On the contrary, while both methods display a low number of false positives for *K* > 5, LASSO tends to produce an over-selection of the estimated links when *K* ≤ 5. This result is in line with previous findings in the context of GC estimation, in which LASSO showed a few extra links for different combinations of degree of sparsity of the simulated network structure and *K* ratio [39,42].
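For reference, the detection metrics used in this comparison can be sketched as follows on toy adjacency matrices; the example networks are hypothetical:

```python
import numpy as np

# Sketch of the detection metrics: compare an estimated binary adjacency
# matrix against the ground-truth network, excluding the diagonal
# (self-links are not tested).
def detection_rates(true_adj, est_adj):
    off = ~np.eye(true_adj.shape[0], dtype=bool)
    t, e = true_adj[off], est_adj[off]
    fn = np.sum(t & ~e)                  # missed links
    fp = np.sum(~t & e)                  # spurious links
    fnr = fn / max(np.sum(t), 1)         # false negative rate
    fpr = fp / max(np.sum(~t), 1)        # false positive rate
    acc = np.sum(t == e) / t.size        # overall accuracy
    return acc, fnr, fpr

true_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]], dtype=bool)
est_adj = np.array([[0, 1, 0],
                    [0, 0, 0],           # one true link missed
                    [1, 0, 0]], dtype=bool)   # one false positive
print(detection_rates(true_adj, est_adj))     # accuracy 4/6, FNR 1/2, FPR 1/4
```

With sparse ground-truth networks the null links dominate the off-diagonal entries, which is why FNR and FPR are reported alongside the accuracy.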

#### *5.3. Real Data Application*

#### 5.3.1. Partial Information Decomposition Analysis

The main results of the partial decomposition of the information transfer within the network of brain and body interactions are that: (i) significant information is transferred within the body sub-network, composed of the processes representative of the cardiac (*η*, heart period), vascular (*π*, pulse arrival time) and respiratory (*ρ*) dynamics, which is directed towards the *η* and *π* nodes as a result of respiration-related and cardiovascular effects; (ii) the information transferred to the nodes of the brain sub-network, representing the amplitude variations of the *δ*, *θ*, *β*, and *α* EEG waves, is lower and due almost exclusively to internal dynamics within this sub-network; (iii) a negligible amount of information is transferred between the two sub-networks as a result of their redundant or synergistic interaction. While these results are observed consistently using the two VAR identification methods (see Figures 7–10), the use of the LASSO regression brings them out more clearly. From a methodological point of view, this behavior results from the inclination towards sparseness of the LASSO method, which shrinks towards zero most of the VAR parameters that have a small effect on the target dynamics [38]. This inclination also puts in evidence other behaviors, such as the substantial absence of information directed to the *ρ* node of the body network and to the *δ* node of the brain network. While in the first case the result is physiologically plausible, since cardio-respiratory interactions are known to be almost unidirectional in nature (i.e., previous studies have found that respiration significantly affects the cardiovascular variables without being affected by them [2,57,72]), in the second case it could be related to an underestimation of the information transfer with the LASSO technique, since the *δ* waves seem to play a role in the organization of brain dynamics [1,7,73].

As the results reported above were observed consistently, independently of the analyzed physiological state, they can be interpreted as a hallmark of how the networks of brain and body interactions organize their dynamic communication evaluated in terms of information transfer. Nevertheless, the conditions of mental stress evoked by the mental arithmetic task and the sustained attention task were able to induce, when compared with the resting condition set as baseline, some significant modifications in the amount of information transferred toward specific nodes. In particular, a significant reduction of the joint brain–body TE computed when *π* was taken as the target process was observed during the two stress conditions compared to rest. This joint information transfer was due almost exclusively to contributions of unique transfer from the *η* and *ρ* nodes of the body sub-network (Figures 7h and 8h), with a small amount of redundant brain–body information transfer (Figures 7i and 8i) and negligible amounts of synergistic transfer or unique transfer from the brain sub-network; the unique transfer reflects cardiac and respiratory effects on the variability of the pulse arrival time, while the redundant transfer is related to common mechanisms whereby such variability is influenced by the brain rhythms on one side and the cardio-respiratory rhythms on the other side. In this context, the results obtained here are in line with those of [3], where a significant reduction of the total information transferred towards *π* was found while playing a serious game with respect to a resting condition. Analyzing the same dataset in terms of mutual information, the authors of [44] found a significant reduction of the information shared between the pulse arrival time (*π*) and the cardio-respiratory system (*η*, *ρ*) during the conditions M and G compared with R.
The significant decrease of the static mutual information computed in [44] and of the dynamic measures of joint and unique TE computed in the present study can be viewed as different aspects of the weakening of cardiovascular and cardio-respiratory interactions during mental stress. Physiologically, the underlying mechanisms could include an increased modulation of peripheral vascular resistance during stress which, as highlighted in [53,74], could dampen the modulation of the pulse arrival time due to heart rate variability and respiration.

When the target process belongs to the brain sub-network, the information transfer estimated through the LASSO regression was almost null when directed towards *δ* and very small when directed towards *θ*, *α* or *β* (Figure 10a–c). This result may reflect the lack of significant connectivity towards the brain sub-network, or the lower sensitivity of penalized regression methods to weak connectivity. In fact, using OLS a certain amount of information transfer to the nodes of the brain network was detected, with a significant increment of the joint transfer entropy from R to G when *δ* is the target process (Figure 9a), which is mostly due to the significant increment of the redundant and synergistic TEs (Figure 9c). Furthermore, a significant increase of the redundant TE (*Rik*→*j*) was also observed during M and G with respect to R when *θ* is the target process (Figure 9f). The involvement of the brain waves during mental stress tasks was also investigated using information measures in [44], finding a larger involvement of *δ* and *θ* activity compared to rest, which agrees with the results obtained here in terms of redundant TE computed after OLS identification.

#### 5.3.2. Conditional Information Transfer Analysis

The analysis of the statistically significant values of the conditional information transfer (cTE measure) led us to detect specific topological structures for the sub-networks that compose the overall physiological network of brain and body interactions (Figure 11). First, a quite consistent topology was found across different physiological states for the interactions between the cardiovascular and respiratory systems (Figures 11a–c and 11d–f, red arrows), which is in line with recent similar works performed in the context of information dynamics [3,18]. In particular, the strong connection between *η* and *ρ* reflects a marked coupling between heart rate variability and respiration, which is due to well-known mechanisms such as respiratory sinus arrhythmia (RSA) [75] and cardio-respiratory synchronization [76]. This connection was detected as bidirectional using OLS, and as unidirectional from *ρ* to *η* using LASSO, confirming that the preferential direction of the cardio-respiratory interactions is the one reflecting the effect of respiration on the heart rate (RSA) [2,49,76]. Second, the information transferred from *η* to *π* reflects the well-known effect of the heart rate on stroke volume and arterial pressure, which has a modulating effect on the arterial pulse wave velocity [77]. Moreover, the influence of respiration *ρ* on the pulse arrival time variability *π* reflects the breathing influences on the intra-thoracic pressure, blood pressure and blood flow velocity [77].

A further result relevant to the peripheral sub-network is the significant decrease of the in-strength of the vascular node *π*, observed for both OLS and LASSO, moving from rest to the serious game condition (Figure 13c,j). This weaker topology is likely related to the significantly lower amount of information transferred towards *π* during the condition G compared to R (Figures 7g and 8g). From a physiological point of view, this lower transfer, mediated by a weaker topology, could suggest a reduction of the efferent nervous activity from the cardiac and respiratory centers directed towards the vascular system during conditions of mental attention.

Compared with the body sub-network, the links of the brain sub-network form a structure which seems less consistent across the different experimental conditions (Figure 11, yellow arrows). While OLS estimation shows an apparent decrease in the number of connections moving from R to M and especially to G, the LASSO regression yields an almost disconnected sub-network of brain-brain interactions. In contrast to what was observed in this work, in [3] a more connected brain sub-network was found during the mental arithmetic task with respect to the resting condition. This difference may be partially methodological, as different model order selection criteria (Akaike vs. Bayesian) and methods to assess the statistical significance of the cTE (F-test vs. surrogate data) were used in [3] and in the present work. These choices could indeed affect the estimation procedure and provide slightly different results, especially in the presence of weak connections as in this case [56,78,79].

Finally, exploration of the network of dynamical interactions between the brain and the peripheral systems led us to investigate how the EEG dynamics, mostly determined by the central nervous system, interact with the cardiovascular and respiratory dynamics regulated by the autonomic nervous system (Figure 11, blue arrows, and Figure 12). Although a quantitative statistical comparison cannot be performed for the results reported in Figures 11 and 12, they document that brain–heart interactions are mostly oriented in the direction from brain to heart. This suggests that efferent autonomic commands directed to the peripheral systems follow in time the neural modulation of the brain wave amplitudes. Moreover, we find that the two mental stress conditions induce an enhancement of brain–body interactions, with a substantial increase in the number of significant links directed from the brain to the body sub-network, assessed using OLS during the mental arithmetic condition or using LASSO during the serious game condition. The results based on OLS resemble those obtained recently on the same dataset [3], and recall previous findings highlighting significant correlations between the amplitude of brain oscillations (especially in the *β* band) and the heart rate and respiration dynamics [7,80]. The results based on LASSO highlight the emergence, during the sustained attention evoked by serious game playing, of causal interactions from the brain to the peripheral systems, mostly originating from the *θ*, *α* and *β* nodes and directed to the *ρ* and *η* nodes. These findings are supported by previous studies suggesting that the neural mechanisms responsible for the generation of *α* and *θ* brain oscillations are crucial for attention tasks and can be correlated with the cardiac autonomic activity and with its respiratory determinants [81–83].

#### **6. Conclusions**

The aim of this work was to test the usefulness of penalized regression techniques for the computation of different parametric measures of information transfer in networks of coupled stochastic processes. In particular, we considered the LASSO regression, a well-known technique that has been extensively used in different research fields, and implemented it for the first time within the most advanced framework for the linear parametric estimation of information dynamics, i.e., that based on the state–space computation of conditional Granger causality and partial information decomposition in vector stationary stochastic processes [15,28,30]. Our comparative validation against the traditional least squares identification of vector stochastic processes (OLS estimator) highlighted that LASSO allows highly accurate estimation not only of the amount of information transferred between coupled processes, but also of the topological structure of the underlying network, especially in conditions of data paucity which make OLS estimation unreliable or even not applicable. On the other hand, in favorable conditions of data size relative to the dimension of the model to be identified, the results of classical and penalized regression fully overlapped, confirming the appropriateness of embedding LASSO into the framework for the linear parametric analysis of information dynamics.

The application of the two identification methods to the study of the network of physiological interactions within and between brain and peripheral dynamics has demonstrated consistent patterns of information transfer and similar network structures. Here, the main findings regard the detection of significant information transfer within the body sub-network sustained by cardiovascular and respiratory dynamics, with reduced cardio-respiratory effects on the vascular dynamics in the presence of mental stress, and the existence of weak but significant brain–body interactions directed from the brain rhythms to the peripheral dynamics, with enhanced link strength in conditions of mental stress. It is worth noting that these results were obtained for K = 10, a condition in which the two identification procedures showed comparable performance in the simulation studies. This finding suggests that, even in conditions that allow the use of OLS, LASSO is able to detect the strongest interactions among those determined by the combined activity of the central and autonomic nervous systems, yielding estimated patterns of information dynamics that are more straightforward and easier to interpret than those obtained with OLS.

The directed links between different physiological systems observed in this study can reflect either well-defined physiological mechanisms, such as the respiratory and heart rate effects on the pulse arrival time [74,84], or statistical associations with likely common determinants of physiological origin, like the brain–heart interactions which are thought to be mediated by dynamic alterations of the sympatho-vagal balance [7,22,85]. In either case, approaches like ours that allow the probing of the dynamic interaction among different organ systems can be very useful to show how an imbalanced interaction may have a negative impact on health [85]. Previous studies have indeed demonstrated pathological changes in brain–body interactions with clinical significance, for instance related to sleep stages and insomnia [86], to sleep apneas [87] or to schizophrenia [72]. However, the analysis of brain–body interactions in different experimental conditions, such as those analyzed in this paper, remains largely unexplored, and further studies need to be performed to strengthen the validity of the results obtained here and in previous studies.

Future developments will aim at testing the efficiency of different penalized regression techniques, like those based on a linear combination of the *l*<sup>1</sup> and *l*<sup>2</sup> norms such as Elastic-net regression [88], or those based on a combination of OLS and LASSO such as the adaptive LASSO, in order to overcome the problems related to the oracle property of LASSO [67]. Moreover, the comparison of penalized regression techniques with more specific approaches to dimensionality reduction in the computation of Granger causality and related measures [35,36] is envisaged, to evaluate which approach is recommended for a reliable estimation of information dynamics in different conditions of network size and data length. Finally, future studies will also deal with the introduction of penalized regression techniques in the study of the information transfer within networks whose structure changes in time [89], or whose dynamics encompass multiple temporal scales [30].

**Supplementary Materials:** Supplementary Material to this article is freely available for download from http://github.com/YuriAntonacci/PID-LASSO-toolbox and http://lucafaes.net/PIDlasso.html.

**Author Contributions:** Conceptualization, Y.A., L.A., G.N. and L.F.; Data curation, G.N. and L.F.; Formal analysis, Y.A. and L.F.; Methodology, Y.A. and L.F.; Software, Y.A. and L.F.; Supervision, L.A., G.N. and L.F.; Validation, Y.A., L.A. and L.F.; Writing—original draft, Y.A. and L.F.; Writing—review & editing, Y.A., L.A., G.N. and L.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was supported by Sapienza University of Rome—Progetti di Ateneo 2017 (RM11715C82606455), 2018 (RM11916B88C3E2DE), 2019 (RM11916B88C3E2DE), Progetti di Avvio alla Ricerca 2019 (AR11916B88F7079E), by Stiftelsen Promobilia, Research Project DISCLOSE, and by Ministero dell'Istruzione, dell'Università e della Ricerca—PRIN 2017 (PRJ-0167), "Stochastic forecasting in complex systems".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Patterns of Heart Rate Dynamics in Healthy Aging Population: Insights from Machine Learning Methods**

**Danuta Makowiec 1,\* and Joanna Wdowczyk <sup>2</sup>**


Received: 26 October 2019; Accepted: 4 December 2019; Published: 9 December 2019

**Abstract:** Costa et al. (Frontiers in Physiology (2017) 8, 255) proved that abnormal features of heart rate variability (HRV) can be discerned by the presence of particular patterns in the signal of time intervals between subsequent heart contractions, called RR intervals. In the following, the statistics of these patterns, quantified using entropic tools, are explored in order to uncover the specifics of the dynamics of heart contraction based on RR intervals. Thirty-three measures of HRV (standard and new ones) were estimated from four-hour nocturnal recordings obtained from 181 healthy people of different ages and analyzed with machine learning methods. The validation of the methods was based on the results obtained from shuffled data. The exploratory factor analysis provided five factors driving the HRV. We hypothesize that these factors could be related to the commonly assumed physiological sources of HRV: (i) activity of the vagal nervous system; (ii) dynamical balance in the autonomic nervous system; (iii) sympathetic activity; (iv) homeostatic stability; and (v) humoral effects. In particular, the indices describing the patterns (their total volume as well as their distribution) revealed important aspects of the organization of the ANS control: the presence or absence of a strong correlation between the pattern indices distinguished the original rhythms from their shuffled representatives. Supposing that the dynamic organization of RR intervals is age dependent, classification with support vector machines was performed. The classification results proved to be strongly dependent on the parameters of the methods used; therefore, determining the age group was not obvious.

**Keywords:** heart rate variability; entropy; fragmentation; aging in human population; factor analysis; support vector machines classification

#### **1. Introduction**

The cardiac tissue of the human heart is under the constant influence of the autonomic nervous system (ANS), the part of the nervous system that works largely without our consciousness. There are two branches of the ANS, the sympathetic and the vagal subsystems, which act in opposition, the sympathetic increasing and the vagal reducing the heart rate, to control homeostasis in the cardiovascular system, i.e., the proper supply of nutrients to each cell of the organism [1,2]. The maintenance of a stable heart rhythm involves different reflex feedback mechanisms, which makes the whole phenomenon complex. With age or with disease, a gradual impairment of the functioning of the complex interplay between these mechanisms can develop [3–5].

There are methods, like measurement of norepinephrine spillover, microneurography, and imaging of cardiac sympathetic nerve terminals, that can give information about the actual state of the ANS [4]. However, it turns out that changes in the activity of the ANS reveal themselves in the time intervals between heartbeats, i.e., in the dynamics of so-called RR intervals [6,7]. Partially, this is due to the fact that the activities of the sympathetic and vagal subsystems differ in response delay [2,8]. The effect of the vagal system can be seen immediately, in the same heart beat, or in the next beat in the case of tetanic stimulation [9]. The response of the sympathetic system is assumed to occur within a few seconds and to last for a few seconds [8]. In this way, the analysis of heart rate fluctuations, called heart rate variability (HRV), has become a noninvasive technique which can potentially be used to assess ANS activity.

The ANS control over the heart is strong in the sense that it dominates all other possible sources of heart rhythm variation, including tissue remodeling, especially at the initial stage [5]. Tissue remodeling due to inflammation or fibrosis could lead to abnormal rhythms, also called erratic rhythms [10], which with time could develop into arrhythmia.

Many efforts have been made with the aim of extracting from HRV as much information as possible on the functioning of the ANS and the state of cardiac tissue [5,11–15]. Standard studies use global indices of variability, such as the standard deviation of RR intervals or the power of specific oscillations in the RR interval signal. In particular, it has been found that the presence or absence of some oscillations, called low-frequency oscillations, is associated with sympathetic activity, while that of others, called high-frequency oscillations, with vagal activity. However, the relation between the variations in RR intervals and the control mechanisms, or other aspects possibly influencing the heart rate, is still not fully explained. After more than thirty years of this research, disappointment has developed; see [16] for a critical review. The criticism concerns the weak repeatability and/or weak predictive power of the HRV indices. Furthermore, a vivid discussion is taking place on the meaning of HRV [17,18]. Thus, HRV, its physiological background, and its diagnostic benefits still require careful elucidation and await verification of both the concept and the methods of estimation.

The dynamics of changes in RR intervals can be represented symbolically as a sequence of accelerations and decelerations [19,20]. It has turned out that short term patterns, constructed as short subsequences of the sequence of accelerations and decelerations, could be a good source for studying the relationships between events that shape the HRV. In particular, their relation to the vagal tone has been established [20]. Recently, Costa et al. [21] proposed to symbolize the RR intervals by patterns that were supposed to discern the abnormality of heart rhythm related to the emergence of erratic rhythms. Consequently, the concept of fragmentation and fragmentation measures has been developed.

In the following, we investigate the characterization of RR intervals provided by the fragmentation measures (indices relying on counting specific events), especially by comparing their performance to the corresponding entropy measures (indices built to quantify the distribution of the counted events). Alongside, we show results obtained from other standard HRV indices known to describe short term variability. In total, 33 HRV measures were used to describe the HRV during the nocturnal rest of 181 healthy subjects of different ages, from twenty-year-olds to octogenarians. We assumed that the signals of healthy people at different ages should make it possible to extract the specificity of heart rate dynamics under healthy aging.

Enormous progress in machine learning, together with its excellent implementations on user-friendly platforms [22], has pushed many of us to test whether this new methodology can help in explaining the phenomenon of HRV and in the diagnosis of cardiovascular diseases [23,24]. Traditional machine learning (ML) is close to the statistical methods of data analysis, where each item in the dataset, here a four hour signal, is described by a set of features [25]. However, it is also said that "machine learning is statistics minus any checking of models and assumptions" [26]. This is because implementations of many ML algorithms can be effective even when the data are gathered without a carefully controlled experimental design and in the presence of complicated nonlinear interactions. Because of that, ML is sometimes located at the intersection of hacking and traditional mathematical statistics [27].

The HRV of a given subject can be expressed using several measures. Many of them are related either by mathematical formulas or by the concept of the physiological phenomenon they describe. Which ones should be chosen for analysis? ML techniques allow considering all of the measures, called ML features, and investigating the relationships between them. In the following, we work with two ML techniques: exploratory data analysis and classification. In particular, we applied exploratory factor analysis to identify possible hidden variables driving a given set of features. The classification task was performed with the support vector machine (SVM). SVM is a supervised learning model with a clear theoretical background, which is important when interpreting the results.

Moreover, we also benefit directly from the ML flexibility. As validation for the obtained results, we propose to consider outcomes arising from the analysis of surrogate signals. The surrogates were provided by random shuffling of the real RR interval signals. Random shuffling destroys the time patterns; however, it preserves the distribution of RR intervals. We assumed that in this way we could filter out, from the overall statistical relations, the patterns of the specific dynamics that were present only in the real signals.

The article is organized as follows. We start with the presentation of the study group of subjects, the methods of ECG recording, and the construction of the RR interval signals in Section 2.1. The description of the HRV indices, together with the relationship between the fragmentation measures and the corresponding entropic measures, is presented in Sections 2.2 and 2.3. Section 2.4 discusses the application of ML methods in HRV analysis; an introduction to exploratory factor analysis and to classification with SVM is provided. In Section 2.5, the statistical methods used are specified. The results and their discussion are presented in Section 3. In Section 3.1, the outcomes of the factor analysis, together with their interpretations, are given. In Section 3.2, we show and discuss observations obtained from the investigation of the entropic measures. Finally, we test whether SVM methods are able to display changes emerging with biological aging better than classical regression methods; these results are given in Section 3.3. Section 4 contains the summarizing discussion and closing remarks.

#### **2. Methods**

#### *2.1. Data Acquisition*

Healthy volunteers meeting the following inclusion criteria [28] were included in the study: age 18–89 years and sinus heart rhythm in the ECG. The exclusion criteria were as follows: presence of ischemic heart disease, heart failure, hemodynamically significant valvular heart disease, or multi-drug controlled hypertension; the presence of abnormalities in additional tests indicating organ complications of hypertension; the presence of symptomatic atherosclerosis or its features in the physical examination; a history of atrial fibrillation or other arrhythmia during Holter recording; significant disorders of atrioventricular and intraventricular conduction in the ECG; diabetes and other diseases significantly affecting the phenomenon of sinus rhythm variability; taking medications that significantly affect the sinus node; the presence of numerous artifacts in the 24 h electrocardiographic Holter recordings; nicotinism of more than 5 cigarettes a day; pregnancy; and, finally, no consent to participate in the study. Prior to enrollment, in order to confirm sinus rhythm and exclude abnormalities indicating cardiovascular diseases, a 12 lead electrocardiogram was recorded. Volunteers were then subjected to an echocardiographic examination, which evaluated the occurrence of possible organ complications of hypertension, as well as other abnormalities implying the presence of cardiovascular diseases. In the next stage, a twenty-four-hour recording of the electrocardiographic signal was carried out using the Digicorder 483 digital recorders from Delmar and Lifecard from Delmar Reynolds. The study was approved by the Ethics Committee of the Medical University of Gdansk (NO. NKEBN/142-653/2019).

The recordings were analyzed on the Delmar Reynolds system (SpaceLabs Healthcare, USA). The sampling rate of ECG was 128 Hz, which ensured 8 ms accuracy for the identification of R-peaks in the QRS complex. The quality of the ECG recordings and accuracy of R-peak detection were verified by visual inspection by experienced cardiologists. All normal beats were carefully annotated, so that only normal sinus rhythms were considered in our investigations.

In total, 181 signals were analyzed. The set of recordings was divided into groups corresponding to the age decade of a person: 20's (30 subjects: 17 women), 30's (21 subjects: 11 women), 40's (33 subjects: 13 women), 50's (31 subjects: 13 women), 60's (27 subjects: 12 women), 70's (22 subjects: 10 women), 80's (17 subjects: 11 women).

The period of nocturnal rest was discerned individually, in each recording separately, according to the appearance of consecutive hours with a low heart rate. From each recording, a four hour signal of normal-to-normal RR intervals {*RR*(*n*) : *n* = 0, ... , *N*} was extracted. All gaps were annotated, and this was used in the construction of the series of patterns: only RR intervals consecutive in time were mapped to a signal of RR actions {*δRR*(*n*) = *RR*(*n*) − *RR*(*n* − 1) : *n* = 1, ... , *N*}. Small gaps of one or two missing values were filled with medians of the surrounding {−3, +3} neighbors. An extra editing procedure was applied to the RR actions: if the difference *δRR* between two consecutive RR intervals was larger than 300 ms or smaller than −300 ms, then this *δRR* was replaced by 300 ms or −300 ms, respectively.
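As a minimal sketch of the editing procedure just described (the helper names are ours; the thresholds and the {−3, +3} neighborhood follow the text, and RR intervals are assumed to be in milliseconds):

```python
from statistics import median

def fill_small_gaps(rr, gap_marker=None):
    """Fill gaps of one or two missing values with the median of the
    surrounding {-3, +3} neighbors, as described in the text."""
    out = list(rr)
    for n, v in enumerate(out):
        if v is gap_marker:
            lo, hi = max(0, n - 3), min(len(out), n + 4)
            neighbors = [x for x in out[lo:hi] if x is not gap_marker]
            out[n] = median(neighbors)
    return out

def rr_actions(rr):
    """RR actions: dRR(n) = RR(n) - RR(n-1)."""
    return [rr[n] - rr[n - 1] for n in range(1, len(rr))]

def clip_actions(drr, limit=300):
    """Replace any action larger than +limit ms (smaller than -limit ms)
    by +limit (-limit), per the extra editing procedure."""
    return [max(-limit, min(limit, d)) for d in drr]
```

In practice the gap annotations would also be used to avoid computing actions across a gap; the sketch omits that bookkeeping.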

#### *2.2. Entropic Measures of HRV*

For each signal with RR intervals {*RR*(*n*)} and its signal of RR actions {*δRR*(*n*)}, the series of decelerations, accelerations, or no action is defined as follows:

$$\delta RR(n) \begin{cases} > 0, & \text{deceleration} \quad : d \\ < 0, & \text{acceleration} \quad : a \\ = 0, & \text{no action} \quad : 0 \end{cases} \tag{1}$$
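The symbolization of Equation (1) amounts to taking the sign of each RR action; a minimal sketch (the function name is ours):

```python
def symbolize(drr):
    """Map RR actions to symbols per Eq. (1):
    d = deceleration (dRR > 0), a = acceleration (dRR < 0), 0 = no action."""
    return ['d' if x > 0 else 'a' if x < 0 else '0' for x in drr]
```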

The fragmentation indices of Costa et al. [21] were designed to collect the information about the presence of specific short segments of accelerations and/or decelerations, which were supposed to show the essence of heart rate dynamics. In particular, the probability of segments of two alternating actions: *ad* and *da* or three alternating actions: *ada* and *dad* was of interest. Similarly, the short sequences with the same actions: *aaa* or *ddd* were found important in the description of heart rate dynamics. The following definitions were applied by us:


It was obvious that the symbolization (1) depended on the resolution of a signal. Moreover, this symbolization did not take into account the size of an action, i.e., whether the action was strong or weak. Because each resolution of a recording provides a natural quantization of the recorded values, let us use the resolution Δ of a given signal of RR intervals to represent the space Π1 of its quantized RR actions:

$$\delta RR(n) \in \{-M\Delta, \dots, -\Delta, 0, \Delta, \dots, M\Delta\} \qquad \text{where} \quad M = \max_{n} \left\{ \frac{|\delta RR(n)|}{\Delta} \right\} \tag{2}$$

Accordingly, the spaces of two or three subsequent in time actions can be considered:

$$\begin{aligned} \Pi_2 &= \{ (\delta RR(n), \delta RR(n+1)) \} = \{ (i, j) : |i|, |j| \le M \}, \\ \Pi_3 &= \{ (\delta RR(n), \delta RR(n+1), \delta RR(n+2)) \} = \{ (i, j, k) : |i|, |j|, |k| \le M \}, \end{aligned}$$

with the constant *M* defined as in (2). The three spaces Π1, Π2, and Π3 were finite and different for each signal. They collected the quantized patterns of the short term dynamics of the heart beats of a given person. The probabilistic structure of these spaces can be estimated by the Shannon entropy,

$$\begin{aligned} E_1 &= -\sum_{i \in \Pi_1} p(i) \ln p(i) \\ E_2 &= -\sum_{(i,j) \in \Pi_2} p(i,j) \ln p(i,j) \\ E_3 &= -\sum_{(i,j,k) \in \Pi_3} p(i,j,k) \ln p(i,j,k) \end{aligned}$$

It is easy to see that if the RR actions occur independently of each other, then *E*2 = 2*E*1 and *E*3 = 3*E*1, while *E*1 attains its maximal value.
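A plug-in (relative-frequency) estimate of *E*1, *E*2, and *E*3 from a sequence of quantized RR actions can be sketched as follows (the function name is ours):

```python
from collections import Counter
from math import log

def pattern_entropy(actions, m):
    """Plug-in Shannon entropy E_m of m-tuples of consecutive quantized
    RR actions, with probabilities estimated as relative frequencies."""
    tuples = [tuple(actions[i:i + m]) for i in range(len(actions) - m + 1)]
    counts = Counter(tuples)
    n = len(tuples)
    return -sum(c / n * log(c / n) for c in counts.values())
```

For long sequences of independent actions, the estimates approach the relations *E*2 = 2*E*1 and *E*3 = 3*E*1 noted above.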

The stochastic features of the short term dynamics can be evaluated by [29]:


The entropy of transition rates *ST* evaluates a given system dynamics as if it were a Markov chain [30], i.e., memoryless dynamics driven by a table of transition rates. It has been proven that *ST* is equal to approximate entropy [31], a popular nonlinear metric used in HRV analysis, though applied there to RR intervals. If the elements of the analyzed signal are independent of each other, then *ST* = *E*1. The self-transfer entropy *sTE*, a notion based on transfer entropy [32], accounts for the influence of the past on the current action. It estimates memory effects that are not encoded in the transition matrix of a Markov chain model. In the case of a signal with independent elements, *sTE* = 0.
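The exact definitions of *ST* and *sTE* are given in [29]; the sketch below uses the standard Markov entropy-rate and self-transfer-entropy forms expressed through *E*1, *E*2, *E*3. These forms are our assumption, not a quotation of [29], but they are consistent with the limits stated above (*ST* = *E*1 and *sTE* = 0 for independent elements):

```python
def transition_entropies(e1, e2, e3):
    """Assumed forms: S_T = E2 - E1 (entropy rate of a first-order Markov
    chain) and sTE = (E2 - E1) - (E3 - E2) (the entropy-rate reduction from
    one extra past value). For i.i.d. actions E2 = 2*E1 and E3 = 3*E1,
    so S_T = E1 and sTE = 0, matching the stated properties."""
    st = e2 - e1
    ste = st - (e3 - e2)
    return st, ste
```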

The fragmentation measures are based on counting events while ignoring their distribution. Thanks to the entropic approach, the relevance of the particular fragmentation patterns can be included. Accordingly, let us consider indices based on the partial entropy, i.e., on the entropy related to the distribution of the particular patterns of accelerations and decelerations:

$$\begin{aligned} E_{(ad)} &= -\sum_{-i,\, j = 1,\dots,M} p(i,j) \ln p(i,j) \\ E_{(da)} &= -\sum_{i,\, -j = 1,\dots,M} p(i,j) \ln p(i,j) \\ E_{(ada)} &= -\sum_{-i,\, j,\, -k = 1,\dots,M} p(i,j,k) \ln p(i,j,k) \\ E_{(dad)} &= -\sum_{i,\, -j,\, k = 1,\dots,M} p(i,j,k) \ln p(i,j,k) \\ E_{(aaa)} &= -\sum_{-i,\, -j,\, -k = 1,\dots,M} p(i,j,k) \ln p(i,j,k) \\ E_{(ddd)} &= -\sum_{i,\, j,\, k = 1,\dots,M} p(i,j,k) \ln p(i,j,k) \end{aligned}$$

The fragmentation indices also ignore the presence of non-action events. We observe the role of these events by counting their occurrences as *n*zero.
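A partial entropy restricts the sum to tuples matching a given sign pattern while keeping the probabilities of the full tuple distribution; a sketch (the function name and the sign encoding, a < 0 < d, are ours):

```python
from collections import Counter
from math import log

def partial_entropy(actions, pattern):
    """Partial Shannon entropy over m-tuples of quantized RR actions whose
    signs match `pattern`, e.g. 'ad' keeps pairs (i < 0, j > 0);
    probabilities come from the full m-tuple distribution."""
    m = len(pattern)
    tuples = [tuple(actions[i:i + m]) for i in range(len(actions) - m + 1)]
    counts = Counter(tuples)
    n = len(tuples)
    sign = {'a': -1, 'd': 1}
    matches = lambda t: all(sign[s] * x > 0 for s, x in zip(pattern, t))
    return -sum(c / n * log(c / n) for t, c in counts.items() if matches(t))
```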

#### *2.3. The Set of Considered HRV Measures*

The standard HRV measures are usually grouped according to the methods of their computations: time domain, frequency domain, or nonlinear measures; see [11,15] for the definitions and interpretation. Furthermore, they are often divided due to the supposed phenomena they describe: short term correlations or long term correlations [16].

Here, the following standard time domain measures were considered: the average of all RR intervals (meanRR), the average of all heart rates (meanHR), the standard deviation of all RR intervals (stdRR), the square root of the mean of the squared differences between adjacent RR intervals (RMSSD), and the percentages of differences between adjacent RR intervals that are longer than 50 ms (pNN50) and longer than 20 ms (pNN20). The frequency domain HRV measures relied on estimation of the power spectral density computed with the Lomb–Scargle periodogram. The frequency bands were: very low frequency (VLF, 0.003–0.04 Hz), low frequency (LF, 0.04–0.15 Hz), and high frequency (HF, 0.15–0.4 Hz). The frequency domain measures were extracted from the power spectral density as the power in each frequency band and the relative powers of VLF (rVLF), LF (rLF), and HF (rHF). Additionally, two nonlinear measures arising from the Poincaré plot, sd1 and sd2, were also included; see [33] for the definitions.
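The time domain indices listed above can be computed directly from the RR series; a minimal sketch assuming RR intervals in milliseconds (the function name is ours):

```python
from math import sqrt
from statistics import mean, pstdev

def time_domain_indices(rr):
    """Standard time domain HRV indices from RR intervals (ms)."""
    d = [rr[i] - rr[i - 1] for i in range(1, len(rr))]
    return {
        'meanRR': mean(rr),
        'meanHR': 60000.0 / mean(rr),             # beats per minute
        'stdRR': pstdev(rr),
        'RMSSD': sqrt(mean([x * x for x in d])),  # root mean square of successive differences
        'pNN50': 100.0 * sum(abs(x) > 50 for x in d) / len(d),
        'pNN20': 100.0 * sum(abs(x) > 20 for x in d) / len(d),
    }
```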

In total, thirty-three HRV measures were included in the set of features used in the ML analysis. Many of them are known to be correlated, for example meanRR and meanHR. Nevertheless, we considered them all in order to see how the mathematical relationships translate into the correlation analysis. For further discussion, we grouped the HRV indices according to the known properties they describe or the mathematics involved:


The vector of 33 features $f^{(i)} = (f^{(i)}_{\mathrm{meanRR}}, \dots, f^{(i)}_{n_{\mathrm{zero}}})$ was estimated for each of the 181 signals. We studied features of the full recording of 240 min. Furthermore, the same features were calculated for segments of the recording, here 5 min segments, though any other segmentation was possible. The set of all 5 min segments of one person, namely 48 items, was taken into account. Moreover, we considered statistics found for physiologically justified extremes of the segmented 5 min features. In particular, the segments representing the minimum of heart rate, which could be attributed to deep sleep [12,13,34], were considered. Furthermore, the segments with the minimum of stdRR were investigated. Reduced HRV is often attributed to the transition from deep sleep to the REM phase of sleep [12].

Finally, the same analysis was performed for shuffled signals. The shuffling of RR intervals was performed ten times with the numpy.random.shuffle procedure of the Python numpy library. Shuffling the RR intervals preserved their distribution but destroyed the patterns of RR actions specific to the given system dynamics. The resulting distribution of RR actions was different because, in the case of shuffled RR intervals, for any action *δ*, we have:

$$p(\delta) = \sum_{(RR,\, RR - \delta)} p(RR, RR - \delta) \qquad \text{where } p(RR, RR - \delta) = p(RR)\, p(RR - \delta)$$

which leads to the maximally random distribution of RR actions for a given distribution of RR intervals.
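The surrogate construction can be sketched as follows (we use the modern numpy Generator API rather than the numpy.random.shuffle call mentioned above; the fixed seed is ours, for reproducibility):

```python
import numpy as np

def shuffled_surrogates(rr, n_shuffles=10, seed=0):
    """Shuffled surrogates of an RR series: each copy preserves the
    distribution of RR intervals but destroys the temporal patterns."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_shuffles):
        s = np.asarray(rr, dtype=float).copy()
        rng.shuffle(s)       # in-place permutation of the copy
        out.append(s)
    return out
```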

#### *2.4. Machine Learning Methods*

Factor analysis (FA) and classification with the support vector machine (SVM) are among the standard feature-based methods of ML [22,27]. FA is used to identify relationships among the features of interest. It rests on the assumption that the observations arise from linear relations between several hidden factors plus added Gaussian noise. Consequently, these factors can be found as the eigenvectors of the correlation matrix of the features. Each eigenvector describes the underlying relationships between the features and the corresponding hidden factor. In the following, we considered only those factors whose eigenvalue was greater than 1 (the Kaiser–Guttman rule).
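The Kaiser–Guttman retention rule can be sketched directly from the eigenvalues of the feature correlation matrix (the actual analysis used the factor_analyzer package; this numpy-only version is our simplification):

```python
import numpy as np

def n_factors_kaiser(features):
    """Number of factors to retain by the Kaiser-Guttman rule: count the
    eigenvalues of the feature correlation matrix that exceed 1.
    `features` has shape (n_samples, n_features)."""
    corr = np.corrcoef(features, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    return int(np.sum(eigvals > 1.0))
```

With two uncorrelated groups of perfectly correlated columns, the rule retains exactly one factor per group.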

Classification is a central goal of many ML procedures. Among the most popular feature-based methods are linear discriminant analysis, random forests, gradient boosting, and SVM. All of them belong to the class of supervised learning, i.e., methods that build the classification by learning from the data. The SVM method constructs a classification decision function by optimizing the margin, i.e., the region around the decision boundary; the points closest to the decision boundary are called the support vectors. Therefore, SVM has a clear intuitive interpretation, at least when the decision function is linear. In the following, we limited our investigations to SVM.

SVM can be used with kernels to solve nonlinear classification problems. The most popular kernels are Gaussians, which weight the distance between any pair of feature points $f^{(i)}, f^{(j)}$ as $k(f^{(i)}, f^{(j)}) = \exp\{-\gamma \| f^{(i)} - f^{(j)} \|^2\}$. Accordingly, they are tuned by the value of the parameter $\gamma$: the cut-off for the Gaussian ball. Depending on $\gamma$, the classification can be quite general (small $\gamma$) or more specific to the studied signals (large $\gamma$). In the following, we assumed $\gamma = 0.2$, which is smaller than the $\gamma = 0.5$ of the default procedure setting. C is the second regularization parameter of the SVM kernel procedures; it trades off correct classification against maximization of the margin of the decision function. Our estimates used C = 1, the default value of the applied numerical methods. With the above settings, we obtained stable classification results. Eventually, the SVM gave us the posterior probability of each data point belonging to a given class [35]. These probabilities are presented as the mean ± std of 50 runs.

All estimates were done with homemade Python scripts. We used the Python libraries factor\_analyzer [36] and scikit-learn [37] (sklearn.svm.SVC) for the numerical estimates, and matplotlib for the visualization of the results.

#### *2.5. Statistical Methods*

For each feature separately, the linear regression by least squares, $index = a_0 + a_1 \cdot age$, was estimated in order to detect its dependence on age. The quality of the regression was evaluated by $R^2$ and the *p*-values of the estimated coefficients. In addition, analysis of variance (with the Holm–Sidak method for pairwise comparisons), the normality test (Shapiro–Wilk), and the equal variance test (Brown–Forsythe) were performed. When the normality test failed, the Kruskal–Wallis one-way analysis of variance on ranks was performed, with Dunn's method applied for pairwise comparisons.
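The per-feature least-squares fit and its $R^2$ can be sketched in a few lines (the function name is ours; the actual estimates used SigmaPlot and statsmodels):

```python
from statistics import mean

def fit_index_vs_age(age, index):
    """Least-squares fit index = a0 + a1 * age; returns (a0, a1, R^2)."""
    mx, my = mean(age), mean(index)
    sxx = sum((x - mx) ** 2 for x in age)
    sxy = sum((x - mx) * (y - my) for x, y in zip(age, index))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    ss_res = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(age, index))
    ss_tot = sum((y - my) ** 2 for y in index)
    return a0, a1, 1.0 - ss_res / ss_tot
```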

The SigmaPlot 13.0 software (Systat Software, Inc., San Jose, CA, USA) was utilized in all tests. The results were confronted with estimates provided by generalized least squares (Python libraries [38], namely: statsmodels.api.GLS, statsmodels.stats.anova, and statsmodels.formula.api.ols).

#### **3. Results and Their Possible Interpretation**

#### *3.1. Factor Analysis of 240 min Recordings*

The FA was performed on the set of features after the values of each feature were normalized. The FA identified five groups: the hidden factors that could be supposed to drive the set of observed features. The relationships among the features and the factors found in the 240 min signals are presented in Table 1. Each of the considered features depended on each factor; however, the strength of this dependence changed significantly from one factor to another. For each HRV index, we mark in bold the factor that drove the given index, namely the factor with the largest coefficient for that feature. For comparison, the factor analysis results obtained for the shuffled signals are displayed in parentheses.


**Table 1.** The factor design as the coefficients of the linear combinations of the investigated features found in the 240 min signals. The coefficients in parentheses were obtained for signals with shuffled values. For each index, its maximal value is shown in bold. Below each factor name, the percentage of variance explained by that factor is given.

Table 1 shows that the maximal values for the considered measures were greater than 0.65, often close to one, indicating the crucial role played by the given factor in the given index. Moreover, these values were distinct from the values obtained for the shuffled signals. Following this idea, we grouped the most important features for each factor. In the case of the physiological signals, these groups can be interpreted as follows:


The above observations can lead us to the hypothesis about the possible physiological interpretation of the five factors of FA that drive the observations in our data as follows (see Figure 1):

Factor I: vagal nervous system activity including respiration;

Factor II: mechanisms of maintaining the dynamical balance in ANS;

Factor III: sympathetic nervous system activity;

Factor IV: mechanisms responsible for the overall system stability;

Factor V: long term regulatory mechanisms that mainly are based on humoral activity.

As the dynamic landscape measures and the fragmentation indices were found to belong to different factors (Factor I versus Factors II and III, respectively), one can suggest that they represent different aspects of the HRV phenomenon. However, the measures concentrated on patterns with inflection points, *Ead*, *Eda*, *p*(*ad*), and *p*(*da*), seemed to be driven by two factors: I and II. It is interesting that Factor I influenced the partial entropies *Ead* and *Eda* more strongly than the corresponding counters *p*(*ad*) and *p*(*da*), whereas for Factor II we observed the opposite relation. This observation might therefore suggest that the distribution of the inflection patterns reflected the vagal activity, while the number of these events referred to maintaining the balance in the ANS.

It turned out that the main factors governing the characteristics of the shuffled signals were different from those found in the original series; see the values in parentheses in Table 1. Again, the dominant features in each factor could be grouped and then named; this time, however, the names followed the statistical phenomena that these features represented; see Figure 1, right.

**Figure 1.** The graph of the hidden factors' generators of the studied features identified by factor analysis (FA) in RR intervals (240 min) (**left part**) and in shuffled signals (**right part**).

When the FA was limited to the set of entropic measures, i.e., when the indices from the dynamical landscape and from the partial entropies served as the feature set, we obtained only two significant factors. The first factor contained all dynamical landscape measures together with *Ead* and *Eda*, while the second one was concentrated on the three-event partial entropies. Hence, the entropic indices were divided into measures of the vagal activity and the remaining ones.

The FA for 5 min segments (all 48 segments from each person were taken into account) provided six important factors. The long term dependencies (Factor V in the 240 min signals) moved to the first factor together with the short term dependencies, which could be expected because the long term and short term indices now worked on similar time scales. However, it was surprising that Factor I of the 240 min signals was divided into three new factors: one dominated by total power, rVLF, and rHF (the first factor), one by *E*3 and *ST* (the second factor), and a third formed by all remaining indices of the 240 min Factor I. One might think that the physiological and statistical components were mixed.

#### *3.2. Visualization of the Stochastic Relations between Features*

In general, if variables are strongly correlated, then we can use the value of one variable to predict the value of the other. In this way, the correlation coefficient becomes a measure of dependence (at least in the statistical sense) between the features. Consequently, the correlation coefficients can be used for clustering the features. The correlation matrix on the basis of which the factors of Table 1 were identified is shown in Figure 2. As validation for the observed correlations, we also display the correlation matrix obtained from the shuffled signals.

**Figure 2.** Tables of correlation values between studied features estimated from the analysis of original 240 min signals (**left**) and when the 240 min signals were randomly shuffled (**right**). One can identify factors, clusters of strongly correlated features, and then observe the correlations between different factors.

The cluster structure of the analyzed features was easily discerned. With the naked eye, one can identify in Figure 2 the factors discussed in the previous subsection. Starting from the obvious anticorrelation between meanRR and meanHR, it is noticeable that the presence of the monotonic patterns *ddd* or *aaa* was rather anticorrelated with the appearance of the alternating patterns *ad*, *da*, *dad*, and *ada*, and very weakly correlated with the values of the total entropies *E*3, *E*2, and *E*1 and the dynamic measures *ST* and *sTE*. One can also observe how these relations changed when the correlations among the features were estimated from the shuffled signals.

It turned out that in the case of the shuffled signals, the time and nonlinear indices, except the general features of Factor IV, became strongly correlated. Furthermore, the features estimated by Fourier analysis appeared independent of all other measures. These facts could support the hypothesis that in the case of shuffled signals, the correlation matrix revealed only the mathematical relations between the features. Consequently, the comparison between the correlations detected in our original signals and the correlations found in the shuffled signals suggested the hypothesis that the dynamics of decelerations and accelerations were not random, but followed special patterns. These observations strongly motivated our interest in the short term patterns.

Figure 3 displays the correlations found for the five minute signals. Here, in the estimates of the features for each person, we included all 48 segments of the 240 min long recording. This means that we studied correlations in a set of 33 features with 48 × 181 = 8688 samples. Accordingly, these samples were not independent, as the rules of statistical analysis demand. However, remaining in the spirit of ML, we accepted this violation. One can think of such an analysis as a stroboscopic observation of a system. Deep learning methods are perfectly suitable for this kind of analysis; however, we leave this approach for our future investigations.

**Figure 3.** Tables of correlation coefficients between studied features estimated from the sets of real five minute signals (**left**) and when the five minute analysis is done on signals randomly shuffled (**right**). One can notice the correlations inside the factors.

It turned out that the set of our 33 features, observed in this stroboscopic way, provided a factorization of the measures similar to that obtained from the whole 240 min signals, though the values of the correlation coefficients were lower. Distinctly from the 240 min analysis, the frequency measures appeared independent of all the others. Moreover, the large cluster consisting of the short term and dynamic landscape measures revealed some intrinsic structure: the indices *E*3 and *ST* were detected as independent of all the other indices of the cluster.

The absence of known mathematical relations between features, as well as the appearance of surprising correlations in the shuffled signals, suggested that the correlation analysis could be misled by the poor information obtained from the five minute segments. The local fluctuations could break the probabilistic relations in the sense that we could not see the expected dependence among the variables; accidental variation actually recorded in a signal drove the estimates. In conclusion, HRV outcomes obtained from five minute segments were found to be misleading. This problem is investigated further in the next subsection.

#### *3.3. Graphs of Strong Correlations within Entropic Measures*

The entropic measures considered by us, i.e., the measures based on total or partial entropy, are strongly mathematically related. One should expect these relations to be revealed by the correlation coefficients. If by strong correlations we mean a correlation coefficient greater than 0.8, then the following picture of the strongly correlated features emerges from our data.
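The graph of strong correlations can be sketched by thresholding the feature correlation matrix at θ = 0.8 (the function name is ours):

```python
import numpy as np

def strong_correlation_edges(features, names, theta=0.8):
    """Edges of the strong-correlation graph: feature pairs whose absolute
    correlation coefficient exceeds theta.
    `features` has shape (n_samples, n_features)."""
    r = np.corrcoef(features, rowvar=False)
    n = len(names)
    return [(names[i], names[j])
            for i in range(n) for j in range(i + 1, n)
            if abs(r[i, j]) > theta]
```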

In Figure 4, two graphs of strong correlations are plotted: for features estimated from the 240 min original signals and from the shuffled signals. Together, we show the scatter plots between *E*3 and the indices most important for the system dynamics, namely the stochastic dynamics indices *ST* and *sTE* and the partial entropies that contribute to *E*3: *Eddd* and *Edad*.

**Figure 4.** The graphs of strong correlations (*θ* = 0.8) between entropic measures estimated from 240 min signals: original signals (**left**) and shuffled signals (**right**). Below, the scatter plots (with regression lines) between *E*<sup>3</sup> and important other entropic indices are shown.

Evidently, all expected mathematical relations are displayed in the graph obtained from the shuffled signals. This graph is almost complete: the features are strongly correlated. The original signals, however, seemed not to follow these statistics. In particular, consider the links between *E*<sup>3</sup> and *sTE* and between *E*<sup>3</sup> and *ST*: these strong correlations were present only among the original signals and absent in the shuffled ones. A different behavior was observed for the correlations of *E*<sup>3</sup> with *Eddd* and *Edad*. There was a noticeable distinction between the values of the indices obtained from the original and from the shuffled signals, and the variability among these values influenced the correlation. Therefore, the structure of the correlations obtained from the original signals can be seen as specific to the dynamics of the studied physiological system.

On the other hand (see Figure 5), in the case of shuffled signals divided into five minute segments, where each feature was represented by 48 values, we obtained an almost empty graph of strong correlations. Notice the difference in the dispersion of the values of the displayed features in the corresponding scatter plots. These results could suggest that the features were calculated from signals too short to preserve the mathematical relations. However, it is noteworthy that the strong correlations of *sTE* with *E*<sup>3</sup> and of *ST* with *E*<sup>3</sup> were still present in the graph representing relations estimated from the original signals. Hence, we had evidence that the relations between *sTE*, *ST*, and *E*<sup>3</sup> could carry important physiological information.

**Figure 5.** The graphs of strong correlations (*θ* = 0.8) between entropic measures estimated from all five minute segments of studied signals: original signals (**left**) and shuffled signals (**right**). Below, the scatter plots (with regression lines) between *E*<sup>3</sup> and other important entropic indices are shown.

Finally, let us show the correlation analysis performed for the features calculated from the five minute segment that displayed the minimal stdRR for a given person; see Figure 6 (left). This graph shows many of the relations present in the graph of the 240 min series, including the relations of *sTE*, *ST*, *Eddd*, and *Edad* with *E*<sup>3</sup>. The significant reduction in total HRV corresponding to the segment with minimal stdRR could correspond to the moments of transition from NREM to REM sleep, where a shift of the sympatho-vagal balance toward vagal withdrawal and a possible sympathetic predominance is reported [12,34].

On the other hand, the graph corresponding to the five minute segments with the minimal meanHR (see Figure 6 (right)), together with the corresponding scatter plots, was more similar to the graph constructed from the shuffled signals. Here, we did not observe the strong relationships of *sTE* and *ST* with *E*<sup>3</sup>. This observation agrees with the common belief that during deep sleep, when the minimal HR is expected, the system is driven solely by the strong activity of the vagal nervous system while the sympathetic activity is switched off [12,13,34].

Concluding our observations on the correlations of *E*<sup>3</sup> with *sTE*, *ST*, *Eddd*, and *Edad*, we can hypothesize that the structure of these relationships can serve as an indicator of sympathetic system activity.

**Figure 6.** The graphs of strong correlations between entropic measures when five minute segments with the minimal stdRR (**left**) and minimal HR (**right**) are chosen for the analysis. Below, the scatter plots (with regression lines) between *E*<sup>3</sup> and other important entropic indices are shown.

#### *3.4. Classification with SVM*

Let us start by presenting the age dependence of each studied variable as found by the regression analysis. In Table 2, we report, in order: the results of the normality test, the equal variance test, the age groups found to be significantly different, and the linear regression results, with the quality evaluated by the *R*<sup>2</sup> coefficient and the two linear model coefficients with their statistical significance.

One can see from Table 2 that almost all studied features displayed a dependence on age. Therefore, one can expect these features to be good candidates for automatic classification with SVM. We performed the classification with a linear SVM and with an SVM acting on Gaussian kernels with *γ* = 0.2 and regularization *C* = 1. Each classifier constructed a decision function providing the probability that a given person belonged to a given age decade. Because of the stochastic methods used in the probability estimates, each classifier run could provide a different result. Typical probabilities obtained for the linear and the nonlinear SVM, when features were computed from the 240 min segments, are shown graphically in Figure 7. We also show a validation of the obtained results by presenting the probabilities found by the same classifiers for the shuffled signals.
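The classifier setup described above can be sketched with scikit-learn (an assumption on our part; the text does not name the software used). The feature matrix and age-decade labels below are synthetic stand-ins for the study's 33 HRV indices:

```python
import numpy as np
from sklearn.svm import SVC

# Gaussian-kernel SVM with gamma = 0.2 and C = 1, whose decision function
# yields class-membership probabilities via Platt scaling (probability=True).
# X: one row per person (HRV features); y: hypothetical age-decade labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=d, scale=0.5, size=(20, 4)) for d in range(3)])
y = np.repeat([20, 40, 60], 20)

clf = SVC(kernel="rbf", gamma=0.2, C=1.0, probability=True, random_state=1)
clf.fit(X, y)
proba = clf.predict_proba(X)  # rows: persons, columns: age-decade classes
```

Each row of `proba` is one row of the matrix plots in Figure 7; the winning class is the column with the maximal probability.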

One can learn from Figure 7 that, in general, the class with maximal probability revealed the true age decade for many persons. In particular, the age seemed to be properly estimated for the signals describing young and elderly people. However, the classification of adults in their 40's, 50's, or 60's was not clear: the maximal probability among these classes was not obvious, and consequently the winning class could be incidental. Notice that this effect was evident in the case of the shuffled signals, independently of whether the linear or the nonlinear classifier was applied.


**Table 2.** The linear regression analysis of the discussed features.

Notation used for the quantification of statistical significance: #: *p* < 0.001, \*: *p* < 0.05, *NS*: *p* ≥ 0.05. W-S test: result of the Shapiro–Wilk test for normality: + passed, − failed; B-F test: result of the Brown–Forsythe test for equal variance: + passed, − failed; age groups found significantly different by ANOVA, or by Kruskal–Wallis ANOVA on ranks in case the normality test failed; *R*<sup>2</sup>: Pearson correlation coefficient for the estimated linear regression with its statistical significance; *a*<sub>0</sub>: the intercept value with its statistical significance; *a*<sub>1</sub>: the linear regression coefficient with its statistical significance.

**Figure 7.** Typical matrix plots of probabilities provided by the decision functions of the applied classifiers. The results, obtained for 181 persons, are arranged according to the age decade of a person (vertical axis) and the age decade class (horizontal axis).

An improvement of the classifier quality could be observed when the classification task was limited to four classes: 20's, 40's, 60's, and 80's. These results are shown in Figure 8A,B. The performance of the SVM classification algorithms can be evaluated against the classification outcomes provided by the shuffled signals; see Figure 8C. Additionally, we tested the improvement of the classifiers when the classification task was restricted to adults: 40's, 50's, 60's, and 70's; see Figure 8D. We see an essential refinement of the automatic classification: the winning class was evident in the case of the classes 20's, 40's, 60's, and 80's. In particular, an average of 77.7 ± 1.5 people (score = 72.6 ± 1.4%) were classified correctly by the linear SVM. In the case of the nonlinear SVM, the winning class agreed almost everywhere with the true age decade of a person, with only seven incorrect classifications on average (score = 93.6 ± 5.3%). However, when the classification task was performed on the features of the 40's, 50's, 60's, and 70's groups, the mean score of 65.9 ± 17.0% was lower and varied significantly from run to run, suggesting instability of the numerical estimates.

One may worry that the automatic classification task based on 33 features in a population of 107 signals could not be properly fitted, because the age groups were too small for such a wide set of features. Therefore, in Figure 9, we report the results found when the set of features was restricted to (A) entropic indices (dynamic landscape and partial entropy measures) and (B) the best\_10 measures. The set of best\_10 indices was constructed as the subset achieving the highest classification score on the set of all signals. This set consisted of {meanRR, total, sd2, PAS, PSS, PIP, *Edad*, *Eda*, *p*(*ad*), *p*(*ada*)}. One can see that restricting the set of features limited the classification quality: the score was significantly smaller than when all features were taken into account.

In Figure 9C,D, we also show the probabilities provided by signals representing the five minute segments with minimal stdRR and with minimal HR. While the segments corresponding to the minimal HR provided a very accurate solution for the classification task, the segments extracted according to minimal stdRR correctly discerned only the young and elderly people.

**Figure 8.** Typical matrix plots of the probabilities provided by the decision functions of the applied classifiers. Results for 107 of 181 persons (**A**–**C**) and for 134 of 181 persons (**D**) are arranged according to their age decade.

**Figure 9.** Typical matrix plots of decision function probability for SVM with the Gaussian kernel and *γ* = 0.2. Here, classification results are given for different sets of features used in the classification: (**A**) entropic measures and (**B**) best\_10 measures; and when five minute segments with the special characteristic were extracted: (**C**) minimal stdRR and (**D**) minimal HR.

#### **4. Discussion and Conclusions**

Intensive studies on healthy populations have proven the dependence between the biological age of a human and many HRV indices [21,39–44]. A vivid discussion is ongoing on whether this dependence allows the assessment of autonomic function [17,18,45]. Consistent results have been obtained after autonomic provocations by chemical blockade of vagal or sympathetic activity [46] and after postural change, which boosts the sympathetic tone [47,48]. However, it has also been suggested that HRV may be dominated by tissue properties rather than by ANS regulation [17,49]. Namely, the excitability of the sinoatrial node cell membrane could be claimed as a main source of HRV, as it determines the organism's homeostasis. In people who have had a heart transplant, one can observe a heart rhythm shaped without direct ANS control, because of the denervation of the donor heart by surgical dissection of postganglionic neurons [50]. These rhythms differ from those observed in healthy people of a similar age, independently of the time elapsed since surgery [29].

External stressors such as structural heart disease, hypertension, and possibly diabetes are known to induce a slow but progressive process of structural remodeling in the cardiac tissue [51,52]. Therefore, the observed abnormal levels of short term HRV indices are supposed to be related to so-called erratic rhythms, i.e., rhythms probably resulting from remodeling of the cardiac tissue [10,21,53–55]. Accordingly, higher HRV values cannot be attributed solely to a better organization of the feedback reflexes driving the organism's response to the actual body needs; rather, the characteristics of the cardiac cells and the structure of their interconnections should also be taken into account [56].

In this work, specially chosen ML methods were applied to a set of thirty-three features, HRV indices, estimated from 240 min nocturnal recordings of 181 healthy people of different ages, to test whether automated ML methods could advance the research on separating HRV indices into those of ANS origin and those of erratic origin. The choice of the sleeping period for the analysis was motivated by the limited presence of artifacts. Moreover, nocturnal rest displays a special organization, in which the periods of strong vagal activity and strong withdrawal of sympathetic activity during deep sleep alternate with REM periods, where ANS activity is similar to the awake state [13,34]. Although a discussion of this subject is beyond the scope of this article, it is worth noting that our analysis could directly benefit from this specific nocturnal ANS activity; we have found arguments supporting the basic concepts of HRV:


In particular, we found that the entropic indices operating on the whole set of patterns, the dynamic landscape measures, refer mainly to vagal activity, while the corresponding counting measures describe the sympathetic–vagal balance in the ANS. Therefore, both characterizations, the total volume of patterns as well as their distribution, are important in studies of ANS activity, as they describe distinct aspects of the organization of ANS control.

We found the ML methods to enable versatile validation of known results and common intuitions. They allowed us to verify many aspects of the studied phenomena comprehensively and without restrictions. In particular, we exploited the flexibility of the ML methods by using shuffling as a validation technique, while remaining careful to avoid possible pitfalls. Additionally, and very profitably, they offered attractive frames for the presentation of the results.

The ML analysis issued a warning about the use of short segments of recordings in research based on statistical properties. Many HRV measures rely on features constructed under the assumption that the signals are stationary. However, RR intervals are not stationary, as has been proven by many methods. Accordingly, HRV measures estimated from short signals may overestimate the role of fluctuations, and the results may be dominated by incidental events. We observed this effect while testing measures revealing pattern statistics. We found that the presence or absence of a strong correlation between some pattern HRV indices could indicate a specific dynamical order, attributed to the original heart rhythms only. However, this arrangement was seen only with sufficiently long signals. Nevertheless, under controlled conditions, such as minimal meanHR or minimal stdRR, the short signals provided a satisfactory description of the corresponding physiological state.

The collection of features obtained for one person from the subsequent five minute segments could be the source data for other types of analysis, such as the stroboscopic approach, which would assign a new role to short segments. Deep learning methods could then be applied, offering different insights into the organization of the dynamics of RR intervals.

Supposing that the dynamic organization of RR intervals is age dependent, classification with SVM was performed. However, the classification methods used here seemed to fail with our data. Although most of the studied indices displayed a dependence on age, the decision functions of the SVM methods applied to these indices proved weak in their ability to discern age. The methods, in general, recognized the group's decade, but the assignment to a group was not obvious. Probably, the set of considered signals was too small.

**Author Contributions:** Conceptualization, D.M.; methodology, D.M.; data curation, J.W.; writing, review and editing, D.M. and J.W.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors acknowledge the support given by Marta Żarczyńska-Buchowiecka (Medical University of Gdansk) in recording and editing the Holter signals.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Complexity of Cardiotocographic Signals as a Predictor of Labor**

**João Monteiro-Santos 1,2,\*, Teresa Henriques 1,2, Inês Nunes 2,3,4, Célia Amorim-Costa 2,3,5, João Bernardes 2,5,6 and Cristina Costa-Santos 1,2**


Received: 14 November 2019; Accepted: 13 January 2020; Published: 16 January 2020

**Abstract:** Prediction of labor is of extreme importance in obstetric care to allow for preventive measures, assuring that both baby and mother have the best possible care. In this work, the authors studied how important nonlinear parameters (entropy and compression) can be as labor predictors. Linear features retrieved from the SisPorto system for cardiotocogram analysis and nonlinear measures were used to predict labor in a dataset of 1072 antepartum tracings, acquired between 30 and 35 weeks of gestation. Two groups were defined: Group A—fetuses whose traces were recorded less than one or two weeks before labor, and Group B—fetuses whose traces were recorded at least one or two weeks before labor. Results suggest that, compared with linear features such as decelerations and variability indices, compression improves labor prediction both within one week (C-statistic of 0.728) and within two weeks (C-statistic of 0.704). Moreover, the correlation between compression and long-term variability was significantly different in Groups A and B, denoting that compression and heart rate variability look at different information associated with whether the fetus is closer to or further from labor onset. Nonlinear measures, compression in particular, may be useful in improving labor prediction as a complement to other fetal heart rate features.

**Keywords:** labor; fetal heart rate; entropy; data compression; complexity analysis; nonlinear analysis; preterm

#### **1. Introduction**

Worldwide, approximately 15 million infants are born preterm (after less than 37 completed weeks of gestation) each year [1]. Over one-third of the world's estimated 3 million annual neonatal deaths are related to preterm birth [2–4]. Even after surviving the neonatal period, infants born preterm are at increased risk of delayed childhood development and low economic productivity [5]. Therefore, interventions to reduce the preterm birth rate are of utmost importance.

Clinical decisions during labor and delivery in developed countries are strongly based on cardiotocography (CTG) [6–8], which has been one of the most used tools in assessing fetal wellbeing since the early '60s. CTG combines fetal heart rate (FHR), obtained using a Doppler ultrasound probe or electrocardiogram electrodes, with uterine contractions (UC) measurements, obtained using an abdominal or intra-uterine pressure transducer. Both provide relevant information about the fetal condition and early detection of preterm labor and abnormal labor progress [7,9,10].

Despite the importance of assessing the wellbeing of the fetus and mother, poor agreement among physicians in the analysis and classification of CTGs is still a problem, even among experienced obstetricians, resulting in a high false positive rate [6,11,12]. In daily practice, FHR and UC are displayed on a printout or monitor to be visually interpreted by a clinician. Even when following specific, well-accepted guidelines (for example, the International Federation of Obstetrics and Gynecology (FIGO), associated with high sensitivity and low specificity [13]), interpretation of CTG relies on the clinician's opinion and daily practice. This leads to a chance that adherence to conventional guidelines could be more harmful than beneficial [14].

The beat-to-beat variation of FHR reflects the influence of the fetus' autonomic nervous system (ANS) and its components (sympathetic and parasympathetic) in the heart. Therefore, it is an indicator of the fetal pathophysiological status, which can be used in the assessment of fetal wellbeing [15] and its well-known influence on labor onset and progression [16]. A certain level of unpredictable fetal heart rate variability (fHRV) reflects sufficient capabilities of the organism in search of optimal behavior. Reduced fHRV is linked with limited capabilities and mental disorders [17]. The linear modeling approach is used to quantify sympathetic and parasympathetic control mechanisms and their balance through the measurement of spectral low- and high-frequency components. However, it has been shown that not all information carried by beat-to-beat variability can be explained by these components [18]. For this matter, in the past couple of decades, and with the fast development of computation, new signal processing and pattern recognition methodologies (namely entropy and compression) have been developed and applied to many different fields, including the analysis of fHRV [19,20]. These approaches can reveal relevant clinical information not exposed by temporal or frequency analysis [21].

Systems, such as Omniview SisPorto [22–24] and NST-Expert, which later became CAFE [25], can automatically deal with CTG assessment and then overcome the limitations of the visual assessment of CTGs mentioned above, but clinical judgment remains highly dependent on CTG analysis [26]. Since all FHR processing and analysis in these systems is based on morphological features provided by FIGO guidelines, they lack the integration of nonlinear indices that would allow them to be optimized.

The ability to predict preterm labor can improve the wellbeing of both fetus and mother. The successful prediction of preterm labor is an essential part of a decision support system for physicians to implement measures that adequately reduce related fetal morbidity and mortality (like the administration of corticosteroids to the mother in order to accelerate lung maturation and therefore decrease the risk of respiratory distress in the newborn).

The main objective of this work is to evaluate how useful nonlinear parameters, namely entropy and compression, can be as labor predictors by using antepartum FHR and UC traces one or two weeks before labor.

#### **2. Materials and Methods**

#### *2.1. Nonlinear Methods*

#### 2.1.1. Compression

The Kolmogorov Complexity (KC) [27] is defined as the function mapping a string *x* to an integer, relative to a Turing machine φ. The KC reflects the appearance of new patterns along a given sequence. Here, the word complexity refers to algorithmic complexity, defined in information theory as the length of the shortest program *p* able to print the string *x*.

$$\text{KC}_{\phi}(x) = \begin{cases} \min\{ |p| : \phi(p) = x \}, & \text{if such a } p \text{ exists} \\ \infty, & \text{otherwise} \end{cases} \tag{1}$$

For a random string, the output of the KC function will be the length of the original string, as any compression effort would end in information loss. On the other hand, the more recurring patterns a string contains, the less complex it is.

Although this concept is objective, its applicability is limited by the fact that KC is not computable. Compressors are a close upper-bounded approximation of the KC function. For over 30 years, data compression software has been developed for data storage and transmission efficiency purposes. More recently, compression has been utilized in research fields like music, literature, internet traffic, and health [28–30].

In this work, we assess the algorithmic complexity of the FHR and UC signals by applying the Gzip compressor. Gzip [31] combines two classical algorithms—Lempel–Ziv (LZ77) [32], a dictionary-based algorithm, and Huffman coding [33]—by encoding high-probability sequences with shorter codes and lower-probability strings with longer codes. The amount of compression obtained depends on the input file size and on the distribution of common substrings.

The idea is that, for a given time series, the compression ratio (CR), i.e., the compressed size of the file divided by its original size, can be used to assess the complexity. A random series will have a CR close to 1, whereas a series full of patterns will be highly compressible and therefore have a CR close to 0. Gzip was used at its maximum compression level, and the values presented express the CR as a percentage.
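A minimal sketch of the CR computation, assuming the series is serialized as text before compression (the exact serialization used by the authors is not specified):

```python
import gzip
import random

def compression_ratio(series, decimals=2):
    """Approximate the complexity of a time series as the Gzip compression
    ratio: compressed size / original size (lower = more regular).
    The series is quantized and serialized to bytes before compression."""
    text = ",".join(f"{v:.{decimals}f}" for v in series).encode()
    compressed = gzip.compress(text, compresslevel=9)  # maximum compression level
    return len(compressed) / len(text)

rng = random.Random(0)
cr_random = compression_ratio([rng.gauss(0, 1) for _ in range(5000)])  # irregular
cr_regular = compression_ratio([1.0, 2.0] * 2500)                      # patterned
```

As expected from the text, the patterned series compresses far better than the random one, so its CR is much closer to 0.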

#### 2.1.2. Entropy

In 1991, Pincus developed the Approximate Entropy (ApEn), a regularity statistic tool used to quantify a system's complexity based on the notion of entropy [34]. The ApEn measures the irregularity of time series and is defined as the logarithmic likelihood that the patterns of a time series that are close to each other will remain close when longer patterns are compared.

Later, in 2000, Richman and Moorman [35] proposed the Sample Entropy (SampEn). Similar to ApEn, SampEn measures time series irregularity, but with some major advantages: (1) self-matches are not counted, reducing bias; (2) it agrees much better than ApEn with the theory for random numbers of known probabilistic character over a broad range of operating conditions; (3) the conditional probabilities are not estimated template-wise; instead, SampEn is computed directly as the logarithm of a conditional probability rather than as a ratio of logarithmic sums, showing relative consistency in cases where ApEn does not [36].

To use either ApEn or SampEn, decisions on two parameters, *m* and *r*, have to be made. The parameter *m* is the embedding dimension, i.e., the length of the sequences to be compared, while the tolerance parameter *r* works as a similarity threshold: two patterns are considered similar if the difference between any pair of corresponding measurements is less than or equal to *r*. Values of 0.1, 0.15, or 0.2 standard deviations (SD) are usually used for *r*, while *m* is mostly set to 2 [37]. In this work, a tolerance of 0.1 SD and an embedding dimension of 2 were used.
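SampEn with these parameter conventions can be sketched as a direct implementation of the definition (an illustrative sketch, not the authors' code):

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.1):
    """Sample Entropy: -ln of the conditional probability that sequences
    matching for m points (within tolerance r) also match for m+1 points.
    Self-matches are excluded; r is expressed as a fraction of the SD."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()
    n = len(x)

    def count_matches(length):
        # The standard convention uses the same n-m templates for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance to all later templates (no self-matches).
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d <= r)
        return count

    b = count_matches(m)      # pairs matching at length m
    a = count_matches(m + 1)  # pairs still matching at length m + 1
    return -np.log(a / b)
```

A perfectly periodic series yields SampEn = 0 (every length-*m* match extends to length *m* + 1), while an irregular series yields a clearly positive value.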

#### *2.2. Data*

The FHR data used for this study were from a retrospective cross-sectional study [38]. Each FHR trace corresponds to distinct fetuses from a singleton pregnancy. The selected traces were acquired between July 2005 and November 2010 during hospitalization in a tertiary care university hospital. All traces were acquired at least 48 h before delivery to guarantee they included no labor time. Furthermore, the traces included were at least 20 min long, during which the signal quality was over 80%, and the signal loss was less than 33%.

The cardiotocographic signals were acquired using an external ultrasound sensor applied to the maternal abdomen. The ultrasound signal is filtered, envelope rectified, and digitized at a sampling rate of 800 Hz with 12-bit precision [39]. Then, an autocorrelation function is used to calculate the heart period and the similarity between the pulses of two consecutive heartbeats, as described in [40]. Via the digital outputs of the fetal monitors, the resulting traces were analyzed using the Omniview SisPorto® 3.7 system [23] at a sampling rate of 4 Hz (Figure 1).

**Figure 1.** Example of a fetal heart rate (FHR) time series.

The SisPorto features used in this paper are summarily described in Table 1. Note that the SisPorto system does not perform any averaging or reduction of the FHR/UC signals.



The 1072 selected traces ranged from 30 to 35 gestational weeks. Two groups were defined: Group A—fetuses whose traces were recorded less than two weeks before labor, and Group B—fetuses whose traces were recorded at least two weeks before labor. Physiological fetal and maternal features, such as maternal age (mAge) and baby gender, as well as some tracing characteristics, such as trace duration and signal quality, were compared between the groups. The linear indices for uterine contraction analysis comprised mean\_UC (median of the UC mean over 10 min non-overlapping blocks), sd\_UC (median of the UC standard deviation over 10 min non-overlapping blocks), and cv\_UC (coefficient of variability of UC).

Two complexity measures, Gzip and SampEn, were considered in this work. Because the value of these measures depends on the trace size, each tracing was split into non-overlapping blocks of 10 min. Both Gzip and SampEn were computed for each block. Then, the median value of CR and SampEn for each fetus was used. Both complexity measures were calculated for FHR (Gzip\_FHR and SampEn\_FHR) and UC signals (Gzip\_UC and SampEn\_UC).
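The block-wise scheme described above can be sketched as follows (the 4 Hz rate is from the text; the trace and the per-block measure here are illustrative placeholders for the Gzip CR or SampEn):

```python
import numpy as np

FS = 4                # SisPorto sampling rate, Hz
BLOCK = 10 * 60 * FS  # 10 min of samples per non-overlapping block

def blockwise_median(trace, measure):
    """Split a trace into non-overlapping 10 min blocks, apply a per-block
    measure, and summarize the fetus by the median across blocks."""
    n_blocks = len(trace) // BLOCK  # incomplete tail block is dropped
    values = [measure(trace[i * BLOCK:(i + 1) * BLOCK]) for i in range(n_blocks)]
    return np.median(values)

rng = np.random.default_rng(0)
trace = 140 + 5 * rng.normal(size=35 * 60 * FS)  # 35 min synthetic FHR trace
med_sd = blockwise_median(trace, np.std)         # block-wise SD, then median
```

In the study, `measure` would be the Gzip CR or SampEn of the previous subsections, applied separately to the FHR and UC signals.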

#### *2.3. Statistical Analysis*

Normality for continuous variables was evaluated by visual inspection of the frequency distribution (histogram). For normally distributed variables, the values for each group are presented as mean ± SD, and an independent samples t-test was performed. On the other hand, for skewed continuous variables, the values are presented as median (minimum-maximum), and the Mann–Whitney test was used to compare the two groups. The categorical variables were compared in the two groups applying the Chi-Square test or Fisher's exact test as applicable.

Logistic regression, using the Hosmer–Lemeshow test for goodness of fit, was used to predict which fetuses would be born preterm in the next two weeks. Variables were selected using Wald's backward method. The concordance statistic (C-statistic), measured by the area under the receiver operating characteristic curve, was computed to assess the model's discrimination.

Akaike Information Criterion (AIC), AIC = 2*k* − 2 log(*L*), where *k* is the number of parameters and *L* the maximum value of the likelihood function, was used for model comparison, where a lower result suggests a better model.
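The AIC trade-off between fit and parsimony can be illustrated with a tiny helper (the log-likelihood values below are made up for illustration, not the study's):

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2k - 2*log(L); lower is better."""
    return 2 * k - 2 * log_likelihood

# A model with a slightly worse fit but fewer parameters (model_a) can still
# be preferred over a better-fitting but larger model (model_b).
model_a = aic(-300.0, 4)  # log L = -300, 4 parameters -> AIC = 608
model_b = aic(-299.0, 6)  # log L = -299, 6 parameters -> AIC = 610
```

This is the sense in which the lower-AIC model in Tables 4 and 5 is "better": it balances likelihood against the number of parameters.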

Statistical analysis was performed with IBM SPSS Statistics for Windows, version 24 (IBM, Armonk, NY, USA).

#### **3. Results**

A total of 1072 antepartum tracings were used, 96 of which were born in the following two weeks (Group A). The main clinical characteristics of the group in which fetuses were born in the next two weeks (Group A) and the group in which they were not (Group B) are presented and compared in Table 2. Note that no differences were found between the groups for these variables.


**Table 2.** Fetal and maternal features for Group A—fetuses whose traces were recorded less than two weeks before labor—and Group B—fetuses whose traces were recorded at least two weeks before labor.

SisPorto features were also compared between the two groups (Table 3). Statistical significance was found with variables iDec (*p* < 0.001), which was lower in fetuses who would be born in the next two weeks, and average long-term variability (abLTV), which was higher in fetuses who would be born in the next two weeks (*p* = 0.038).

Furthermore, while SampEn was not able to find differences between the traces from babies in the two groups with FHR and UC signals, Gzip was (*p* = 0.024 for FHR, *p* = 0.013 for UC), being lower in fetuses who would be born in the next two weeks (Group A) for FHR signals, while the opposite happened for UC signals. The standard deviation of UC was also significantly higher for Group A (*p* = 0.020).


**Table 3.** SisPorto and nonlinear features for Group A—fetuses whose traces were recorded less than two weeks before labor—and Group B—fetuses whose traces were recorded at least two weeks before labor.

Logistic regression, including all relevant variables (*p* < 0.05)—Gzip\_FHR, Gzip\_UC, sd\_UC, iDec, the gestational week of the CTG (wCTG), and abLTV—was then performed using a backward selection model. The model obtained included the variables Gzip, iDec, and wCTG. Interactions between Gzip and wCTG were also considered but found to be non-significant. The results of the logistic regression can be found in Table 4.


**Table 4.** Logistic regression for labor prediction in two weeks or less.

<sup>a</sup> "No iDec" was set as the reference category.

In this logistic regression model, backward selection removed abLTV and the UC variables from the initial set of predictors, and a C-statistic of 0.704 was obtained, with a 95% confidence interval of 0.651–0.758. The AIC obtained for this model was 603.763. The process was then repeated considering all relevant physiological and linear features but without Gzip. This model, without Gzip but with abLTV, achieved an AIC of 605.5 and a C-statistic of 0.691 (0.639–0.742).
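For readers wishing to reproduce the model-comparison metrics, the sketch below fits a logistic regression by Newton-Raphson and computes the AIC and the C-statistic (ROC AUC) on simulated data. It is a minimal illustration under our own assumptions (function names, simulated predictors standing in for Gzip, iDec, and wCTG), not the exact modeling pipeline of the study.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    Returns coefficients (intercept first), AIC, and fitted probabilities."""
    X1 = np.column_stack([np.ones(len(X)), X])     # add intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        H = (X1 * (p * (1 - p))[:, None]).T @ X1   # observed information matrix
        beta += np.linalg.solve(H, X1.T @ (y - p)) # Newton step
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    aic = 2 * len(beta) - 2 * loglik               # AIC = 2k - 2 log L
    return beta, aic, p

def c_statistic(y, p):
    """C-statistic (ROC AUC): chance a random positive outranks a random negative."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Hypothetical data: three predictors standing in for Gzip, iDec and wCTG
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
y = (rng.random(500) < 1 / (1 + np.exp(-(0.5 + X @ [1.0, -0.8, 0.4])))).astype(float)
beta, aic, p = fit_logistic(X, y)
auc = c_statistic(y, p)
```

Comparing models by AIC (lower is better) while checking the C-statistic, as done above for the models with and without Gzip, balances goodness of fit against the number of predictors retained by backward selection.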

The groups were then redefined and tested again. The same analysis was performed, except that Group A consisted of fetuses born less than one week (instead of two weeks) after trace acquisition (n = 27, all preterm) and Group B consisted of all other fetuses (n = 1045, term and preterm). SisPorto and nonlinear features were compared between the groups, as in the previous analysis (results in Appendix A).

The logistic regression results are shown in Table 5. Note that the same variables were included in the logistic regression.


**Table 5.** Logistic regression for labor prediction in one week or less.

This model achieved an AIC of 235.3 and a C-statistic of 0.728 (0.619–0.836), a small improvement over the two-week model described above.

Table 6 reports Spearman's correlation coefficients between Gzip and different physiological measures of variability, computed both overall and within each group. Significant differences between the groups were found for abLTV and avLTV in the two-week labor prediction analysis.

**Table 6.** Spearman's correlation coefficient and respective 95% confidence interval (CI) between Gzip\_FHR and short- and long-term variabilities given by SisPorto. Confidence intervals were calculated using bootstrapping. Bold means significant differences between groups.
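The bootstrapped confidence intervals in Table 6 can be reproduced along the following lines. This is a minimal percentile-bootstrap sketch under our own assumptions (function names, replicate count, and toy data are hypothetical, and ties are ignored in the rank computation).

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation (no tie correction; adequate for continuous data)."""
    rx = np.argsort(np.argsort(x))  # ranks of x
    ry = np.argsort(np.argsort(y))  # ranks of y
    return np.corrcoef(rx, ry)[0, 1]

def spearman_bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Spearman's rho, resampling (x, y) pairs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rhos = [spearman_rho(x[idx], y[idx])
            for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.percentile(rhos, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Correlated toy data standing in for Gzip_FHR and a variability index
rng = np.random.default_rng(2)
a = rng.standard_normal(200)
b = 0.6 * a + 0.8 * rng.standard_normal(200)
lo, hi = spearman_bootstrap_ci(a, b)
```

Resampling subject pairs (rather than values independently) preserves the joint structure of the two indices, which is what the confidence interval is meant to describe.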


#### **4. Discussion**

This study enhances the importance of the inclusion of nonlinear indices in clinical practice. In particular, the results suggest that the Gzip compression ratio, a measure of the time series complexity, may improve the predictability of labor onset when applied to FHR and UC signals.

The main objective of this work was to predict labor within two weeks. Both groups included preterm and term babies: in Group A, 46 of 90 were term babies, born between 36 and 37 weeks of gestational age, while in Group B, 44 of 976 fetuses were preterm. No significant differences were found between term and preterm cases within Group A or Group B.

The information captured by compression is related to that conveyed by other physiological features, such as short- and long-term variability [41]. In our study, Gzip\_FHR had Spearman's correlation coefficients of −0.524 and 0.5 with abSTV and avSTV, respectively. These results contrast with a previous study [41], where the correlations were much higher in absolute value (−0.851 and 0.774). Differences between the datasets used in each study can explain these discrepancies. On the one hand, the dataset of our study was acquired in an antepartum setting, while the data from the previous study were recorded intrapartum; the difference observed between the two studies thus suggests that compression captures physiological regulatory mechanisms that differ between the two settings. On the other hand, another possible explanation is the different sampling rates used in the two studies (4 Hz here versus 2 Hz in the other study), which may indicate that some information is lost at 2 Hz. This hypothesis is supported by the results of Gonçalves et al. [42], who found nonlinear differences between the two sampling rates. However, the study of Gonçalves et al. [42] was an intrapartum study, and the tolerance parameter for entropy was computed using an automatic threshold proposed by Lu [43]. A multiscale analysis at scale two would be affected by the latter hypothesis (as it mimics a 2 Hz sampling rate), but in our study no difference was found. Govindan et al. [44] suggested a different approach, modifying the definition of sample entropy using a time delay. Future studies should compare several methods to address the oversampling question.

When factoring by group, we found significant differences in the correlations between Gzip and abLTV and avLTV (Table 6). Several studies [45–48] found HRV changes throughout fetal maturation, such as increased variability and pattern formation, captured by nonlinear indices. Here, different patterns arise in the two groups, meaning that compression extracts information from HRV that differs from that of the usual metrics. However, no statistical significance was found in the one-week labor prediction analysis. We believe this may be due to low statistical power, as Group A comprised only 27 individuals, making the confidence intervals too wide.

Some papers [49,50] indicate gender-related differences in development throughout gestation and suggest taking this into account in model creation. Although this was taken into consideration, no significant results were found.

The mean compression ratio (instead of the median) of the tracings' blocks was also considered, and the results obtained were similar. This suggests that compression is robust to skewness and outliers, and that intra-tracing variability is low. Furthermore, multiscale analysis [51] was also performed for both SampEn and Gzip, up to five scales, since we were using intervals of 10 min (~1440 data points), but no improvement was found.
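The multiscale procedure mentioned above relies on coarse-graining: at scale τ, the series is replaced by non-overlapping averages of τ consecutive samples before the complexity index is recomputed [51]. A minimal sketch, with a function name of our own choosing:

```python
import numpy as np

def coarse_grain(x, scale):
    """Scale-tau coarse-graining: non-overlapping means of `scale` consecutive samples."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale          # number of complete windows; the remainder is dropped
    return x[:n * scale].reshape(n, scale).mean(axis=1)

# At scale 2 a 4 Hz series becomes an effective 2 Hz series
cg = coarse_grain(np.arange(10), 2)   # [0.5, 2.5, 4.5, 6.5, 8.5]
```

Each increase in scale halves, thirds, etc. the effective sampling rate, which is why scale two mimics the 2 Hz acquisition discussed earlier.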

Two different definitions of the groups were tested. The same analysis was performed considering Group A as fetuses born less than two weeks, and then less than one week (n = 27), from trace acquisition. As shown in Tables 4 and 5, the logistic regressions included the same variables. A small improvement was verified when considering one week rather than two weeks to labor. These results reinforce the stability of compression when predicting time to labor.

Recognition of nonlinear FHR features remains a problem in the clinical community, because clinicians do not always know how to interpret them. Although entropy has been associated with the activity of central nervous system regulation [52,53], there are still no direct associations between compression and fetal physiology. Compression looks for patterns in the series: a healthy fetus is linked with a high compression ratio (a more chaotic signal contains fewer compressible patterns), whereas an unhealthy fetus, under the response of its regulatory system, generates a heart rate signal with more patterns, leading to a lower compression ratio. There is evidence that sympatho-vagal activity, and probably also central nervous system activity, is associated with the onset and progression of labor, namely via sympathetic activation and vagal inhibition mechanisms [16]. A continuous decrease in the sympathetic stress response during the last weeks before labor has also been reported [54], in contrast with a stable baseline sympathetic level. Finding links between these events and nonlinear indices is key to medical acceptance of these tools in daily practice. A more thorough analysis of the FHR changes captured by compression is therefore imperative.

These results are relevant since early prediction of labor, as a decision support system for physicians, can improve the assessment and care of both fetus and mother. In particular, the capability of predicting preterm labor is of extreme importance, given the major risks it poses to fetus and mother.

This work has some limitations. The number of preterm cases is small, particularly considering that the week of CTG acquisition is included as a variable. For this reason, only fetuses between weeks 30 and 35 of gestational age were selected, limiting the interpretability of the results. Moreover, although all the cases were hospitalized, the cause of hospitalization was not known.

Future studies should validate these models in larger datasets and, if possible, test them in different settings, such as during hospitalization and regular appointments.

#### **5. Conclusions**

Prediction of labor is of extreme importance, since it allows physicians to take preventive measures ensuring that both baby and mother are as prepared as possible. In this work, it was shown that nonlinear measures, compression in particular, can improve labor prediction.

**Author Contributions:** J.M.-S., T.H., and C.C.-S. made substantial contributions to conception and design; T.H., C.C.-S., J.B., I.N., and C.A.-C. revised the manuscript critically for important intellectual content; J.M.-S. and T.H. wrote the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the project "Digi-NewB" funded from the European Union's Horizon 2020 research and innovation programme under grant agreement No 689260.

**Acknowledgments:** J. Santos acknowledges the support of the doctorate scholarship from the Clinical Research and Health Services Program, funded by the European Social Fund under the Portuguese Norte 2020 research and innovation program (grant agreement no. Norte-08-5369-FSE-000063). The authors also acknowledge the SisPorto project based at the Department of Obstetrics and Gynecology of the School of Medicine, University of Porto.

**Conflicts of Interest:** João Bernardes currently receives royalties from the development of the commercially available SisPorto system for CTG monitoring.

#### **Appendix A**


**Table A1.** SisPorto and nonlinear features from Group A—fetuses whose trace date was less than one week before labor, and Group B—fetuses whose trace date was at least one week before labor.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).