Integrated Information in the Spiking–Bursting Stochastic Model

Kanakov, Oleg; Gordleeva, Susanna; Zaikin, Alexey

doi:10.3390/e22121334


Oleg Kanakov ¹, Susanna Gordleeva ²,³ and Alexey Zaikin ⁴,⁵,⁶,*

¹ Faculty of Radiophysics, Lobachevsky State University of Nizhny Novgorod, 603950 Nizhny Novgorod, Russia
² Institute of Biology and Biomedicine, Lobachevsky State University of Nizhny Novgorod, 603950 Nizhny Novgorod, Russia
³ Center for Technologies in Robotics and Mechatronics Components, Innopolis University, 420500 Innopolis, Russia
⁴ Institute of Information Technology, Mathematics and Mechanics, Lobachevsky State University of Nizhny Novgorod, 603950 Nizhny Novgorod, Russia
⁵ Institute for Women’s Health and Department of Mathematics, University College London, London WC1E 6BT, UK
⁶ Centre for Analysis of Complex Systems, Sechenov University, 119991 Moscow, Russia
* Author to whom correspondence should be addressed.

Entropy 2020, 22(12), 1334; https://doi.org/10.3390/e22121334

Submission received: 22 October 2020 / Revised: 18 November 2020 / Accepted: 20 November 2020 / Published: 24 November 2020

(This article belongs to the Special Issue Uncertainty in Large Neural Systems: Validation, Explanation and Correction of Multidimensional Intelligence in a Multidimensional World)


Abstract

Integrated information has recently been suggested as a possible measure to identify a necessary condition for a system to display conscious features. Recently, we have shown that astrocytes contribute to the generation of integrated information through the complex behavior of neuron–astrocyte networks. Still, it remained unclear which underlying mechanisms governing the complex behavior of a neuron–astrocyte network are essential to generating positive integrated information. This study presents an analytic consideration of this question based on exact and asymptotic expressions for integrated information in terms of exactly known probability distributions for a reduced mathematical model (a discrete-time, discrete-state stochastic model) reflecting the main features of the “spiking–bursting” dynamics of a neuron–astrocyte network. The analysis was performed in terms of the empirical “whole minus sum” version of integrated information in comparison to the “decoder based” version. The “whole minus sum” information may change sign, and an interpretation of this transition in terms of “net synergy” is available in the literature. This motivated our particular interest in the sign of the “whole minus sum” information in our analytical considerations. The behaviors of the “whole minus sum” and “decoder based” information measures are found to be largely similar: they converge asymptotically to each other as time-uncorrelated activity increases, and the sign transition of the “whole minus sum” information is associated with a rapid growth in the “decoder based” information. The study aims at creating a theoretical framework for using the spiking–bursting model as an analytically tractable reference point for applying integrated information concepts to systems exhibiting similar bursting behavior. The model can also be of interest as a new discrete-state test bench for different formulations of integrated information.

Keywords:

integrated information; discrete-state stochastic model; computational biology; neuron-astrocyte networks

1. Introduction

Integrated information (II) [1,2,3,4] is a measure of internal information exchange in complex systems. It has recently attracted a lot of interest because it was initially proposed to quantify consciousness [5]. Although this initial aim is still a matter of research and debate [6,7,8,9], the II concept itself is by now a widely acknowledged tool in the field of complex dynamics analysis [10,11,12]. The general concept gave rise to specific “empirical” formalizations of II [13,14,15,16] aimed at computability from empirical probability distributions based on real data. For a systematic taxonomy of II measures, see [17]; a comparative study of empirical II measures applied to Gaussian autoregressive network models was recently carried out in [18].

Our recent study [19] addressed the role of astrocytic regulation of neurotransmission [20,21,22] in generating positive II via small networks of brain cells (neurons and astrocytes). Empirical “whole minus sum” II, as defined in [13], was calculated in [19] from the time series produced by a biologically realistic model of neuro-astrocytic networks. A simplified, analytically tractable stochastic “spiking–bursting” model (complementing the realistic one) was designed to describe a specific type of activity in neuro-astrocytic networks which manifests itself as a sequence of intermittent system-wide excitations of rapid pulse trains (“bursts”) on the background of random “spiking” activity in the network [23,24]. The spiking–bursting model is a discrete-time, discrete-state stochastic process which mimics the main features of this behavior. The model was successfully used in [19] to produce semi-analytical estimates of II in good agreement with direct computation of II from time series of the biologically realistic network model. We have suggested, as a possible explanation, that the generation of positive II was the reason why the mammalian brain evolved an astrocyte network overlapping with the network of neurons; still, it remained unclear which underlying mechanisms drive complex neural behavior to generate positive II. In this paper we address this challenging question.

The present study aims at creating a theoretical formalism for using the spiking–bursting model of [19] as an analytically tractable reference point for applying integrated information concepts to systems exhibiting similar bursting behavior (in particular, to other neuron–astrocyte networks). The analytical treatment is based on exact and asymptotic expressions for integrated information in terms of exactly known probability distributions for the spiking–bursting model. The model is constructed as the simplest possible (although essentially non-Gaussian) model reflecting those features of neuron–astrocyte network dynamics which lead to generating positive II. We also aim at extending the knowledge of comparative features of different empirical II measures, which is currently available mainly for Gaussian autoregressive models [17,18], by applying two such measures [13,16] to our discrete-state model.

In Section 2 and Section 3 we specify the definitions of the II measures used and the model. Specific properties of the model which lead to redundancy in its parameter set are addressed in Section 4. In Section 5 we provide an analytical treatment of the empirical “whole minus sum” [13] version of II in application to our model. This choice among other empirical II measures is inherited from the preceding study [19] and is due in part to its easy analytical tractability, and in part to its ability to change sign, which naturally identifies a transition point in the parameter space. This property may be considered a violation of the natural non-negativeness requirement for II [16]; on the other hand, the sign of the “whole minus sum” information has been interpreted in terms of “net synergy” [25] as a degree of redundancy in the evolution of a system [18]. In this sense the transition may be viewed as a useful marker in its own right in the toolset of measures for complex dynamics. This motivates our particular focus on identifying the sign transition of the “whole minus sum” information in the parameter space of the model. We also identify a scaling of II with a small parameter which determines time correlations in the bursting (astrocytic) subsystem.

In Section 6 we compare the outcome of the “whole minus sum” II measure [13] to that of the “decoder based” measure $\Phi^{*}$, which was specifically designed in [16] to satisfy the non-negativeness property. We compute $\Phi^{*}$ directly by definition from the known probability distributions of the model. Despite their inherent difference (the former can change sign, while the latter cannot), the two measures are shown to behave similarly as functions of the model parameters, including the same scaling with the time correlation parameter.

2. Definition of II Measures in Use

The empirical “whole minus sum” version of II is formulated according to [13] as follows. Consider a stationary stochastic process $\xi(t)$ (binary vector process), whose instantaneous state is described by N binary digits (bits), each identified with a node of the network (neuron). The full set of N nodes (“system”) can be split into two non-overlapping non-empty subsets (“subsystems”) A and B; such a splitting is referred to as a bipartition $AB$. Denote by $x = \xi(t)$ and $y = \xi(t + \tau)$ two states of the process separated by a specified time interval $\tau \neq 0$. The states of the subsystems are denoted as $x_A$, $x_B$, $y_A$, $y_B$.

Mutual information between x and y is defined as

$$ I_{xy} = H_x + H_y - H_{xy}, \tag{1} $$

where

$$ H_x = -\sum_x p(x) \log_2 p(x) \tag{2} $$

is the entropy (the base-2 logarithm gives the result in bits); summation is hereinafter assumed to be taken over the whole range of the index variable (here x), and $H_y = H_x$ due to the assumed stationarity.
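As an illustration of definitions (1) and (2), the following minimal Python sketch computes $I_{xy}$ from a two-time joint probability table. The dictionary representation (pairs of states mapped to $p(x, y)$) and the function names are our own illustrative choices, not part of the original formulation.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits, Eq. (2); zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint):
    """I_xy = H_x + H_y - H_xy, Eq. (1); `joint` maps pairs (x, y) to p(x, y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # one-time marginal of x = xi(t)
        py[y] += p  # one-time marginal of y = xi(t + tau)
    return entropy(px) + entropy(py) - entropy(joint)


# A single bit copied from t to t + tau carries exactly 1 bit.
copy_bit = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(copy_bit))  # 1.0
```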

Next, a bipartition $AB$ is considered, and the “effective information” $\Phi_{\mathrm{eff}}$ as a function of the particular bipartition is defined as

$$ \Phi_{\mathrm{eff}}(AB) = I_{xy} - I_{x_A y_A} - I_{x_B y_B}. \tag{3} $$

Finally, the “whole minus sum” II, denoted as $\Phi$, is defined as the effective information calculated for a specific bipartition $AB^{\mathrm{MIB}}$ (“minimum information bipartition”) which minimizes the specifically normalized effective information:

$$ \Phi = \Phi_{\mathrm{eff}}(AB^{\mathrm{MIB}}), \tag{4a} $$

$$ AB^{\mathrm{MIB}} = \operatorname*{argmin}_{AB} \left[ \frac{\Phi_{\mathrm{eff}}(AB)}{\min\{H(x_A), H(x_B)\}} \right]. \tag{4b} $$

Note that this definition prohibits positive II whenever $\Phi_{\mathrm{eff}}$ turns out to be zero or negative for at least one bipartition $AB$.
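A brute-force sketch of (3) and (4a), (4b) may help to make the minimum information bipartition concrete: it enumerates all bipartitions of n bits, evaluates $\Phi_{\mathrm{eff}}$ for each, and returns its value at the bipartition minimizing the normalized quantity in (4b). It assumes that both one-time subsystem entropies are positive (so that the normalization is defined); the function names are hypothetical, and the exhaustive search is practical only for small n.

```python
import math
from collections import defaultdict
from itertools import combinations


def entropy(dist):
    """Shannon entropy in bits, Eq. (2)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint):
    """I_xy = H_x + H_y - H_xy, Eq. (1); `joint` maps (x, y) pairs of bit tuples to p(x, y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return entropy(px) + entropy(py) - entropy(joint)


def restrict(joint, idx):
    """Two-time marginal p(x_A, y_A) of the bits listed in idx."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(tuple(x[i] for i in idx), tuple(y[i] for i in idx))] += p
    return out


def one_time_entropy(joint_sub):
    """H(x_A) computed from a two-time marginal."""
    px = defaultdict(float)
    for (x, _), p in joint_sub.items():
        px[x] += p
    return entropy(px)


def whole_minus_sum(joint, n):
    """Return (Phi, A, B) from Eqs. (3), (4a), (4b), searching all bipartitions of n >= 2 bits."""
    i_xy = mutual_information(joint)
    best = None
    for k in range(n - 1):
        for rest in combinations(range(1, n), k):
            a = (0,) + rest  # node 0 is kept in A, so AB and BA are not counted twice
            b = tuple(i for i in range(n) if i not in a)
            ja, jb = restrict(joint, a), restrict(joint, b)
            phi_eff = i_xy - mutual_information(ja) - mutual_information(jb)  # Eq. (3)
            norm = min(one_time_entropy(ja), one_time_entropy(jb))  # assumed > 0
            if best is None or phi_eff / norm < best[0]:
                best = (phi_eff / norm, phi_eff, a, b)
    return best[1], best[2], best[3]


# Two bits swapped between t and t + tau: each bit alone predicts nothing about its own future.
swap = {((u, v), (v, u)): 0.25 for u in (0, 1) for v in (0, 1)}
print(whole_minus_sum(swap, 2))  # (2.0, (0,), (1,))
```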

We compare the result of the “whole minus sum” effective information (3) to the “decoder based” information measure $\Phi^{*}$, which is modified from its original formulation in [16] by setting the logarithm base to 2 for consistency:

$$ \Phi^{*}(AB) = I_{xy} - I^{*}_{xy}(AB), \tag{5a} $$

where

$$ I^{*}_{xy}(AB) = \max_{\beta} \left[ -\sum_{y} p(y) \log_2 \sum_{x} p(x)\, q_{AB}(y|x)^{\beta} + \sum_{xy} p(xy) \log_2 q_{AB}(y|x)^{\beta} \right], \tag{5b} $$

$$ q_{AB}(y|x) = p(y_A|x_A)\, p(y_B|x_B) = \frac{p(x_A y_A)\, p(x_B y_B)}{p(x_A)\, p(x_B)}. \tag{5c} $$
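The decoder-based measure (5a)–(5c) can likewise be evaluated directly from a joint table. The sketch below is our own illustration, not the implementation of [16]: it relies on the bracketed objective in (5b) being concave in $\beta$ (a negated log-sum-exp in $\beta$ plus a term linear in $\beta$) and locates its maximum by ternary search on a finite interval $[0, \beta_{\max}]$, where the default $\beta_{\max} = 20$ is a hypothetical choice assumed to bracket the maximizer.

```python
import math
from collections import defaultdict


def _marginal(dist, key):
    """Aggregate the probabilities of `dist` by key(entry)."""
    out = defaultdict(float)
    for k, p in dist.items():
        out[key(k)] += p
    return dict(out)


def _entropy(dist):
    """Shannon entropy in bits, Eq. (2)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def _sub(state, idx):
    """Restrict a bit tuple to the indices in idx."""
    return tuple(state[i] for i in idx)


def decoder_based_phi(joint, a, b, beta_max=20.0, iters=100):
    """Phi*(AB) = I_xy - I*_xy(AB), Eqs. (5a)-(5c); a and b are tuples of bit indices."""
    joint = {k: p for k, p in joint.items() if p > 0}
    px = _marginal(joint, lambda k: k[0])
    py = _marginal(joint, lambda k: k[1])
    pa = _marginal(joint, lambda k: (_sub(k[0], a), _sub(k[1], a)))
    pb = _marginal(joint, lambda k: (_sub(k[0], b), _sub(k[1], b)))
    pxa = _marginal(pa, lambda k: k[0])
    pxb = _marginal(pb, lambda k: k[0])

    def q(x, y):
        """Product decoder q_AB(y|x), Eq. (5c)."""
        num = pa.get((_sub(x, a), _sub(y, a)), 0.0) * pb.get((_sub(x, b), _sub(y, b)), 0.0)
        return num / (pxa[_sub(x, a)] * pxb[_sub(x, b)])

    def objective(beta):
        """Bracketed expression of Eq. (5b); concave in beta."""
        first = -sum(pyv * math.log2(sum(pxv * q(x, y) ** beta for x, pxv in px.items()))
                     for y, pyv in py.items())
        second = beta * sum(p * math.log2(q(x, y)) for (x, y), p in joint.items())
        return first + second

    lo, hi = 0.0, beta_max  # assumed to bracket the maximizing beta
    for _ in range(iters):  # ternary search for the maximum of a concave function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    i_star = objective(0.5 * (lo + hi))
    i_xy = _entropy(px) + _entropy(py) - _entropy(joint)
    return i_xy - i_star


swap = {((u, v), (v, u)): 0.25 for u in (0, 1) for v in (0, 1)}
print(decoder_based_phi(swap, (0,), (1,)))  # approximately 2.0
```

For the swapped-bits example, the product decoder assigns the same $q_{AB}(y|x) = 1/4$ to every pair, so $I^{*}_{xy} = 0$ and $\Phi^{*}$ equals $I_{xy} = 2$ bits.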

3. Spiking–Bursting Stochastic Model

Physiologically, spikes are short (about 1 millisecond in duration) pulses of voltage (action potentials) across the neuronal membrane. Bursts are rapid sequences of spikes. The main feature of the neuron–astrocyte network model in [19] is the presence of network-wide coordinated bursts, when all neurons are rapidly spiking in the same time window. Such bursts are coordinated by the astrocytic network and occur on the background of weakly correlated spiking activity of individual neurons. The spiking–bursting model was suggested in [19] as the simplest mathematical description of this behavior. In this model, time is discretized into small bins, and neurons are represented by binary digits taking on values 0 or 1, denoting the absence or the presence of at least one spike within the specific time bin. Accordingly, a network-wide burst is represented by a time interval during which all neurons are locked at value 1 (which corresponds to a train of rapid spiking in the underlying biological system). The idea behind the model is illustrated by the graphical representation of its typical time evolution shown in Figure 1. The graphs of the model dynamics can be seen as envelopes of the respective time recordings of membrane voltage in actual neurons: each short rectangular pulse of the model is assumed to correspond to at least one narrow spike of voltage, and a prolonged pulse (several discrete time bins in duration) represents a spike train (burst).

Mathematically, this “spiking–bursting” model is a stochastic model which produces a binary vector valued, discrete-time stochastic process. In keeping with [19], the model is defined as a combination $M = \{V, S\}$ of a time-correlated dichotomous component V, which turns system-wide bursting on and off (mimicking global bursting of a neuronal network, when each neuron produces a train of pulses at a high rate [19]), and a time-uncorrelated component S, which describes spontaneous (spiking) activity occurring in the absence of a burst (corresponding to background random activity in a neural network, characterized by a relatively sparse random appearance of neuronal pulses, i.e., spikes [19]). The model mimics the spiking–bursting type of activity which occurs in a neuro-astrocytic network, where the neural subsystem normally exhibits time-uncorrelated patterns of spiking activity, and all neurons are under the common influence of the astrocytic subsystem, which is modeled by the dichotomous component V and sporadically induces simultaneous bursting in all neurons. A similar network architecture with a “master node” spreading its influence on subordinated nodes was considered, for example, in [1] (Figure 4b therein).

The model is defined as follows. At each instant of (discrete) time, the state of the dichotomous component can be either “bursting” with probability $p_b$, or “spontaneous” (or “spiking”) with probability $p_s = 1 - p_b$. While in the bursting mode, the instantaneous state of the resulting process $x = \xi(t)$ is given by all ones: $x = 11\ldots1$ (further abbreviated as $x = 1$). In the spiking mode, the state x is a (time-uncorrelated) random variate described by a discrete probability distribution $s_x$ (where an occurrence of “1” in any bit is referred to as a “spike”), so that the resulting one-time state probabilities read

$$ p(x \neq 1) = p_s s_x, \tag{6a} $$

$$ p(x = 1) = p_1, \qquad p_1 = p_s s_1 + p_b, \tag{6b} $$

where $s_1$ is the probability of spontaneous occurrence of $x = 1$ (hereafter referred to as a system-wide simultaneous spike) in the absence of a burst. (In a real network, “simultaneous” means occurring within the same time discretization bin [19].)
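For illustration, the one-time distribution (6a), (6b) can be tabulated explicitly. The sketch below assumes, purely as an example, that spontaneous spikes are independent across the n neurons, each occurring with a hypothetical probability r per time bin, so that $s_x = \prod_i r^{x_i}(1 - r)^{1 - x_i}$; the model itself allows an arbitrary $s_x$.

```python
from itertools import product


def spontaneous_prob(x, r):
    """s_x for independent neurons, each spiking with probability r (illustrative assumption)."""
    prob = 1.0
    for bit in x:
        prob *= r if bit else 1.0 - r
    return prob


def one_time_probs(n, r, p_b):
    """One-time probabilities p(x) of the spiking-bursting process, Eqs. (6a), (6b)."""
    p_s = 1.0 - p_b
    probs = {x: p_s * spontaneous_prob(x, r) for x in product((0, 1), repeat=n)}  # Eq. (6a)
    probs[(1,) * n] += p_b  # Eq. (6b): p_1 = p_s * s_1 + p_b
    return probs


p = one_time_probs(n=3, r=0.2, p_b=0.1)
print(round(sum(p.values()), 12), round(p[(1, 1, 1)], 4))  # 1.0 0.1072
```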

To describe two-time joint probabilities for $x = \xi(t)$ and $y = \xi(t + \tau)$, consider a joint state $xy$ which is a concatenation of bits in x and y. The spontaneous activity is assumed to be uncorrelated in time, which leads to the factorization

$$ s_{xy} = s_x s_y. \tag{7} $$

The time correlations of the dichotomous component are described by a $2 \times 2$ matrix

$$ p_{qr} = \begin{pmatrix} p_{ss} & p_{sb} \\ p_{bs} & p_{bb} \end{pmatrix}, \qquad q, r \in \{s, b\}, \tag{8} $$

whose components are the joint probabilities of observing the respective spiking (index “s”) and/or bursting (index “b”) states in x and y. (In a neural network these correlations are conditioned by burst duration [19]; e.g., if this (in general, random) duration mostly exceeds $\tau$, then the correlation is positive.) The probabilities obey $p_{sb} = p_{bs}$ (due to stationarity), $p_b = p_{bb} + p_{sb}$ and $p_s = p_{ss} + p_{sb}$, which allows one to express all one- and two-time probabilities describing the dichotomous component in terms of two independent quantities. These can be, for example, the pair $\{p_s, p_{ss}\}$; then

$$ p_{sb} = p_{bs} = p_s - p_{ss}, \tag{9a} $$

$$ p_{bb} = 1 - (p_{ss} + 2 p_{sb}), \tag{9b} $$

or the pair $\{p_b, \rho\}$ as in [19], where $\rho$ is the Pearson correlation coefficient defined by

$$ p_{sb} = p_s p_b (1 - \rho). \tag{10} $$

In Section 4 we justify the use of another effective parameter, $\epsilon$ defined in (13), instead of $\rho$ to determine time correlations in the dichotomous component.

The two-time joint probabilities for the resulting process are then expressed as

$$ \begin{aligned} p(x \neq 1, y \neq 1) &= p_{ss}\, s_x s_y, \\ p(x \neq 1, y = 1) &= \pi s_x, \qquad p(x = 1, y \neq 1) = \pi s_y, \\ p(x = 1, y = 1) &= p_{11}, \end{aligned} \tag{11a} $$

$$ \pi = p_{ss} s_1 + p_{sb}, \qquad p_{11} = p_{ss} s_1^2 + 2 p_{sb} s_1 + p_{bb}. \tag{11b} $$
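The following sketch builds the full two-time table (11a), (11b) from an arbitrary spontaneous distribution $s_x$ and the pair $\{p_s, p_{ss}\}$, using (9a), (9b) for the remaining entries of (8). The function name, the dictionary layout and the parameter values in the usage lines are illustrative assumptions; the usage lines check that the table is normalized and that its one-time marginal reproduces (6a).

```python
from itertools import product


def two_time_joint(s, p_s, p_ss):
    """Two-time table p(x, y) of Eqs. (11a), (11b); s maps each n-bit tuple x to s_x."""
    ones = (1,) * len(next(iter(s)))
    p_sb = p_s - p_ss                 # Eq. (9a)
    p_bb = 1.0 - (p_ss + 2.0 * p_sb)  # Eq. (9b)
    s1 = s[ones]
    pi = p_ss * s1 + p_sb             # Eq. (11b)
    p11 = p_ss * s1 ** 2 + 2.0 * p_sb * s1 + p_bb
    joint = {}
    for x, y in product(s, repeat=2):
        if x != ones and y != ones:
            joint[(x, y)] = p_ss * s[x] * s[y]
        elif x != ones:
            joint[(x, y)] = pi * s[x]
        elif y != ones:
            joint[(x, y)] = pi * s[y]
        else:
            joint[(x, y)] = p11
    return joint


# Two independent neurons with spike probability 0.3 (illustrative), p_s = 0.8, p_ss = 0.7.
s = {x: (0.3 if x[0] else 0.7) * (0.3 if x[1] else 0.7) for x in product((0, 1), repeat=2)}
joint = two_time_joint(s, p_s=0.8, p_ss=0.7)
print(sum(joint.values()))  # equals 1 up to rounding
# One-time marginal of x = (0, 0); by Eq. (6a) it should equal p_s * s_(0,0) = 0.8 * 0.49.
print(sum(p for (x, _), p in joint.items() if x == (0, 0)))
```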

Note that the above notation can be applied to any subsystem instead of the whole system (with the same dichotomous component, as it is system-wide anyway).

The mentioned probabilities can be interpreted in terms of the underlying biological system as follows (see details in [19]): $p_b$ is the probability of observing the astrocytic subsystem in the excited (high calcium concentration) state, which induces global bursting activity in all neurons, within a specific time discretization bin; $p_{bb}$ is the probability of observing the mentioned state in two time bins separated by the time lag $\tau$, and $\rho$ is the respective time-delayed Pearson correlation coefficient of the astrocytic activity; $s_x$ is the probability of observing a specific spatial pattern of spiking x within one time bin in spontaneous neuronal activity (in the absence of an astrocyte-induced burst), and in particular $s_1$ is the probability that all neurons fire a spike within one time bin in spontaneous activity. In this sense $s_1$ measures the overall strength of spontaneous activity of the neuronal subsystem. When spiking activity is independent across neurons, the set of parameters $\{s_1, p_b, \rho\}$ fully determines the “whole minus sum” II in the spiking–bursting model. In [19] these parameters were fitted to match (in the least-squares sense) the two-time probability distribution (11) to the respective “empirical” (numerically obtained) probabilities for the biologically realistic model of the neuron–astrocyte network. This fitting produced the dependence of the spiking–bursting parameters $\{s_1, p_b, \rho\}$ upon the biological parameters; see Figure 7 in [19].

4. Model Parameter Scaling

The spiking–bursting stochastic model, as described in Section 3, is redundant in the following sense. In terms of the model definition, there are two distinct states of the model which equally lead to observing the same one-time state of the resultant process with 1s in all bits: first, a burst, and second, a system-wide simultaneous spike in the absence of a burst; these are indistinguishable by one-time observations. Two-time observations reveal a difference between system-wide spikes on the one hand and bursts on the other, because the latter are assumed to be correlated in time, unlike the former. That said, the “labeling” of bursts versus system-wide spikes exists in the model (through the state of the dichotomous component), but not in the realizations. Proceeding from the realizations alone, it must be possible to relabel a certain fraction of system-wide spikes as bursts (more precisely, as a time-uncorrelated portion thereof). Such relabeling would change both components of the model $\{V, S\}$ (the dichotomous and spiking processes), in particular diluting the time correlations of bursts, without changing the actual realizations of the resultant process. This implies the existence of a transformation of model parameters which keeps the realizations (i.e., the stochastic process as such) invariant. The derivation of this transformation is presented in Appendix A and leads to the following scaling:

$$ s_{x \neq 1} = \alpha\, s'_{x \neq 1}, \tag{12a} $$

$$ 1 - s_1 = \alpha\, (1 - s'_1), \tag{12b} $$

$$ p_{s'} = \alpha\, p_s, \tag{12c} $$

$$ p_{s's'} = \alpha^2 p_{ss}, \tag{12d} $$

where $\alpha$ is a positive scaling parameter, and all other probabilities are updated according to Equation (9).
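The invariance claimed above can be checked numerically. The sketch below (hypothetical function names, illustrative parameter values) applies (12a)–(12d) to a spontaneous distribution $s_x$ and the pair $\{p_s, p_{ss}\}$, rebuilds the two-time table (11) before and after the scaling, and compares the two tables. It assumes $\alpha$ is chosen so that all primed probabilities stay in $[0, 1]$; in particular $\alpha \geq 1 - s_1$ keeps $s'_1 \geq 0$, and $\alpha = 1 - s_1$ relabels all system-wide spikes as bursts.

```python
from itertools import product


def two_time_joint(s, p_s, p_ss):
    """Two-time table p(x, y) of Eqs. (9) and (11); s maps n-bit tuples to s_x."""
    ones = (1,) * len(next(iter(s)))
    p_sb = p_s - p_ss
    p_bb = 1.0 - p_ss - 2.0 * p_sb
    s1 = s[ones]
    pi = p_ss * s1 + p_sb
    p11 = p_ss * s1 ** 2 + 2.0 * p_sb * s1 + p_bb

    def p(x, y):
        if x != ones and y != ones:
            return p_ss * s[x] * s[y]
        if x != ones:
            return pi * s[x]
        if y != ones:
            return pi * s[y]
        return p11

    return {(x, y): p(x, y) for x, y in product(s, repeat=2)}


def rescale(s, p_s, p_ss, alpha):
    """Primed parameters of Eqs. (12a)-(12d); p_sb and p_bb then follow from Eq. (9)."""
    ones = (1,) * len(next(iter(s)))
    s_new = {x: v / alpha for x, v in s.items() if x != ones}  # Eq. (12a)
    s_new[ones] = 1.0 - (1.0 - s[ones]) / alpha                # Eq. (12b)
    return s_new, alpha * p_s, alpha ** 2 * p_ss               # Eqs. (12c), (12d)


s = {x: 0.25 for x in product((0, 1), repeat=2)}  # s_1 = 0.25
p_s, p_ss = 0.8, 0.7
for alpha in (0.75, 0.9, 1.0):
    s2, p_s2, p_ss2 = rescale(s, p_s, p_ss, alpha)
    a, b = two_time_joint(s, p_s, p_ss), two_time_joint(s2, p_s2, p_ss2)
    # Largest table difference (rounding level) and eps of Eq. (13) before/after scaling.
    print(alpha, max(abs(a[k] - b[k]) for k in a), p_ss / p_s ** 2 - 1, p_ss2 / p_s2 ** 2 - 1)
```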

The mentioned invariance implies, in particular, that any characteristic of the process must be invariant to the scaling (12a–d). This suggests a natural choice of a scaling-invariant effective parameter $\epsilon$, defined by

$$ p_{ss} = p_s^2 (1 + \epsilon), \tag{13} $$

to determine time correlations in the dichotomous component (indeed, by (12c) and (12d) both sides of (13) are multiplied by $\alpha^2$, so $\epsilon$ is unchanged). Together with a second independent parameter of the dichotomous process, for which a straightforward choice is $p_s$, and with the full one-time probability table for spontaneous activity $s_x$, this constitutes a natural full set of model parameters $\{s_x, p_s, \epsilon\}$.

The two-time probability table (8) can be expressed in terms of $p_s$ and $\epsilon$ by substituting Equation (13) into Equation (9):

$$ p_{qr} = \begin{pmatrix} p_s^2 + \epsilon p_s^2 & p_s p_b - \epsilon p_s^2 \\ p_s p_b - \epsilon p_s^2 & p_b^2 + \epsilon p_s^2 \end{pmatrix}. \tag{14} $$

The requirement of non-negativeness of probabilities imposes the simultaneous constraints

$$ \epsilon \geq -1 \tag{15a} $$

and

$$ p_s \leq p_{s\max} = \begin{cases} \dfrac{1}{1 + \epsilon}\left(1 - \sqrt{|\epsilon|}\right), & -1 \leq \epsilon < 0, \\[2mm] \dfrac{1}{1 + \epsilon}, & \epsilon \geq 0, \end{cases} \tag{15b} $$

or, equivalently,

$$ -\epsilon_{\max}^2 \leq \epsilon \leq \epsilon_{\max} = \frac{p_b}{p_s}. \tag{16} $$

Comparing the off-diagonal term $p_{sb}$ in (14) to the definition of Pearson’s correlation coefficient $\rho$ in (10), we get

$$ \epsilon = \rho\, \frac{p_b}{p_s} = \rho\, \epsilon_{\max}; \tag{17} $$

thus, the sign of $\epsilon$ has the same meaning as that of $\rho$. Hereinafter we limit ourselves to non-negative correlations, $\epsilon \geq 0$.
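The relations of this Section can be collected in a short sketch: the table (14), the admissibility constraints (15a), (15b) (equivalently (16)), and the conversion (17) from $\rho$ to $\epsilon$. Function names and parameter values are hypothetical. For $-1 \leq \epsilon < 0$ the bound is evaluated in the algebraically equivalent form $1/(1 + \sqrt{|\epsilon|})$, which avoids the 0/0 of (15b) at $\epsilon = -1$.

```python
import math


def eps_from_rho(p_s, rho):
    """Eq. (17): eps = rho * p_b / p_s = rho * eps_max."""
    return rho * (1.0 - p_s) / p_s


def dichotomous_table(p_s, eps):
    """2x2 table of Eq. (14); rows and columns are ordered (s, b)."""
    p_b = 1.0 - p_s
    e = eps * p_s ** 2
    return [[p_s ** 2 + e, p_s * p_b - e],
            [p_s * p_b - e, p_b ** 2 + e]]


def admissible(p_s, eps):
    """Non-negativity constraints (15a), (15b)."""
    if eps < -1.0:
        return False
    if eps < 0:
        # Equal to (1 - sqrt|eps|) / (1 + eps) in (15b), but finite at eps = -1.
        p_s_max = 1.0 / (1.0 + math.sqrt(-eps))
    else:
        p_s_max = 1.0 / (1.0 + eps)
    return p_s <= p_s_max


p_s, rho = 0.8, 0.5
eps = eps_from_rho(p_s, rho)  # about 0.125
table = dichotomous_table(p_s, eps)
print(eps, table, admissible(p_s, eps))
```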

5. Analysis of the Empirical “Whole Minus Sum” Measure for the Spiking–Bursting Process

In this Section we analyze the behavior of the “whole minus sum” empirical II [13], defined by Equations (3) and (4), for the spiking–bursting model as a function of the model parameters, focusing particularly on its transition from negative to positive values.

5.1. Expressing the “Whole Minus Sum” Information

Mutual information $I_{xy}$ for two time instances x and y of the spiking–bursting process is expressed by inserting all one- and two-time probabilities of the process according to (6), (11) into the definitions (1), (2). The full derivation is given in Appendix B and leads to an expression which was used in [19]:

$$ I_{xy} = 2(1 - s_1)\{p_s\} + 2\{p_1\} - (1 - s_1)^2 \{p_{ss}\} - 2(1 - s_1)\{\pi\} - \{p_{11}\}, \tag{18} $$

where we denote for compactness

$$ \{q\} = -q \log_2 q. \tag{19} $$
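Since (18) follows from (1), (2), (6) and (11) by straightforward algebra, it can be cross-checked against a brute-force evaluation of (1) over the full two-time table. In the sketch below, the helper `br` implements the bracket (19), and all function names are hypothetical. The brute-force part assumes, for illustration only, independent neurons with a common spike probability r per bin; (18) itself holds for any $s_x$.

```python
import math
from itertools import product


def br(q):
    """Bracket {q} = -q log2 q of Eq. (19), extended by continuity with {0} = 0."""
    return -q * math.log2(q) if q > 0 else 0.0


def dichotomous(p_s, p_ss):
    """Remaining entries (p_sb, p_bb) of (8) from Eqs. (9a), (9b)."""
    p_sb = p_s - p_ss
    return p_sb, 1.0 - p_ss - 2.0 * p_sb


def mi_eq18(s1, p_s, p_ss):
    """Closed-form mutual information I_xy, Eq. (18)."""
    p_sb, p_bb = dichotomous(p_s, p_ss)
    p1 = p_s * s1 + (1.0 - p_s)  # Eq. (6b)
    pi = p_ss * s1 + p_sb        # Eq. (11b)
    p11 = p_ss * s1 ** 2 + 2.0 * p_sb * s1 + p_bb
    return (2 * (1 - s1) * br(p_s) + 2 * br(p1) - (1 - s1) ** 2 * br(p_ss)
            - 2 * (1 - s1) * br(pi) - br(p11))


def mi_brute_force(n, r, p_s, p_ss):
    """I_xy = 2 H_x - H_xy, summed term by term from Eqs. (6) and (11)."""
    states = list(product((0, 1), repeat=n))
    ones = (1,) * n
    s = {x: math.prod(r if bit else 1.0 - r for bit in x) for x in states}
    p_sb, p_bb = dichotomous(p_s, p_ss)
    s1 = s[ones]
    pi = p_ss * s1 + p_sb
    p11 = p_ss * s1 ** 2 + 2.0 * p_sb * s1 + p_bb
    h_x = sum(br(p_s * s[x]) for x in states if x != ones) + br(p_s * s1 + 1.0 - p_s)
    h_xy = 0.0
    for x, y in product(states, repeat=2):
        if x != ones and y != ones:
            h_xy += br(p_ss * s[x] * s[y])
        elif x != ones:
            h_xy += br(pi * s[x])
        elif y != ones:
            h_xy += br(pi * s[y])
        else:
            h_xy += br(p11)
    return 2.0 * h_x - h_xy  # H_y = H_x by stationarity


r, p_s, p_ss = 0.3, 0.8, 0.7
print(mi_eq18(r ** 3, p_s, p_ss), mi_brute_force(3, r, p_s, p_ss))  # agree up to rounding
```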

We exclude from further consideration the following degenerate cases, which automatically give $I_{xy} = 0$ by definition (1):

$$ s_1 = 1, \quad \text{or} \quad p_s = 0, \quad \text{or} \quad p_s = 1, \quad \text{or} \quad \rho = \epsilon = 0, \tag{20} $$

where the first two correspond to a deterministic “always 1” state for which all entropies in (1) are zero, and the last two produce no predictability, which implies $H_{xy} = H_x + H_y$.

The particular case $s_1 = 0$ in (18) reduces to

$$ I_{xy}\big|_{s_1 = 0} = 2\big(\{p_s\} + \{p_b\}\big) - \big(\{p_{ss}\} + 2\{p_{sb}\} + \{p_{bb}\}\big), \tag{21a} $$

which coincides with the mutual information for the dichotomous component taken alone and can be seen as a function of just two independent parameters of the dichotomous component, for which we choose $p_s$ and $\epsilon$ as suggested in Section 4. Using the expressions for the two-time probabilities (14), we rewrite (21a) in the form

$$ I_{xy}\big|_{s_1 = 0} = 2\big(\{p_s\} + \{p_b\}\big) - \big(\{p_s^2 + \epsilon p_s^2\} + 2\{p_s p_b - \epsilon p_s^2\} + \{p_b^2 + \epsilon p_s^2\}\big) = I_0(p_s, \epsilon), \quad \text{where } p_b = 1 - p_s. \tag{21b} $$

Expression (21b) explicitly defines a function $I_0(p_s, \epsilon)$, which turns out to be a universal function allowing one to express mutual information (18) and effective information (3) in terms of the model parameters, as we show below. Typical plots of $I_0(p_s, \epsilon)$ versus $p_s$ at several fixed values of $\epsilon$ are shown with blue solid lines in Figure 2.

Formula (18) can be recovered from (21a,b) by virtue of the scaling (12a–d): we set $s_1' = 0$, so that (21b) applies to the rescaled model (by (12b) this corresponds to $\alpha = 1 - s_1$), and substitute the corresponding scaled value $p_{s'} = (1 - s_1) p_s$, as per (12c), in place of the first argument of the function $I_0(p_{s'}, \epsilon)$ defined in (21b), while the parameter $\epsilon$ remains invariant to the scaling. This produces the simplified expression

$$ I_{xy} = I_0\big((1 - s_1) p_s, \epsilon\big), \tag{22} $$

which is exactly equivalent to (18) for any $s_1$. We emphasize that hereinafter expressions containing $I_0(\cdot, \cdot)$, such as (22), (23) and (30b), imply that $p_s$ in (21b) must be substituted with the actual first argument of $I_0(\cdot, \cdot)$ (and $p_b$ correspondingly with one minus that argument), e.g., by $(1 - s_1) p_s$ in (22). The same applies when the approximate expression (35) for $I_0(\cdot, \cdot)$ is used.
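A quick numerical check of the equivalence of (22) and (18) is sketched below: it implements $I_0$ from (21b) and compares $I_0((1 - s_1)p_s, \epsilon)$ with (18), where $p_{ss}$ is obtained from $\epsilon$ via (13). The function names are hypothetical, and the parameter grid is an arbitrary illustrative choice restricted to admissible values (16).

```python
import math


def br(q):
    """Bracket {q} = -q log2 q of Eq. (19), with {0} = 0."""
    return -q * math.log2(q) if q > 0 else 0.0


def I0(ps, eps):
    """Universal function I_0(p_s, eps) of Eq. (21b), with p_b = 1 - p_s."""
    pb = 1.0 - ps
    e = eps * ps ** 2
    return 2 * (br(ps) + br(pb)) - (br(ps ** 2 + e) + 2 * br(ps * pb - e) + br(pb ** 2 + e))


def mi_eq18(s1, ps, eps):
    """Eq. (18), with p_ss from Eq. (13) and the remaining entries from Eq. (9)."""
    p_ss = ps ** 2 * (1.0 + eps)
    p_sb = ps - p_ss
    p_bb = 1.0 - p_ss - 2.0 * p_sb
    p1 = ps * s1 + (1.0 - ps)
    pi = p_ss * s1 + p_sb
    p11 = p_ss * s1 ** 2 + 2.0 * p_sb * s1 + p_bb
    return (2 * (1 - s1) * br(ps) + 2 * br(p1) - (1 - s1) ** 2 * br(p_ss)
            - 2 * (1 - s1) * br(pi) - br(p11))


for ps, eps in [(0.3, 0.5), (0.5, 0.2), (0.8, 0.1)]:  # all satisfy eps <= p_b / p_s, Eq. (16)
    for s1 in (0.05, 0.3, 0.7):
        print(ps, eps, s1, mi_eq18(s1, ps, eps), I0((1 - s1) * ps, eps))  # Eq. (22)
```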

Given a bipartition $AB$ (see Section 2), this result is applicable as well to any subsystem A (B), with $s_1$ replaced by $s_A$ ($s_B$), which denotes the probability of a subsystem-wide simultaneous spike $x_A = 1$ ($x_B = 1$) in the absence of a burst, and with the same parameters of the dichotomous component (here $p_s$, $\epsilon$). Then effective information (3) is expressed as

$$ \Phi_{\mathrm{eff}} = I_0\big((1 - s_1) p_s, \epsilon\big) - I_0\big((1 - s_A) p_s, \epsilon\big) - I_0\big((1 - s_B) p_s, \epsilon\big). \tag{23} $$

Hereafter in this section we assume the independence of spontaneous activity across the network nodes (neurons), which implies

$$ s_A s_B = s_1; \tag{24} $$

then (23) turns into

$$ \Phi_{\mathrm{eff}} = f(s_A), \tag{25a} $$

where

$$ f(s) = I_0\big((1 - s_1) p_s, \epsilon\big) - I_0\big((1 - s) p_s, \epsilon\big) - I_0\big((1 - s_1/s) p_s, \epsilon\big). \tag{25b} $$

Essentially, according to (25a,b), the function $f(s)$ shows the dependence of effective information $\Phi_{\mathrm{eff}}$ upon the choice of the bipartition, which is characterized by the value $s_A = s$ (if A is any non-empty subsystem, then $s_A$ is defined as the probability of spontaneous occurrence of 1s in all bits of A at the same instant of discrete time), while the function parameter $s_1$ determines the intensity of spontaneous spiking activity. Note that the function $I_0(\cdot, \cdot)$ in (21b) is defined only when its first argument is in the range $(0, 1)$; thus, the definition domain of $f(s)$ in (25b) is

$$ s_1 < s < 1. \tag{26} $$
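The function $f(s)$ of (25b) and its domain (26) translate directly into code. The usage lines assume, for illustration only, N = 4 independent neurons with a hypothetical common spike probability r, so that a subsystem of k neurons has $s_A = r^k$ and $s_1 = r^4$ in agreement with (24); the printed values are then $\Phi_{\mathrm{eff}}$ for bipartitions with |A| = 1, 2, 3.

```python
import math


def br(q):
    """Bracket {q} = -q log2 q of Eq. (19), with {0} = 0."""
    return -q * math.log2(q) if q > 0 else 0.0


def I0(ps, eps):
    """I_0(p_s, eps) of Eq. (21b), with p_b = 1 - p_s."""
    pb = 1.0 - ps
    e = eps * ps ** 2
    return 2 * (br(ps) + br(pb)) - (br(ps ** 2 + e) + 2 * br(ps * pb - e) + br(pb ** 2 + e))


def f(s, s1, ps, eps):
    """Effective information (25b) as a function of s = s_A, on the domain (26)."""
    if not s1 < s < 1:
        raise ValueError("f(s) is defined only for s1 < s < 1")
    return I0((1 - s1) * ps, eps) - I0((1 - s) * ps, eps) - I0((1 - s1 / s) * ps, eps)


r, ps, eps = 0.6, 0.8, 0.2  # illustrative values; eps <= p_b / p_s = 0.25
s1 = r ** 4
for k in (1, 2, 3):  # |A| = k, so s_A = r**k and s_B = r**(4 - k)
    print(k, f(r ** k, s1, ps, eps))
```

The cases k = 1 and k = 3 give the same value, reflecting the symmetry $f(s_1/s) = f(s)$ discussed in Section 5.2.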

5.2. Determining the Sign of the “Whole Minus Sum” Information

According to (4), the necessary and sufficient condition for the “whole minus sum” empirical II to be positive is the requirement that $\Phi_{\mathrm{eff}}$ be positive for every bipartition $AB$. Due to (25a,b), this requirement can be written in the form

$$ \min_{s \in \{s_A\}} f(s) > 0, \tag{27} $$

where $\{s_A\}$ is the set of $s_A$ values for all possible bipartitions $AB$.

Expanding the set of s in (27) to the whole definition domain (26) of $f(s)$ leads to a sufficient (generally, stronger) condition for positive II:

$$ f(s) > 0 \quad \text{for all } s \in (s_1, 1). \tag{28} $$

Note that $f(s)$, by definition (25b), satisfies $f(s_1) = f(1) = 0$, $f'(s_1) > 0$ and (due to the invariance to mutual renaming of subsystems A and B) $f(s_1/s) = f(s)$. (All mentioned properties and the subsequent reasoning can be observed in Figure 3, which shows a few sample plots of $f(s)$.) The latter symmetry maps the interval $(s_1, 1)$ onto itself with the fixed point $s = \sqrt{s_1}$, where it forces $f'(\sqrt{s_1}) = 0$; all other extrema come in symmetric pairs. Hence the number of extrema of $f(s)$ on $s \in (s_1, 1)$ must be odd, one of them always being at $s = \sqrt{s_1}$. If the latter is the only extremum, then it is a positive maximum, and (28) is thus fulfilled automatically. In the case of three extrema, $f(\sqrt{s_1})$ is a minimum, which can change sign. In both cases the condition (28) is equivalent to the requirement

$$ f(\sqrt{s_1}) > 0, \tag{29} $$

which can be rewritten as

$$ g(s_1) > 0, \tag{30a} $$

where

$$ g(s_1) = f(\sqrt{s_1}) = I_0\big((1 - s_1) p_s, \epsilon\big) - 2 I_0\big((1 - \sqrt{s_1}) p_s, \epsilon\big). \tag{30b} $$

The reasoning above essentially reduces the problem of determining the sign of II to determining the sign of the extremum $f(\sqrt{s_1})$.
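In code, (30b) is a one-line specialization of (25b). The sketch below (hypothetical names, illustrative parameter values) evaluates $g(s_1)$ and also checks numerically, with a central finite difference of an assumed step h, that $s = \sqrt{s_1}$ is a stationary point of $f(s)$, as implied by the symmetry $f(s_1/s) = f(s)$.

```python
import math


def br(q):
    """Bracket {q} = -q log2 q of Eq. (19), with {0} = 0."""
    return -q * math.log2(q) if q > 0 else 0.0


def I0(ps, eps):
    """I_0(p_s, eps) of Eq. (21b), with p_b = 1 - p_s."""
    pb = 1.0 - ps
    e = eps * ps ** 2
    return 2 * (br(ps) + br(pb)) - (br(ps ** 2 + e) + 2 * br(ps * pb - e) + br(pb ** 2 + e))


def f(s, s1, ps, eps):
    """Effective information (25b) for s1 < s < 1."""
    return I0((1 - s1) * ps, eps) - I0((1 - s) * ps, eps) - I0((1 - s1 / s) * ps, eps)


def g(s1, ps, eps):
    """g(s1) = f(sqrt(s1)), Eq. (30b); condition (30a) for positive II is g(s1) > 0."""
    return I0((1 - s1) * ps, eps) - 2 * I0((1 - math.sqrt(s1)) * ps, eps)


ps, eps, s1 = 0.5, 0.2, 0.2
h = 1e-5
slope = (f(math.sqrt(s1) + h, s1, ps, eps) - f(math.sqrt(s1) - h, s1, ps, eps)) / (2 * h)
print(g(s1, ps, eps), f(math.sqrt(s1), s1, ps, eps), slope)  # slope is close to zero
```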

The equivalence of (29) to (28) could be broken if $f(s)$ had five or more extrema. As suggested by a numerical calculation on a grid of $p_s \in [0.01, 0.99]$ and $\rho \in [0.01, 1]$, both with step 0.01, this exception never occurs, although we did not prove this rigorously. Based on the reasoning above, in the following we assume the equivalence of (29) (and (30)) to (28).

A typical scenario of the transformation of $f(s)$ with changing $s_1$ is shown in Figure 3. As $s_1$ decreases, the extremum $f(\sqrt{s_1})$ (shown with a dot) turns from a positive maximum into a minimum, which in turn decreases from positive values through zero into negative values.

Note that, by construction, the function $g(s_1)$ defined in (30b) expresses the effective information $\Phi_{\mathrm{eff}}$ from (3) for the hypothetical bipartition characterized by $s_A = s_B = \sqrt{s_1}$, which may or may not exist in the actual particular system. If such a “symmetric” bipartition exists, then the value $s_A = \sqrt{s_1}$ belongs to the set $\{s_A\}$ in (27), which implies that (29) (same as (30)) is equivalent not only to (28), but also to the necessary and sufficient condition (27). Otherwise, (28) (equivalently, (29) or (30)), while formally only sufficient, may still give a good estimate of the necessary and sufficient condition in cases when $\{s_A\}$ contains values close to $\sqrt{s_1}$ (corresponding to nearly symmetric partitions, if such exist).

Except for the degenerate cases (20), $g(s_1)$ is negative at $s_1 = 0$,

$$ g(s_1 = 0) = -I_0(p_s, \epsilon) < 0, \tag{31} $$

and has the limit $g(s_1 \to 1 - 0) \to +0$ (here $-0$ and $+0$ denote the left and right one-sided limits), because

$$ \lim_{s_1 \to 1 - 0} \frac{I_0\big((1 - s_1) p_s, \epsilon\big)}{2 I_0\big((1 - \sqrt{s_1}) p_s, \epsilon\big)} = 2; \tag{32} $$

hence, $g(s_1)$ changes sign at least once on $s_1 \in (0, 1)$. According to numerical evidence, we assume that $g(s_1)$ changes sign exactly once on $(0, 1)$, without providing a rigorous proof of the latter statement (it was confirmed up to machine precision for each combination of $p_s \in [0.01, 0.99]$ and $\rho \in [0.01, 1]$, both with step 0.01; also note that for the asymptotic case (38) this statement is rigorous). In line with the above, the solution to (30a) has the form

$$ s_1^{\min}(p_s, \epsilon) < s_1 < 1, \tag{33} $$

where $s_1^{\min}(p_s, \epsilon)$ is the unique root of $g(s_1)$ on $(0, 1)$. Several plots of $s_1^{\min}(p_s, \epsilon)$ versus $p_s$ at fixed $\epsilon$ and versus $\epsilon$ at fixed $p_s$, obtained by numerically solving for the zero of $g(s_1)$, are shown in Figure 4 with blue solid lines.
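Under the single-sign-change assumption stated above, $s_1^{\min}$ can be located by plain bisection on $g(s_1)$. The sketch below is our illustration rather than the solver used for Figure 4; the bracket $[0, 0.99]$ is an assumption (the code raises an error if $g$ does not change sign on it), and the upper end is kept away from 1 because $g(s_1) \to +0$ there and its sign becomes hard to resolve in floating-point arithmetic.

```python
import math


def br(q):
    """Bracket {q} = -q log2 q of Eq. (19), with {0} = 0."""
    return -q * math.log2(q) if q > 0 else 0.0


def I0(ps, eps):
    """I_0(p_s, eps) of Eq. (21b), with p_b = 1 - p_s."""
    pb = 1.0 - ps
    e = eps * ps ** 2
    return 2 * (br(ps) + br(pb)) - (br(ps ** 2 + e) + 2 * br(ps * pb - e) + br(pb ** 2 + e))


def g(s1, ps, eps):
    """Effective information of the symmetric bipartition, Eq. (30b)."""
    return I0((1 - s1) * ps, eps) - 2 * I0((1 - math.sqrt(s1)) * ps, eps)


def s1_min(ps, eps, hi=0.99, tol=1e-12):
    """Root of g on (0, 1) by bisection; positive II requires s1 > s1_min, Eq. (33)."""
    lo = 0.0
    if not g(lo, ps, eps) < 0.0 < g(hi, ps, eps):
        raise ValueError("g(s1) does not change sign on [0, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, ps, eps) < 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


for ps in (0.3, 0.5, 0.8):
    print(ps, s1_min(ps, eps=0.1))
```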

This result identifies a region in the parameter space of the model where the “whole minus sum” information is positive. From the viewpoint of the underlying biological system, the quantity $s_1^{\min}$ determines the minimal sufficient intensity of spontaneous neuronal spiking activity for positive II. According to the result in Figure 4, within the assumption of independent spiking across the network (24), values $s_1 \gtrsim 0.17$ lead to positive II regardless of other parameter values, and this threshold decreases further when $p_s$ is increased, which implies decreasing the frequency of occurrence of astrocyte-activated global bursting $p_b = 1 - p_s$.

5.3. Asymptotics for Weak Correlations in Time

Further insight into the dependence of mutual information $I_{xy}$ (and, consequently, of $\Phi_{\mathrm{eff}}$ and II) upon the parameters can be obtained by expanding the definition of $I_0(p_s, \epsilon)$ in (21b) in powers of $\epsilon$ (the limit of weak correlations in time), which yields

$$ I_0(p_s, \epsilon) = \frac{1}{2 \ln 2} \left(\frac{p_s}{1 - p_s}\right)^2 \epsilon^2 + O(\epsilon^3), \tag{34} $$

where $\ln$ denotes the natural logarithm (the factor $1/\ln 2$ converts the result to bits).

Estimating the residual term (see details in Appendix C) indicates that the approximation by the leading term

I_{0} (p_{s}, ϵ) \approx \frac{ϵ^{2}}{2 log 2} {(\frac{p_{s}}{1 - p_{s}})}^{2}

(35)

is valid when

\begin{matrix} | ϵ | & ≪ 1, \end{matrix}

(36a)

\begin{matrix} | ϵ | & ≪ {(\frac{p_{b}}{p_{s}})}^{2} = ϵ_{max}^{2} . \end{matrix}

(36b)

Solving (36b) for $p_{s}$ rewrites it in the form of an upper bound for $p_{s}$:

p_{s} < \frac{1}{1 + \sqrt{| ϵ |}}

(36c)

(the “≪” sign is not appropriate in (36c), because this inequality does not imply a small ratio between its left-hand and right-hand sides). Note that the inequalities (36b) and (36c) are not weaker than the formal upper bounds $ϵ_{max}$ in (16) and $p_{s max}$ in (15). Those bounds arise from the definition of $ϵ$ (13) through the requirement of positive probabilities.
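A quick numerical check of the leading-order approximation (35) and of the bound (36c) can be sketched as follows. The sketch assumes that $I_{0} (p_{s}, ϵ)$ equals the time-delayed mutual information (in bits) of a stationary dichotomous process with two-time probabilities $p_{s s} = p_{s}^{2} + ϵ p_{s}^{2}$, $p_{s b} = p_{s} p_{b} - ϵ p_{s}^{2}$ and $p_{b b} = p_{b}^{2} + ϵ p_{s}^{2}$. The structure of the expansion in Appendix C suggests this, but it remains an assumption: the exact expression (21b) is not reproduced here. The function names are hypothetical.

```python
import math


def entropy_term(q: float) -> float:
    """-q ln q with the convention 0 ln 0 = 0 (the function f of (A23))."""
    return -q * math.log(q) if q > 0 else 0.0


def i0_exact(p_s: float, eps: float) -> float:
    """Time-delayed mutual information (bits) of the dichotomous component.

    Assumption of this sketch: I_0 = 2 H(one-time) - H(two-time) for the
    two-time table p_ss, p_sb = p_bs, p_bb written in terms of eps.
    """
    p_b = 1.0 - p_s
    p_ss = p_s**2 * (1.0 + eps)
    p_sb = p_s * p_b - eps * p_s**2
    p_bb = p_b**2 + eps * p_s**2
    if min(p_ss, p_sb, p_bb) < 0:
        raise ValueError("eps is outside the range allowed by positive probabilities")
    h_one = entropy_term(p_s) + entropy_term(p_b)
    h_two = entropy_term(p_ss) + 2.0 * entropy_term(p_sb) + entropy_term(p_bb)
    return (2.0 * h_one - h_two) / math.log(2)


def i0_leading(p_s: float, eps: float) -> float:
    """Leading-order approximation (35); log is the natural logarithm."""
    return eps**2 / (2.0 * math.log(2)) * (p_s / (1.0 - p_s)) ** 2


def ps_upper_bound(eps: float) -> float:
    """Upper bound (36c) on p_s for the approximation to apply."""
    return 1.0 / (1.0 + math.sqrt(abs(eps)))


for p_s, eps in [(0.5, 0.01), (0.6, 0.01), (0.7, 0.4)]:
    ratio = i0_exact(p_s, eps) / i0_leading(p_s, eps)
    holds = p_s < ps_upper_bound(eps)
    print(f"p_s={p_s}, eps={eps}: exact/leading = {ratio:.4f}, (36c) holds: {holds}")
```

Inside the range (36), the ratio of the exact value to (35) should stay close to 1. The pair $p_{s} = 0.7$, $ϵ = 0.4$ violates (36c), which matches the case where the scaling fails in Figure 5(d).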

Approximation (35) is plotted in Figure 2 with red dashed lines. The corresponding upper bounds of its applicability range (36c) are marked with red dots (large $ϵ$ violates (36a) anyway, so in that case (36c) has no effect). Within range (36), mutual information (35) scales with $ϵ$ as $ϵ^{2}$ and vanishes as $ϵ \to 0$. The same holds for effective information (23). The normalizing denominator in (4b) contains only one-time entropies, which do not depend on $ϵ$ at all. Therefore this scaling of $Φ_{eff}$ does not change the minimum information bipartition, and II also scales as $ϵ^{2}$. That said, the factor $ϵ^{2}$ does not affect the sign of $Φ_{eff}$, so in this limit the lower bound $s_{1}^{min}$ in (33) exists and is determined by $p_{s}$ alone.

Substituting the approximation (35) for $I_{0} (\cdot, \cdot)$ into the definition of $g (s_{1})$ in (30b) and simplifying reduces the equation $g (s_{1}) = 0$ to the following (see the comment below Equation (22)):

p_{s} (\sqrt{2} - 1) s_{1} - \sqrt{s_{1}} + (1 - p_{s}) (\sqrt{2} - 1) = 0,

(37)

By the reasoning behind Equation (33), the solution of this equation for $s_{1}$ on $0 < s_{1} < 1$ equals $s_{1}^{min}$. Solving (37) as a quadratic equation in $\sqrt{s_{1}}$ produces a unique root on $(0, 1)$, because the other root is at least $1 / (2 p_{s} (\sqrt{2} - 1)) > 1$. This yields

s_{1}^{min} (p_{s}) |_{ϵ \to 0} = {(\frac{1 - \sqrt{1 - 4 p_{s} (1 - p_{s}) {(\sqrt{2} - 1)}^{2}}}{2 p_{s} (\sqrt{2} - 1)})}^{2} .

(38)

The result (38) is plotted in Figure 4 with red dashed lines. In panel (a) it appears as a function of $p_{s}$. In panel (b) it appears as horizontal lines whose vertical position is given by (38) and whose horizontal span denotes the estimated applicability range (36b). Condition (36a) also applies, and it becomes stronger than (36b) when $p_{s} < 1 / 2$.
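Formula (38) can be evaluated directly. The sketch below solves (37) for $u = \sqrt{s_{1}}$ and checks the residual of (37). For the smaller root it uses the form $u = 2c / (1 + \sqrt{1 - 4ac})$, which is algebraically equivalent to (38) and avoids the $0/0$ at $p_{s} = 0$. The function names are hypothetical. At $p_{s} = 0$ the sketch returns $(\sqrt{2} - 1)^{2} = 3 - 2\sqrt{2} \approx 0.1716$, which is consistent with the value $s_{1} ≳ 0.17$ quoted above for Figure 4.

```python
import math

SQRT2_M1 = math.sqrt(2.0) - 1.0


def s1_min_weak(p_s: float) -> float:
    """Asymptotic threshold s_1^min for eps -> 0, Eq. (38).

    Eq. (37) reads a*u**2 - u + c = 0 in u = sqrt(s_1), with
    a = p_s*(sqrt(2)-1) and c = (1-p_s)*(sqrt(2)-1). The smaller root
    (1 - sqrt(1-4ac)) / (2a) is rewritten as 2c / (1 + sqrt(1-4ac)).
    """
    a = p_s * SQRT2_M1
    c = (1.0 - p_s) * SQRT2_M1
    u = 2.0 * c / (1.0 + math.sqrt(1.0 - 4.0 * a * c))
    return u * u


def residual_37(p_s: float, s1: float) -> float:
    """Left-hand side of Eq. (37); it vanishes at s_1 = s_1^min."""
    return p_s * SQRT2_M1 * s1 - math.sqrt(s1) + (1.0 - p_s) * SQRT2_M1


for p_s in (0.0, 0.3, 0.5, 0.6, 0.7, 0.9):
    s1 = s1_min_weak(p_s)
    print(f"p_s = {p_s:.1f}: s1_min = {s1:.4f}, residual = {residual_37(p_s, s1):.1e}")
```

The printed thresholds decrease monotonically with $p_{s}$. This agrees with the statement that less frequent bursting lowers the required spiking intensity.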

6. Comparison of Integrated Information Measures

In this Section we compare two empirical integrated information measures available in the literature. One is the “whole minus sum” effective information $Φ_{eff}$ (3) from [13], which is used elsewhere in this study. The other is the “decoder based” information $Φ^{*}$, introduced in [16] and expressed by Equations (5a–c). We calculate both measures from their respective definitions, using the one- and two-time probabilities from Equations (6a,b) and (11a–d) for the spiking–bursting model with $N = 6$ bits. We assume no spatial correlations among bits in spiking activity and the same spike probability $P$ in each bit. In this case

s_{x} = P^{m (x)} {(1 - P)}^{N - m (x)}, P = s_{1}^{\frac{1}{N}},

(39)

where $m (x)$ is the number of ones in the binary word $x$.

We consider only the symmetric bipartition with subsystems A and B consisting of $N / 2 = 3$ bits each. Since all bits have equal spike probabilities and spiking is spatially uncorrelated, the two subsystems are completely equivalent. In particular, in the notation of Section 5 we get

s_{1} = s_{A} s_{B}, s_{A} = s_{B} = \sqrt{s_{1}} .

(40)
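The one-time table (39) and the relation (40) can be checked by brute-force enumeration of all $2^{N}$ words. In this sketch the words are tuples of bits. Assigning bits 0, 1, 2 to subsystem A and bits 3, 4, 5 to subsystem B is an arbitrary labelling chosen for illustration. `spike_table` and `all_ones_prob` are hypothetical helper names.

```python
import itertools
import math


def spike_table(s1: float, n_bits: int) -> dict:
    """One-time table s_x of Eq. (39) for spatially independent spiking.

    Each bit spikes independently with probability P = s1**(1/n_bits),
    so the all-ones word has probability P**n_bits = s1.
    """
    p = s1 ** (1.0 / n_bits)
    return {
        word: p ** sum(word) * (1.0 - p) ** (n_bits - sum(word))
        for word in itertools.product((0, 1), repeat=n_bits)
    }


def all_ones_prob(table: dict, bits: range) -> float:
    """Probability that every bit in `bits` equals 1, marginalizing the rest."""
    return sum(prob for word, prob in table.items() if all(word[i] == 1 for i in bits))


s1 = 0.3
table = spike_table(s1, n_bits=6)
s_a = all_ones_prob(table, range(0, 3))  # subsystem A (bits 0, 1, 2)
s_b = all_ones_prob(table, range(3, 6))  # subsystem B (bits 3, 4, 5)
print(len(table), sum(table.values()), s_a, s_b, math.sqrt(s1))
```

Both marginal all-ones probabilities come out equal to $\sqrt{s_{1}}$, as stated in (40).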

We choose this bipartition for two reasons. Firstly, the sign of effective information for this bipartition determines the sign of the resultant “whole minus sum” II, although the actual value of II is determined by the minimum information bipartition, which may be different. This was established in Section 5 (see the reasoning behind Equations (27)–(30) and further on). Moreover, the function $g (s_{1})$ introduced in Equation (30b) expresses the effective information for this particular bipartition

Φ_{eff} (A B) = g (s_{1}),

(41)

so the analysis of the sign of effective information in Section 5 applies to this symmetric bipartition.

Secondly, the symmetric bipartition is consistent with available comparative studies of II measures [18]. There it was justified by the conceptual requirement that highly asymmetric partitions should be excluded [2], and by the lack of a generally accepted specification of the minimum information bipartition; for further discussion, see [18].

We have studied how both effective information measures, $Φ_{eff}$ and $Φ^{*}$, depend on spiking activity, which is controlled by $s_{1}$. The parameters $p_{s}$ and $ϵ$, which characterize the bursting component, were held at different fixed values. Figure 5, panel (a) shows a typical dependence of $Φ_{eff}$ and $Φ^{*}$ on $s_{1}$ at $p_{s} = 0.6$ for several values of $ϵ$.

The behavior of the “whole minus sum” effective information $Φ_{eff}$ (41) (blue lines in Figure 5) agrees with the analytical findings of Section 5:

$Φ_{eff}$ transitions from negative to positive values at a certain threshold value $s_{1} = s_{1}^{min}$. When $ϵ$ is small, as required by (36a,b), this threshold is well approximated by formula (38). In each panel of Figure 5, the result of Equation (38) is marked by an additional vertical grid line labeled $s_{1}^{min}$ on the abscissa (cf. Figure 4);
$Φ_{eff}$ reaches a maximum on the interval $s_{1}^{min} < s_{1} < 1$ and tends to zero (from above) as $s_{1} \to 1$;
$Φ_{eff}$ scales with $ϵ$ as $ϵ^{2}$ when (36a,b) hold.

To verify the scaling observation, we plot the scaled values of both information measures, $Φ_{eff} / ϵ^{2}$ and $Φ^{*} / ϵ^{2}$, in panels (b)–(d) of Figure 5 for several fixed values of $p_{s}$ and $ϵ$. As expected, the scaling fails at $p_{s} = 0.7$, $ϵ = 0.4$ in panel (d), since (36b) is not fulfilled in this case: here $ϵ_{max}^{2} = (0.3 / 0.7)^{2} \approx 0.18 < ϵ$.

Furthermore, the “decoder based” information $Φ^{*}$ (red lines in Figure 5) behaves mostly the same way, except that it is always non-negative (one of the key motivations for introducing this measure in [16]). At the same time, the sign transition point $s_{1}^{min}$ of the “whole minus sum” information is associated with a rapid growth of the “decoder based” information. As $s_{1}$ increases towards 1, the two measures converge. Remarkably, both effective information measures share the scaling as $ϵ^{2}$.

7. Discussion

In general, the spiking–bursting model is completely specified by two ingredients. The first is a full single-time probability table $s_{x}$ for the time-uncorrelated spontaneous activity, consisting of the $2^{N}$ probabilities of all possible outcomes, where $N$ is the number of bits. The second is a pair of independent parameters (e.g., $p_{s}$ and $ϵ$) for the dichotomous component. This combination is, however, redundant: it admits a one-parameter scaling (12) that leaves the resultant stochastic process invariant.

Condition (30) was derived assuming that spiking activity is independent among the individual bits (i.e., nodes, or neurons) constituting the system. This implies that the probability table $s_{x}$ is fully determined by the $N$ spike probabilities of the individual nodes. The condition is formulated in terms of $p_{s}$, $ϵ$ and a single parameter $s_{1}$ (the system-wide spike probability) for the spontaneous activity. It is agnostic of the “internal structure” of the system, i.e., of the spike probabilities of individual nodes. This condition ensures that the “whole minus sum” effective information is positive for any bipartition, regardless of that internal structure. Moreover, in the limit (36) of weak correlations in time, the inequality (30a) can be solved explicitly in terms of $s_{1}$, producing the solution (33), (38).

In this way, the inequality (33), together with the asymptotic estimate (38) and its applicability range (36), specifies a region in the parameter space of the system where the “whole minus sum” II is positive regardless of the internal system structure (a sufficient condition). The necessary and sufficient condition (27) for positive II takes the internal structure into account, though still without spike correlations across the system.

These conditions were derived assuming no correlation between spontaneous activity in individual bits (24). If a positive correlation exists, then $s_{1} > s_{A} s_{B}$, or $s_{B} < s_{1} / s_{A}$. Compare expression (23) for $Φ_{eff}$ (general case) with (25) (space-uncorrelated case). Taking into account that $I_{0} (p_{s}, ϵ)$ is an increasing function of $p_{s}$, we find $Φ_{eff} < f (s_{A})$, cf. (25a). Hence any necessary condition for positive II remains necessary. Likewise, in the case of negative correlations we get $Φ_{eff} > f (s_{A})$, so any sufficient condition remains sufficient.

8. Conclusions

The present study substantiates, refines and quantifies qualitative observations about II in the spiking–bursting model that were first made in [19]. That work noticed lower bounds on spiking activity (characterized by $s_{1}$) required for positive “whole minus sum” II. These bounds are now expressed as an explicit inequality (33), with the estimate (38) for the bound $s_{1}^{min}$. Ref. [19] also observed that $s_{1}^{min}$ is typically determined mostly by the burst probability and depends only weakly on the time correlations of bursts. The quantitative result (33), (38) supports this observation as well. In particular, for spiking activity intensity $s_{1} ≳ 0.17$ the “whole minus sum” information is positive regardless of the other system parameters, provided the spiking activity is spatially uncorrelated or negatively correlated across the system. When the burst probability decreases (which implies less frequent activation of the astrocyte subsystem), the spiking activity threshold $s_{1}^{min}$ also decreases.

We found that II scales as $ϵ^{2}$ for small $ϵ$ (namely, within (36)) when the other parameters ($p_{s}$ and the spiking probability table $s_{x}$) are fixed. By Equation (17), $ϵ$ is proportional to Pearson’s time-delayed correlation coefficient of the bursting component, which essentially characterizes the duration of bursts. For the “whole minus sum” information, this is an analytical result. Its derivation does not rely on the assumption of spatially uncorrelated spiking activity (between bits), so it applies to arbitrary spiking–bursting systems. According to numerical calculations, this scaling also holds approximately for the “decoder based” information.

Remarkably, II cannot exceed the time-delayed mutual information of the system as a whole, which for the spiking–bursting model in its present formulation is at most 1 bit.

The model provides a basis for possible modifications in order to apply integrated information concepts to systems exhibiting similar, but more complicated behavior (in particular, to neuronal [26,27,28,29] and neuron–astrocyte [24,30] networks). Such modifications might incorporate non-trivial spatial patterns in bursting, and causal interactions within and between the spiking and bursting subsystems.

The model can also be of interest as a new discrete-state test bench for different formalizations of integrated information, while available comparative studies of II measures mainly focus on Gaussian autoregressive models [17,18].

Author Contributions

Formal analysis, software, visualization, writing–original draft preparation, O.K.; conceptualization, methodology, validation, writing–review and editing, O.K., S.G. and A.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Higher Education of the Russian Federation: analytical studies (Section 4 and Section 5) by project number 075-15-2020-808, numerical studies (Section 6) by project number 0729-2020-0061. S.G. thanks the RFBR (grant number 20-32-70081). A.Z. is thankful for the MRC grant MR/R02524X/1. The APC was funded by the Ministry of Science and Higher Education of the Russian Federation (projects 075-15-2020-808 and 0729-2020-0061 in equal shares).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of Parameters Scaling of the Spiking–Bursting Model

To formalize the reasoning in Section 4, we introduce an auxiliary 3-state process $W$ with the set of one-time states $\{s^{'}, d, b\}$. Here $s^{'}$ and $b$ are always interpreted as the spiking and bursting states in terms of Section 3. The state $d$ produces all bits equal to 1, as a burst does, but in a time-uncorrelated manner (formalized by Equation (A4) below), as a system-wide spike does. Suppose $W$ is properly defined (by specifying all necessary probabilities, see below) and supplemented with a time-uncorrelated process $S$ as the source of spontaneous activity for the state $s^{'}$. Together these constitute a completely defined stochastic model $\{W, S\}$.

This 3-state model can be mapped onto equivalent 2-state models as in Section 3 (equivalent in terms of the resultant realizations), but the mapping is ambiguous. The state $d$ may be interpreted either as a system-wide spike or as a time-uncorrelated burst. This produces two different dichotomous processes, which we denote $V$ and $V^{'}$, for the equivalent spiking–bursting models. The following diagram shows how the states of $W$, $V$ and $V^{'}$ are related; each column lists a state of $W$ together with the corresponding states of $V$ and $V^{'}$.

\begin{matrix} W: & s^{'} & d & b \\ V: & s & s & b \\ V^{'}: & s^{'} & b^{'} & b^{'} \end{matrix}

(A1)

Since the $d$-states of $W$ are interpreted in $V$ as (spiking) $s$-states, the spontaneous activity process $S$ accompanying $V$ must be supplemented with system-wide spikes whenever $W = d$. These spikes come in addition to the spontaneous activity process $S^{'}$ for $V^{'}$. To keep spontaneous activity free of time correlations (which is essential for the analysis in Section 5), we assume a time-uncorrelated choice between $W = s^{'}$ and $W = d$ when $V = s$ (this appears below in Equation (A4)). The difference between the spontaneous components $S$ and $S^{'}$ then comes down to a difference in the corresponding one-time probability tables $s_{x}$ and $s_{x}^{'}$.

In the following, we start from the dichotomous process $V$ defined as in Section 3, then define a consistent 3-state process $W$, and from it obtain another dichotomous process $V^{'}$ for an equivalent model. Finally, we establish the relation between the corresponding probability tables of spontaneous activity $s_{x}$ and $s_{x}^{'}$.

The first dichotomous process $V$ has states $\{s, b\}$ and is related to $W$ by the rule: $V = s$ when $W = s^{'}$ or $W = d$, and $V = b$ whenever $W = b$ (see diagram (A1)). Assume fixed conditional probabilities

\begin{matrix} p (W = s^{'} | V = s) & = α, \end{matrix}

(A2a)

\begin{matrix} p (W = d | V = s) & = β = 1 - α, \end{matrix}

(A2b)

which imply the following one-time probabilities for $W$:

p_{s^{'}} = α p_{s}, p_{d} = β p_{s} .

(A3)

The requirement of a time-uncorrelated choice between $W = s^{'}$ and $W = d$ when $V = s$ is expressed by the factorized two-time conditional probabilities

\begin{matrix} p (W = s^{'} s^{'} | V = s s) & = α^{2}, \end{matrix}

(A4a)

\begin{matrix} p (W = s^{'} d | V = s s) & = α β = p (W = d s^{'} | V = s s), \end{matrix}

(A4b)

\begin{matrix} p (W = d d | V = s s) & = β^{2} . \end{matrix}

(A4c)

Given the two-time probability table for V (8) along with the conditional probabilities (A2) and (A4), we arrive at a two-time probability table for W

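\begin{array}{c|cc|c} & s^{'} & d & b \\ \hline s^{'} & α^{2} p_{s s} & α β p_{s s} & α p_{s b} \\ d & α β p_{s s} & β^{2} p_{s s} & β p_{s b} \\ \hline b & α p_{b s} & β p_{b s} & p_{b b} \end{array}

(A5)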

Note that (A5) is consistent both with (A3), which is obtained by summation along the rows of (A5), and with (8), which is obtained by summation within the line-separated cell groups in (A5):

\begin{matrix} p_{s s} & \equiv p_{s^{'} s^{'}} + p_{s^{'} d} + p_{d s^{'}} + p_{d d} \end{matrix}

(A6a)

\begin{matrix} p_{s b} & \equiv p_{s^{'} b} + p_{d b} \end{matrix}

(A6b)

\begin{matrix} p_{b s} & \equiv p_{b s^{'}} + p_{b d} \end{matrix}

(A6c)

\begin{matrix} p_{b b} & \equiv p_{b b} . \end{matrix}

(A6d)

Consider the other dichotomous process $V^{'}$ with states $\{s^{'}, b^{'}\}$, obtained from $W$ by the rule: $V^{'} = b^{'}$ when $W = d$ or $W = b$, and $V^{'} = s^{'}$ whenever $W = s^{'}$ (see diagram (A1)). The two-time probability table for $V^{'}$ is obtained by partitioning the table (A5) differently

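\begin{array}{c|c|cc} & s^{'} & d & b \\ \hline s^{'} & α^{2} p_{s s} & α β p_{s s} & α p_{s b} \\ \hline d & α β p_{s s} & β^{2} p_{s s} & β p_{s b} \\ b & α p_{b s} & β p_{b s} & p_{b b} \end{array}

(A7)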

with subsequent summation of cells within groups, which yields

\begin{matrix} p_{s^{'} s^{'}} & = α^{2} p_{s s}, \end{matrix}

(A8a)

\begin{matrix} p_{s^{'} b^{'}} & = α (β p_{s s} + p_{s b}) = p_{b^{'} s^{'}}, \end{matrix}

(A8b)

\begin{matrix} p_{b^{'} b^{'}} & = β^{2} p_{s s} + 2 β p_{s b} + p_{b b} . \end{matrix}

(A8c)

The corresponding one-time probabilities for $V^{'}$ read

\begin{matrix} p_{s^{'}} & = α p_{s}, \end{matrix}

(A9a)

\begin{matrix} p_{b^{'}} & = β p_{s} + p_{b} . \end{matrix}

(A9b)

To relate the one-time probability tables of spontaneous activity $s_{x}$ and $s_{x}^{'}$, we equate, as per (6), the resultant one-time probabilities of observing a given state $x$ in the two equivalent models $\{V, S\}$ and $\{V^{'}, S^{'}\}$:

\begin{matrix} p (x \neq 1) & = p_{s} s_{x} = p_{s^{'}} s_{x}^{'}, \end{matrix}

(A10a)

\begin{matrix} p (x = 1) & = p_{s} s_{1} + p_{b} = p_{s^{'}} s_{1}^{'} + p_{b^{'}} . \end{matrix}

(A10b)

Taking into account (A9), we finally get

\begin{matrix} s_{x \neq 1} & = α s_{x \neq 1}^{'}, \end{matrix}

(A11a)

\begin{matrix} 1 - s_{1} & = α (1 - s_{1}^{'}) . \end{matrix}

(A11b)

Equations (A8), (A9) and (A11) fully describe the transformation of the spiking–bursting model, which by construction keeps the resultant stochastic process invariant. The dichotomous process is fully described by just two independent quantities, e.g., $p_{s}$ and $p_{s s}$, since normalization and stationarity express all other probabilities in terms of these. Hence the full invariant transformation is uniquely identified by the combination of (A11a,b), (A8a) and (A9a), which together constitute the scaling (12).

Note that within its original meaning (A2), the parameter $α$ may take values in the range $0 < α \leq 1$, with $α = 1$ giving the identity transform. In terms of the scaling (12a–d), however, all values $α > 0$ are equally possible, and mutually inverse values $α = α_{1}$ and $α = α_{2} = 1 / α_{1}$ produce mutually inverse transforms.
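A numerical sanity check of the invariance is sketched below. It builds the two-time probabilities $p (x y)$ from the generative description of Section 3, written out directly rather than copied from (11): a burst emits the all-ones word, spontaneous activity is drawn independently at each time step, and $p_{b s} = p_{s b}$ by stationarity. It then applies (A8) and (A11). The following are assumptions of this sketch: storing the all-ones word as the last table entry, the helper names, and the sample numbers. The value of $α$ must satisfy $α \geq 1 - s_{1}$ so that $s_{1}^{'} \geq 0$.

```python
import numpy as np


def two_time_table(p_ss, p_sb, p_bb, s):
    """Two-time probabilities p(x, y) of a spiking-bursting model.

    The LAST entry of s is taken to be the all-ones word "1" emitted by
    a burst; spontaneous activity is independent across time steps.
    """
    s = np.asarray(s, dtype=float)
    burst = np.zeros_like(s)
    burst[-1] = 1.0
    emit = {"s": s, "b": burst}
    probs = {("s", "s"): p_ss, ("s", "b"): p_sb, ("b", "s"): p_sb, ("b", "b"): p_bb}
    return sum(p * np.outer(emit[v], emit[w]) for (v, w), p in probs.items())


def rescale(p_s, p_ss, s, alpha):
    """Transform (A8), (A11) mapping {V, S} onto the equivalent {V', S'}."""
    beta = 1.0 - alpha
    p_sb = p_s - p_ss
    p_bb = 1.0 - 2.0 * p_s + p_ss
    new = (
        alpha**2 * p_ss,                            # (A8a)
        alpha * (beta * p_ss + p_sb),               # (A8b)
        beta**2 * p_ss + 2.0 * beta * p_sb + p_bb,  # (A8c)
    )
    s_new = np.asarray(s, dtype=float) / alpha      # (A11a), x != 1
    s_new[-1] = 1.0 - (1.0 - s[-1]) / alpha         # (A11b)
    return new, s_new


p_s, p_ss = 0.6, 0.42
s = [0.2, 0.25, 0.15, 0.4]   # hypothetical 2-bit table; last entry is s_1
alpha = 0.8
original = two_time_table(p_ss, p_s - p_ss, 1.0 - 2.0 * p_s + p_ss, s)
(new_ss, new_sb, new_bb), s_new = rescale(p_s, p_ss, s, alpha)
transformed = two_time_table(new_ss, new_sb, new_bb, s_new)
print(np.max(np.abs(original - transformed)))
```

The printed maximum deviation should be at round-off level. Applying `rescale` again with $1 / α$ to the transformed parameters (with $p_{s}^{'} = α p_{s}$ from (A9a)) should recover the original ones, which illustrates the mutually inverse transforms mentioned above.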

Appendix B. Expressing Mutual Information for the Spiking–Bursting Process

The one-time entropy $H_{x}$ of the spiking–bursting process is given by (2) with the probabilities $p (x)$ taken from (6):

H_{x} = \sum_{x} \{p (x)\} = \sum_{x} \{p_{s} s_{x}\} + \{p_{1}\} - \{p_{s} s_{1}\},

(A12)

where the additional terms besides the sum over $x$ account for the specific expression (6b) for $p (x = 1)$. Using the relation

\{a b\} \equiv a \{b\} + \{a\} b,

(A13)

which is derived directly from (19), and collecting similar terms, we arrive at

H_{x} = p_{s} H_{s} - p_{s} \{s_{1}\} + (1 - s_{1}) \{p_{s}\} + \{p_{1}\},

(A14)

where $H_{s}$ is the entropy of the spiking component taken alone:

H_{s} = \sum_{x} \{s_{x}\} .

(A15)

The two-time entropy is obtained similarly, by substituting the probabilities $p (x y)$ from (11) into the definition of entropy and treating separately the cases with $x = 1$ and/or $y = 1$:

\begin{matrix} H_{x y} = \sum_{x y} \{p (x y)\} = \sum_{x y} \{p_{s s} s_{x} s_{y}\} & - \sum_{x} \{p_{s s} s_{x} s_{1}\} + \sum_{x} \{π s_{x}\} \\ & - \sum_{y} \{p_{s s} s_{1} s_{y}\} + \sum_{y} \{π s_{y}\} + \{p_{s s} s_{1}^{2}\} - 2 \{π s_{1}\} + \{p_{11}\} . \end{matrix}

(A16)

Further, applying (A13) and using the notation (A15), we find

\begin{matrix} \sum_{x y} \{p_{s s} s_{x} s_{y}\} & = p_{s s} \sum_{x y} \{s_{x} s_{y}\} + \{p_{s s}\} \sum_{x y} s_{x} s_{y} \\ & = p_{s s} \cdot 2 H_{s} + \{p_{s s}\}, \end{matrix}

(A17a)

where we used the fact that $\sum_{x y} \{s_{x} s_{y}\}$ is the two-time entropy of the spiking component taken alone. Because time correlations are postulated to be absent in that component, this entropy is twice the one-time entropy $H_{s}$ (which can, of course, also be found by direct calculation). Similarly, we get

\begin{matrix} \sum_{x} \{p_{s s} s_{x} s_{1}\} & = p_{s s} s_{1} \sum_{x} \{s_{x}\} + \{p_{s s} s_{1}\} \sum_{x} s_{x} \\ & = p_{s s} s_{1} H_{s} + \{p_{s s} s_{1}\} \end{matrix}

(A17b)

and exactly the same expression for $\sum_{y} \{p_{s s} s_{1} s_{y}\}$, and also

\begin{matrix} \sum_{y} \{π s_{y}\} = \sum_{x} \{π s_{x}\} & = π \sum_{x} \{s_{x}\} + \{π\} \sum_{x} s_{x} \\ & = π H_{s} + \{π\} . \end{matrix}

(A17c)

Substituting (A17a–c) into (A16), using (A13) where applicable, and collecting similar terms with the relation

p_{s s} + π - p_{s s} s_{1} \equiv p_{s}

(A18)

taken into account, we arrive at

H_{x y} = 2 p_{s} H_{s} + {(1 - s_{1})}^{2} \{p_{s s}\} - 2 p_{s} \{s_{1}\} + 2 (1 - s_{1}) \{π\} + \{p_{11}\} .

(A19)

Finally, the expression (18) for mutual information is obtained by inserting (A14) and (A19) into the definition (1), taking stationarity, $H_{y} = H_{x}$, into account.
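The closed forms (A14) and (A19) can be compared against entropies computed directly from the probability tables. Equations (6) and (11) are not reproduced in this section, so the sketch builds $p (x)$ and $p (x y)$ from the generative picture: a burst emits the all-ones word, which is stored as the last table entry, and spontaneous activity is independent between time steps. It also takes $π = p_{s s} s_{1} + p_{s b}$, as implied by (A18). The helper names and sample numbers are hypothetical.

```python
import numpy as np


def braces(q):
    """Braces notation {q} = -q log2 q, with {0} = 0."""
    q = np.asarray(q, dtype=float)
    return -q * np.log2(np.where(q > 0, q, 1.0))


def entropies_direct(p_s, p_ss, s):
    """H_x and H_xy summed directly over the one- and two-time tables."""
    s = np.asarray(s, dtype=float)
    p_sb, p_bb = p_s - p_ss, 1.0 - 2.0 * p_s + p_ss
    burst = np.zeros_like(s)
    burst[-1] = 1.0  # all-ones word stored last (assumption of this sketch)
    one_time = p_s * s + (1.0 - p_s) * burst
    two_time = (p_ss * np.outer(s, s) + p_sb * np.outer(s, burst)
                + p_sb * np.outer(burst, s) + p_bb * np.outer(burst, burst))
    return float(braces(one_time).sum()), float(braces(two_time).sum())


def entropies_closed_form(p_s, p_ss, s):
    """H_x from (A14) and H_xy from (A19)."""
    s = np.asarray(s, dtype=float)
    s1 = s[-1]
    p_sb, p_bb = p_s - p_ss, 1.0 - 2.0 * p_s + p_ss
    h_s = braces(s).sum()                          # (A15)
    p_1 = p_s * s1 + (1.0 - p_s)                   # p(x = 1)
    pi = p_ss * s1 + p_sb                          # from (A18)
    p_11 = p_ss * s1**2 + 2.0 * p_sb * s1 + p_bb   # p(x = 1, y = 1)
    h_x = p_s * h_s - p_s * braces(s1) + (1 - s1) * braces(p_s) + braces(p_1)
    h_xy = (2 * p_s * h_s + (1 - s1) ** 2 * braces(p_ss) - 2 * p_s * braces(s1)
            + 2 * (1 - s1) * braces(pi) + braces(p_11))
    return float(h_x), float(h_xy)


s = [0.2, 0.25, 0.15, 0.4]
hx_d, hxy_d = entropies_direct(0.6, 0.42, s)
hx_c, hxy_c = entropies_closed_form(0.6, 0.42, s)
print(hx_d, hx_c, hxy_d, hxy_c, "mutual information:", 2 * hx_d - hxy_d)
```

The direct and closed-form values should agree to round-off. The resulting time-delayed mutual information $2 H_{x} - H_{x y}$ stays within the 1-bit limit mentioned in the Conclusions.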

Appendix C. Expanding $I_{0}$ in Powers of $ϵ$

The Taylor series expansion of a function $f (x)$ up to the quadratic term reads

f (x_{0} + ξ) = f (x_{0}) + f^{'} (x_{0}) ξ + f^{''} (x_{0}) \frac{ξ^{2}}{2} + R (ξ) .

(A20)

The remainder term $R (ξ)$ can be written in Lagrange’s form as

R (ξ) = f^{'''} (c) \frac{ξ^{3}}{6},

(A21)

where $c$ is an unknown real number between $x_{0}$ and $x_{0} + ξ$.

The function $f (x)$ can be approximated by omitting $R (ξ)$ in (A20) if $R (ξ)$ is negligible compared to the quadratic term. For this, it is sufficient that

|f^{'''} (c) \frac{ξ^{3}}{6}| ≪ |f^{''} (x_{0}) \frac{ξ^{2}}{2}|

(A22a)

for any $c$ between $x_{0}$ and $x_{0} + ξ$, namely, for

c \in \begin{cases} (x_{0}, x_{0} + ξ), & \text{if } ξ > 0, \\ (x_{0} - | ξ |, x_{0}), & \text{if } ξ < 0 . \end{cases}

(A22b)

Consider the specific case

f (x) = - x log x, x > 0,

(A23)

for which we get

f^{'} (x) = - log x - 1, f^{''} (x) = - \frac{1}{x}, f^{'''} (x) = \frac{1}{x^{2}} .

(A24)

Since $f^{'''} (x)$ is a decreasing function for all $x > 0$, it suffices to fulfill (A22a) at the left boundary of (A22b): at $c = x_{0}$ if $ξ > 0$, and at $c = x_{0} - | ξ |$ if $ξ < 0$. This ensures that (A22a) holds on the whole interval (A22b). Precisely, the requirement is

\begin{matrix} |\frac{1}{x_{0}^{2}} \frac{ξ^{3}}{6}| & ≪ |\frac{1}{x_{0}} \frac{ξ^{2}}{2}|, \text{if } ξ > 0, \end{matrix}

(A25a)

\begin{matrix} |\frac{1}{{(x_{0} - | ξ |)}^{2}} \frac{ξ^{3}}{6}| & ≪ |\frac{1}{x_{0}} \frac{ξ^{2}}{2}|, \text{if } ξ < 0, \end{matrix}

(A25b)

which in the case $ξ > 0$ reduces to

\frac{ξ}{3 x_{0}} ≪ 1,

(A26)

and in the case $ξ < 0$ to

\frac{1}{3} Φ (\frac{| ξ |}{x_{0}}) ≪ 1,

(A27a)

where

Φ (ζ) = \frac{ζ}{{(1 - ζ)}^{2}} .

(A27b)

Replacing $Φ (\cdot)$ in (A27a) by its linearization $Φ (ζ) \approx ζ$ for small $ζ$, we reduce both (A26) and (A27a) to the single condition

| ξ | ≪ 3 x_{0} .

(A28)
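The remainder bounds (A26) and (A27) for $f (x) = - x \log x$ can be illustrated by computing the actual Taylor remainder numerically. The choice $x_{0} = 0.2$, the sample values of $ξ$, and the function names are arbitrary and serve only as an illustration.

```python
import math


def f(x):
    """f(x) = -x ln x, Eq. (A23)."""
    return -x * math.log(x)


def remainder_ratio(x0, xi):
    """|R(xi)| divided by the quadratic term |f''(x0) xi^2 / 2|, cf. (A22a)."""
    quadratic = -1.0 / x0 * xi**2 / 2.0                  # f''(x0) xi^2 / 2
    linear = (-math.log(x0) - 1.0) * xi                   # f'(x0) xi
    remainder = f(x0 + xi) - f(x0) - linear - quadratic   # R(xi) from (A20)
    return abs(remainder) / abs(quadratic)


def bound(x0, xi):
    """xi / (3 x0) for xi > 0, (A26); (1/3) Phi(|xi| / x0) for xi < 0, (A27)."""
    if xi > 0:
        return xi / (3.0 * x0)
    zeta = abs(xi) / x0
    return zeta / (3.0 * (1.0 - zeta) ** 2)


x0 = 0.2
for xi in (0.002, 0.02, 0.1, -0.002, -0.02, -0.1):
    print(f"xi = {xi:+.3f}: ratio = {remainder_ratio(x0, xi):.4f}, bound = {bound(x0, xi):.4f}")
```

In all six cases the computed ratio stays below the corresponding bound, as the argument based on the monotonicity of $f^{'''}$ requires.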

We use these considerations to expand in powers of $ϵ$ the function $I_{0} (p_{s}, ϵ)$ defined in (21), with $p_{s s}$, $p_{s b}$ and $p_{b b}$ replaced by their expressions in terms of $ϵ$ according to (14). The braces notation $\{\cdot\}$ defined in (19) can be written in terms of the function $f (x)$ from (A23) as

\{q\} = \frac{f (q)}{log 2} .

(A29)

Expanding in this way the subexpressions of (21)

\begin{matrix} \{p_{s s}\} & = \{p_{s}^{2} + ϵ p_{s}^{2}\}, \end{matrix}

(A30a)

\begin{matrix} \{p_{s b}\} & = \{p_{s} p_{b} - ϵ p_{s}^{2}\}, \end{matrix}

(A30b)

\begin{matrix} \{p_{b b}\} & = \{p_{b}^{2} + ϵ p_{s}^{2}\}, \end{matrix}

(A30c)

a direct calculation shows that the zero-order and linear terms in $ϵ$ vanish, and the quadratic term yields (35). The condition (A28) must be applied to all three subexpressions (A30a–c). Omitting the insignificant factor 3 in (A28), we obtain the applicability conditions

\begin{matrix} | ϵ p_{s}^{2} | & ≪ p_{s}^{2}, \end{matrix}

(A31a)

\begin{matrix} | ϵ p_{s}^{2} | & ≪ p_{s} p_{b}, \end{matrix}

(A31b)

\begin{matrix} | ϵ p_{s}^{2} | & ≪ p_{b}^{2}, \end{matrix}

(A31c)

which is equivalent to

\begin{matrix} | ϵ | & ≪ 1, \end{matrix}

(A32a)

\begin{matrix} | ϵ | & ≪ \frac{p_{b}}{p_{s}} = ϵ_{max}, \end{matrix}

(A32b)

\begin{matrix} | ϵ | & ≪ ϵ_{max}^{2}, \end{matrix}

(A32c)

where the notation $ϵ_{max}$ from (16) is used. When $ϵ_{max} < 1$, the condition (A32c) is the strongest among (A32a–c); when $ϵ_{max} > 1$, the condition (A32a) is the strongest. In both cases, therefore, (A32b) can be dropped, which produces (36).

References

Tononi, G. An information integration theory of consciousness. BMC Neurosci. 2004, 5, 42.
Balduzzi, D.; Tononi, G. Integrated information in discrete dynamical systems: Motivation and theoretical framework. PLoS Comput. Biol. 2008, 4, e1000091.
Tononi, G. The integrated information theory of consciousness: An updated account. Arch. Ital. Biol. 2012, 150, 293–329.
Oizumi, M.; Albantakis, L.; Tononi, G. From the phenomenology to the mechanisms of consciousness: Integrated information theory 3.0. PLoS Comput. Biol. 2014, 10, e1003588.
Tononi, G. Consciousness as integrated information: A provisional manifesto. Biol. Bull. 2008, 215, 216–242.
Peressini, A. Consciousness as Integrated Information A Provisional Philosophical Critique. J. Conscious. Stud. 2013, 20, 180–206.
Tsuchiya, N.; Taguchi, S.; Saigo, H. Using category theory to assess the relationship between consciousness and integrated information theory. Neurosci. Res. 2016, 107, 1–7.
Tononi, G.; Boly, M.; Massimini, M.; Koch, C. Integrated information theory: From consciousness to its physical substrate. Nat. Rev. Neurosci. 2016, 17, 450.
Norman, R.; Tamulis, A. Quantum Entangled Prebiotic Evolutionary Process Analysis as Integrated Information: From the origins of life to the phenomenon of consciousness. J. Comput. Theor. Nanosci. 2017, 14, 2255–2267.
Engel, D.; Malone, T.W. Integrated information as a metric for group interaction. PLoS ONE 2018, 13, e0205335.
Mediano, P.A.M.; Farah, J.C.; Shanahan, M. Integrated Information and Metastability in Systems of Coupled Oscillators. arXiv 2016, arXiv:1606.08313.
Toker, D.; Sommer, F.T. Information integration in large brain networks. PLoS Comput. Biol. 2019, 15, e1006807.
Barrett, A.B.; Seth, A.K. Practical measures of integrated information for time-series data. PLoS Comput. Biol. 2011, 7, e1001052.
Griffith, V. A Principled Infotheoretic ϕ-like Measure. arXiv 2014, arXiv:1401.0978.
Oizumi, M.; Tsuchiya, N.; Amari, S.i. Unified framework for information integration based on information geometry. Proc. Natl. Acad. Sci. USA 2016, 113, 14817–14822.
Oizumi, M.; Amari, S.i.; Yanagawa, T.; Fujii, N.; Tsuchiya, N. Measuring integrated information from the decoding perspective. PLoS Comput. Biol. 2016, 12, e1004654.
Tegmark, M. Improved measures of integrated information. PLoS Comput. Biol. 2016, 12, e1005123.
Mediano, P.; Seth, A.; Barrett, A. Measuring integrated information: Comparison of candidate measures in theory and simulation. Entropy 2019, 21, 17.
Kanakov, O.; Gordleeva, S.; Ermolaeva, A.; Jalan, S.; Zaikin, A. Astrocyte-induced positive integrated information in neuron-astrocyte ensembles. Phys. Rev. E 2019, 99, 012418.
Araque, A.; Carmignoto, G.; Haydon, P.G.; Oliet, S.H.; Robitaille, R.; Volterra, A. Gliotransmitters Travel in Time and Space. Neuron 2014, 81, 728–739.
Gordleeva, S.Y.; Stasenko, S.V.; Semyanov, A.V.; Dityatev, A.E.; Kazantsev, V.B. Bi-directional astrocytic regulation of neuronal activity within a network. Front. Comput. Neurosci. 2012, 6, 92.
Pankratova, E.V.; Kalyakulina, A.I.; Stasenko, S.V.; Gordleeva, S.Y.; Lazarevich, I.A.; Kazantsev, V.B. Neuronal synchronization enhanced by neuron–astrocyte interaction. Nonlinear Dyn. 2019, 97, 647–662.
Gordleeva, S.Y.; Lebedev, S.A.; Rumyantseva, M.A.; Kazantsev, V.B. Astrocyte as a Detector of Synchronous Events of a Neural Network. JETP Lett. 2018, 107, 440–445.
Gordleeva, S.Y.; Ermolaeva, A.V.; Kastalskiy, I.A.; Kazantsev, V.B. Astrocyte as Spatiotemporal Integrating Detector of Neuronal Activity. Front. Physiol. 2019, 10.
Barrett, A.B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802.
Kazantsev, V.B.; Asatryan, S.Y. Bistability induces episodic spike communication by inhibitory neurons in neuronal networks. Phys. Rev. E 2011, 84, 031913.
Esir, P.M.; Gordleeva, S.Y.; Simonov, A.Y.; Pisarchik, A.N.; Kazantsev, V.B. Conduction delays can enhance formation of up and down states in spiking neuronal networks. Phys. Rev. E 2018, 98.
Andreev, A.V.; Ivanchenko, M.V.; Pisarchik, A.N.; Hramov, A.E. Stimulus classification using chimera-like states in a spiking neural network. Chaos Solitons Fractals 2020, 139, 110061.
Lobov, S.A.; Mikhaylov, A.N.; Shamshin, M.; Makarov, V.A.; Kazantsev, V.B. Spatial Properties of STDP in a Self-Learning Spiking Neural Network Enable Controlling a Mobile Robot. Front. Neurosci. 2020, 14.
Makovkin, S.Y.; Shkerin, I.V.; Gordleeva, S.Y.; Ivanchenko, M.V. Astrocyte-induced intermittent synchronization of neurons in a minimal network. Chaos Solitons Fractals 2020, 138, 109951.

Figure 1. Typical time traces of the spiking–bursting model containing $N = 4$ nodes (“neurons”) in discrete time. Traces for different neurons are offset by different constant shifts along the ordinate axis. Two bursts (marked) and background uncorrelated spiking dynamics are visible.

Figure 2. Blue solid lines: plots of $I_{0} (p_{s}, ϵ)$ versus $p_{s}$, varied from 0 to $p_{s max}$ as per (15), at $ϵ = 0.01$, 0.1, 0.2, 0.5, 1 (from right to left). $I_{0} (p_{s}, ϵ)$ is a single universal function of two arguments. It is expressed explicitly in elementary functions in (21b) and allows one to express mutual information (18) and effective information (3) in terms of the model parameters. Red dashed lines: approximation (35). Red dots: upper bounds of the approximation applicability range (36c).

Figure 3. Plots of $f(s)$ on $s_1 < s < 1$ for several values of $s_1$ (as indicated) at $p_s = 0.7$, $\epsilon = 0.1$. According to (25a,b), $f(s)$ shows the dependence of effective information $\Phi_{\mathrm{eff}}$ upon the choice of the bipartition $AB$, which is characterized by the value of $s_A = s$, while the function parameter $s_1$ determines the intensity of spontaneous spiking activity. For each value of $s_1$, the extremum $(\sqrt{s_1}, f(\sqrt{s_1}))$ is indicated with a dot.
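Figure 3 shows that $\Phi_{\mathrm{eff}}$ depends on which bipartition $AB$ is chosen. As an illustration only, the candidate bipartitions of an $N$-node network can be enumerated directly; for the $N = 4$ network of Figure 1 there are $2^{N-1} - 1 = 7$ unordered splits into two nonempty parts, 3 of which are symmetric (2 + 2). The helper `bipartitions` below is hypothetical and is not taken from the paper.

```python
from itertools import combinations


def bipartitions(n):
    """Yield each unordered split of nodes 0..n-1 into two nonempty parts (A, B).

    Hypothetical helper for illustration; not part of the original model code.
    """
    nodes = range(n)
    seen = set()
    for k in range(1, n // 2 + 1):  # size of part A; A and B are interchangeable
        for part_a in combinations(nodes, k):
            part_b = tuple(i for i in nodes if i not in part_a)
            key = frozenset((part_a, part_b))  # treat {A, B} and {B, A} as one split
            if key not in seen:
                seen.add(key)
                yield part_a, part_b


splits = list(bipartitions(4))
symmetric = [(a, b) for a, b in splits if len(a) == len(b)]
print(len(splits), len(symmetric))
```

Scanning an information measure over every yielded pair is a brute-force approach; since the count grows as $2^{N-1}$, it is practical only for small $N$.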

Figure 4. Graphs of the threshold $s_1^{\min}$ determining the minimal sufficient intensity of spontaneous neuronal spiking activity for positive II. (a) Blue solid lines: plots of $s_1^{\min}(p_s, \epsilon)$ versus $p_s$ varied from 0 to $p_{s\,\max}$ as per (15), at $\epsilon = 0.1$, 0.5, 1 (from right to left). Red dashed line: plot of the asymptotic formula (38). (b) Blue solid lines: plots of $s_1^{\min}(p_s, \epsilon)$ versus $\epsilon$ varied from 0 to $\epsilon_{\max}$ as per (16), at $p_s = 0.5$, 0.6, 0.7 (from top to bottom). The vertical position of the red dashed lines is given by (38); their horizontal span denotes the estimated applicability range (36b).

Figure 5. Comparison of two versions of empirical effective information for the symmetric bipartition: the “whole minus sum” measure $\Phi_{\mathrm{eff}}$ (3) from [13] (blue lines) and the “decoder based” information $\Phi^{*}$ (5) from [16] (red lines), plotted versus the spiking activity parameter $s_1$ at various fixed values of the bursting component parameters $p_s$ (indicated on top of the panels) and $\epsilon$ (indicated in the legends). Panel (a): unnormalized values; panels (b–d): values normalized by $\epsilon^2$. The threshold $s_1^{\min}$ calculated according to (38) is shown in each panel as an additional vertical grid line.
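To make the “whole minus sum” quantity in Figure 5 concrete, the sketch below assumes that (3) takes the usual empirical form for a bipartition $AB$, $\Phi_{\mathrm{eff}} = I(X_{t-\tau}; X_t) - I(A_{t-\tau}; A_t) - I(B_{t-\tau}; B_t)$, and evaluates it by brute force from a joint distribution of past and present states stored as a Python dict. The function names and the toy two-node distributions are hypothetical illustrations; they are not the paper's closed-form expressions via $I_0$.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information I(U; V) in bits from a dict {(u, v): probability}."""
    p_u, p_v = defaultdict(float), defaultdict(float)
    for (u, v), p in joint.items():
        p_u[u] += p
        p_v[v] += p
    return sum(
        p * math.log2(p / (p_u[u] * p_v[v]))
        for (u, v), p in joint.items()
        if p > 0
    )


def restrict(joint, nodes):
    """Marginal joint distribution of (past, present) states of the given nodes."""
    out = defaultdict(float)
    for (past, now), p in joint.items():
        key = (tuple(past[i] for i in nodes), tuple(now[i] for i in nodes))
        out[key] += p
    return dict(out)


def phi_whole_minus_sum(joint, part_a, part_b):
    """Assumed form: whole-system past/present information minus the sum over parts."""
    whole = mutual_information(joint)
    parts = mutual_information(restrict(joint, part_a)) + mutual_information(
        restrict(joint, part_b)
    )
    return whole - parts


# Hypothetical two-node dynamics; a state is (node0, node1), past states uniform.
bits = (0, 1)
self_copy = {((a, b), (a, b)): 0.25 for a in bits for b in bits}  # each node keeps its state
swap = {((a, b), (b, a)): 0.25 for a in bits for b in bits}       # nodes exchange states
redundant = {((a, a), (a, a)): 0.5 for a in bits}                 # both nodes carry one shared bit

for name, dyn in [("self_copy", self_copy), ("swap", swap), ("redundant", redundant)]:
    print(name, phi_whole_minus_sum(dyn, (0,), (1,)))
```

The three toy cases give 0 bits (each node predicts only itself, so the whole adds nothing beyond the parts), 2 bits (all predictive information crosses the cut) and −1 bit (the parts double-count the same bit), showing that this measure can take either sign.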

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Kanakov, O.; Gordleeva, S.; Zaikin, A. Integrated Information in the Spiking–Bursting Stochastic Model. Entropy 2020, 22, 1334. https://doi.org/10.3390/e22121334
