1. Introduction
Integrated information (II) [1,2,3,4] is a measure of internal information exchange in complex systems. It has recently attracted considerable interest because it was initially proposed as a way to quantify consciousness [5]. Although this initial aim is still a matter of research and debate [6,7,8,9], the II concept itself is by now a widely acknowledged tool in the analysis of complex dynamics [10,11,12]. The general concept gave rise to specific “empirical” formalizations of II [13,14,15,16] aimed at computability from empirical probability distributions based on real data. For a systematic taxonomy of II measures, see [17]; a comparative study of empirical II measures applied to Gaussian autoregressive network models was recently carried out in [18].
Our recent study [19] addressed the role of astrocytic regulation of neurotransmission [20,21,22] in generating positive II in small networks of brain cells (neurons and astrocytes). Empirical “whole minus sum” II, as defined in [13], was calculated in [19] from the time series produced by a biologically realistic model of neuro-astrocytic networks. A simplified, analytically tractable stochastic “spiking–bursting” model (complementing the realistic one) was designed to describe a specific type of activity in neuro-astrocytic networks, which manifests itself as a sequence of intermittent system-wide excitations of rapid pulse trains (“bursts”) on the background of random “spiking” activity in the network [23,24]. The spiking–bursting model is a discrete-time, discrete-state stochastic process which mimics the main features of this behavior. The model was successfully used in [19] to produce semi-analytical estimates of II in good agreement with direct computation of II from time series of the biologically realistic network model. We suggested, as a possible explanation, that the generation of positive II was the reason why the mammalian brain evolved an astrocyte network overlapping with the network of neurons. It remained unclear, however, which underlying mechanisms drive complex neural behavior to generate positive II. In this paper we address this question.
The present study aims at creating a theoretical formalism for using the spiking–bursting model of [19] as an analytically tractable reference point for applying integrated information concepts to systems exhibiting similar bursting behavior (in particular, to other neuron–astrocyte networks). The analytical treatment is based on exact and asymptotic expressions for integrated information in terms of exactly known probability distributions for the spiking–bursting model. The model is constructed to be as simple as possible (although essentially non-Gaussian) while reflecting the features of neuron–astrocyte network dynamics which lead to positive II. We also aim to extend the knowledge of comparative features of different empirical II measures, which is currently available mainly for Gaussian autoregressive models [17,18], by applying two such measures [13,16] to our discrete-state model.
In Section 2 and Section 3 we specify the definitions of the II measures used and the model. Specific properties of the model which lead to redundancy in its parameter set are addressed in Section 4. In Section 5 we provide an analytical treatment of the empirical “whole minus sum” [13] version of II in application to our model. This choice among other empirical II measures is inherited from the preceding study [19] and is due in part to its easy analytical tractability, and also to its ability to change sign, which naturally identifies a transition point in the parameter space. This property may be considered a violation of the natural non-negativeness requirement for II [16]; on the other hand, the sign of the “whole minus sum” information has been interpreted in terms of “net synergy” [25] as a degree of redundancy in the evolution of a system [18]. In this sense this transition may be viewed as a useful marker in its own right in the tool-set of measures for complex dynamics. This motivates our particular focus on identifying the sign transition of the “whole minus sum” information in the parameter space of the model. We also identify a scaling of II with a small parameter which determines time correlations in the bursting (astrocytic) subsystem.
In Section 6 we compare the outcome of the “whole minus sum” II measure [13] to that of the “decoder based” measure $\Phi^*$, which was specifically designed in [16] to satisfy the non-negativeness property. We compute $\Phi^*$ directly by definition from the known probability distributions of the model. Although one measure can change sign and the other cannot, the two are shown to depend on the model parameters in similar ways, including the same scaling with the time correlation parameter.
2. Definition of II Measures in Use
The empirical “whole minus sum” version of II is formulated according to [13] as follows. Consider a stationary binary vector stochastic process whose instantaneous state is described by $N$ binary digits (bits), each identified with a node of the network (neuron). The full set of $N$ nodes (“system”) can be split into two non-overlapping non-empty subsets (“subsystems”) $A$ and $B$; such a splitting is referred to as a bipartition $AB$. Denote by $x$ and $y$ two states of the process separated by a specified time interval $\tau$. The states of the subsystems are denoted as $x_A$, $x_B$, $y_A$, $y_B$.
Mutual information between $x$ and $y$ is defined as
$$I_{xy} = H_x + H_y - H_{xy},$$
where
$$H_x = -\sum_x p_x \log_2 p_x$$
is entropy (base 2 logarithm gives the result in bits); summation is hereinafter assumed to be taken over the whole range of the index variable (here $x$), and $H_y = H_x$ due to the assumed stationarity.
Next, a bipartition $AB$ is considered, and “effective information” $I_{\mathrm{eff}}$ as a function of the particular bipartition is defined as
$$I_{\mathrm{eff}}(AB) = I_{xy} - I_{x_A y_A} - I_{x_B y_B}.$$
Finally, “whole minus sum” II is defined as the effective information calculated for a specific bipartition $AB^{\mathrm{MIB}}$ (“minimum information bipartition”) which minimizes specifically normalized effective information:
$$AB^{\mathrm{MIB}} = \arg\min_{AB} \frac{I_{\mathrm{eff}}(AB)}{\min\{H_{x_A}, H_{x_B}\}}.$$
Note that this definition prohibits positive II whenever $I_{\mathrm{eff}}$ turns out to be zero or negative for at least one bipartition $AB$.
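The definitions above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: it assumes the two-time joint distribution is given as a dictionary mapping pairs of bit tuples `(x, y)` to probabilities, and all function names are hypothetical. The normalized search for the minimum information bipartition is omitted for brevity.

```python
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, key):
    """Sum a dict {(x, y): p} into a dict keyed by key(x, y)."""
    out = {}
    for (x, y), p in joint.items():
        k = key(x, y)
        out[k] = out.get(k, 0.0) + p
    return out

def mutual_information(joint):
    """Time-delayed mutual information H_x + H_y - H_xy, in bits."""
    h_x = entropy(marginal(joint, lambda x, y: x))
    h_y = entropy(marginal(joint, lambda x, y: y))
    return h_x + h_y - entropy(joint)

def restrict(joint, nodes):
    """Two-time joint distribution of the subsystem with the given node indices."""
    pick = lambda state: tuple(state[i] for i in nodes)
    return marginal(joint, lambda x, y: (pick(x), pick(y)))

def effective_information(joint, part_a, part_b):
    """'Whole minus sum' effective information for the bipartition (part_a, part_b)."""
    return (mutual_information(joint)
            - mutual_information(restrict(joint, part_a))
            - mutual_information(restrict(joint, part_b)))

# Two toy two-bit processes with uniformly distributed states (illustrative only):
# "copy": each bit keeps its own value; "swap": the two bits exchange values.
states = list(product((0, 1), repeat=2))
copy_joint = {(x, x): 0.25 for x in states}
swap_joint = {(x, (x[1], x[0])): 0.25 for x in states}

print(effective_information(copy_joint, [0], [1]))
print(effective_information(swap_joint, [0], [1]))
```

In the "copy" process each subsystem predicts its own future perfectly, so the parts account for all of the 2 bits of mutual information and the effective information vanishes. In the "swap" process neither subsystem alone carries information about its own future, so the effective information equals the full 2 bits.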
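We compare the result of the “whole minus sum” effective information (3) to the “decoder based” information measure $\Phi^*$, which is modified from its original formulation of [16] by setting the logarithm base to 2 for consistency:
$$\Phi^* = I_{xy} - I^*_{xy},$$
where
$$I^*_{xy} = \max_{\beta} \tilde I(\beta), \qquad \tilde I(\beta) = -\sum_y p_y \log_2 \sum_x p_x\, q_{y|x}^{\beta} + \sum_{x,y} p_{xy} \log_2 q_{y|x}^{\beta}, \qquad q_{y|x} = p_{y_A|x_A}\, p_{y_B|x_B}.$$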
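A numerical evaluation of a decoder-based measure can be sketched as follows. This sketch assumes the mismatched-decoding form commonly given in the decoding literature, $\Phi^* = I_{xy} - \max_\beta \tilde I(\beta)$ with $\tilde I(\beta) = -\sum_y p_y \log_2 \sum_x p_x q_{y|x}^\beta + \sum_{x,y} p_{xy} \log_2 q_{y|x}^\beta$ and $q_{y|x} = p_{y_A|x_A} p_{y_B|x_B}$. The function names, the dictionary representation of the joint distribution, and the search interval `beta_max` are assumptions made for illustration. Since $\tilde I(\beta)$ is concave in $\beta$, a simple ternary search is used for the maximization.

```python
from math import exp, log

LN2 = log(2.0)

def marginal(joint, key):
    """Sum a dict {(x, y): p} into a dict keyed by key(x, y)."""
    out = {}
    for (x, y), p in joint.items():
        k = key(x, y)
        out[k] = out.get(k, 0.0) + p
    return out

def mutual_information(joint):
    """I_xy in bits, as sum of p_xy * log2(p_xy / (p_x p_y))."""
    p_x = marginal(joint, lambda x, y: x)
    p_y = marginal(joint, lambda x, y: y)
    return sum(p * log(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0) / LN2

def log_conditional(joint, nodes):
    """Return f(x, y) = ln p(y_S | x_S) for subsystem S, or None if it is zero."""
    pick = lambda state: tuple(state[i] for i in nodes)
    sub = marginal(joint, lambda x, y: (pick(x), pick(y)))
    p_x = marginal(sub, lambda x, y: x)
    table = {k: log(p / p_x[k[0]]) for k, p in sub.items() if p > 0}
    return lambda x, y: table.get((pick(x), pick(y)))

def decoder_based_information(joint, part_a, part_b, beta_max=20.0):
    """Phi* for one bipartition; beta_max is an assumed search bound."""
    ln_qa = log_conditional(joint, part_a)
    ln_qb = log_conditional(joint, part_b)
    p_x = marginal(joint, lambda x, y: x)
    p_y = marginal(joint, lambda x, y: y)

    def ln_q(x, y):
        a, b = ln_qa(x, y), ln_qb(x, y)
        return None if a is None or b is None else a + b

    def mismatched(beta):
        total = 0.0
        for y, py in p_y.items():
            if py <= 0:
                continue
            # log-sum-exp over x keeps the computation stable for large beta
            terms = [log(px) + beta * ln_q(x, y)
                     for x, px in p_x.items()
                     if px > 0 and ln_q(x, y) is not None]
            top = max(terms)
            total -= py * (top + log(sum(exp(t - top) for t in terms)))
        total += sum(p * beta * ln_q(x, y)
                     for (x, y), p in joint.items() if p > 0)
        return total / LN2

    lo, hi = 0.0, beta_max
    for _ in range(200):  # ternary search on a concave function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if mismatched(m1) < mismatched(m2):
            lo = m1
        else:
            hi = m2
    return mutual_information(joint) - mismatched((lo + hi) / 2)
```

For a system whose two parts evolve independently of each other, the factorized decoder loses nothing and the measure vanishes; when all information flows between the parts, the measure equals the full mutual information.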
3. Spiking–Bursting Stochastic Model
Physiologically, spikes are short (about 1 millisecond in duration) pulses of voltage (action potentials) across the neuronal membrane. Bursts are rapid sequences of spikes. The main feature of the neuron–astrocyte network model in [19] is the presence of network-wide coordinated bursts, when all neurons are rapidly spiking in the same time window. Such bursts are coordinated by the astrocytic network and occur on the background of weakly correlated spiking activity of individual neurons. The spiking–bursting model was suggested in [19] as the simplest mathematical description of this behavior. In this model, time is discretized into small bins, and neurons are represented by binary digits taking the value 0 or 1, denoting the absence or the presence of at least one spike within the specific time bin. Accordingly, a network-wide burst is represented by a time interval during which all neurons are locked at value 1 (which corresponds to a train of rapid spiking in the underlying biological system). The idea behind the model is illustrated by the graphical representation of its typical time evolution, as shown in Figure 1. The graphs of the model dynamics can be seen as envelopes of respective time recordings of membrane voltage in actual neurons: each short rectangular pulse of the model is assumed to correspond to at least one narrow spike of voltage, and a prolonged pulse (several discrete time bins in duration) represents a spike train (burst).
Mathematically, this “spiking–bursting” model is a stochastic model which produces a binary vector-valued, discrete-time stochastic process. In keeping with [19], the model is defined as a combination of two components. The first is a time-correlated dichotomous component V, which turns system-wide bursting on and off; it mimics global bursting of a neuronal network, when each neuron produces a train of pulses at a high rate [19]. The second is a time-uncorrelated component S describing spontaneous (spiking) activity occurring in the absence of a burst; it corresponds to background random activity in a neural network, characterized by relatively sparse random appearance of neuronal pulses (spikes) [19]. The model mimics the spiking–bursting type of activity which occurs in a neuro-astrocytic network, where the neural subsystem normally exhibits time-uncorrelated patterns of spiking activity, and all neurons are under the common influence of the astrocytic subsystem, which is modeled by the dichotomous component V and sporadically induces simultaneous bursting in all neurons. A similar network architecture with a “master node” spreading its influence on subordinate nodes was considered, for example, in [1] (Figure 4b therein).
The model is defined as follows. At each instant of (discrete) time the state of the dichotomous component can be either “bursting” with probability $p_b$, or “spontaneous” (or “spiking”) with probability $p_s = 1 - p_b$. While in the bursting mode, the instantaneous state of the resulting process is given by all ones: $x = 11\ldots1$ (further abbreviated as $x = 1$). In the case of spiking, the state $x$ is a (time-uncorrelated) random variate described by a discrete probability distribution $s_x$ (where an occurrence of “1” in any bit is referred to as a “spike”), so that the resulting one-time state probabilities read
$$p(x \ne 1) = p_s s_x, \qquad p(x = 1) = p_s s_1 + p_b,$$
where $s_1$ is the probability of spontaneous occurrence of $x = 1$ (hereafter referred to as a system-wide simultaneous spike) in the absence of a burst. (In a real network, “simultaneous” implies occurring within the same time discretization bin [19].)
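The one-time state probabilities described above can be tabulated directly. The sketch below is illustrative only: the names `spiking_distribution`, `one_time_probabilities`, `p_burst` and `spike_prob` are hypothetical, and the spontaneous activity is assumed to be independent across bits with one common spike probability, which is a special case rather than the general model.

```python
from itertools import product

def spiking_distribution(n_bits, spike_prob):
    """Spontaneous-activity distribution for independent bits.

    Each bit spikes with probability spike_prob, so a state with m ones
    has probability spike_prob**m * (1 - spike_prob)**(n_bits - m).
    """
    dist = {}
    for x in product((0, 1), repeat=n_bits):
        m = sum(x)
        dist[x] = spike_prob ** m * (1 - spike_prob) ** (n_bits - m)
    return dist

def one_time_probabilities(p_burst, s):
    """One-time state probabilities of the resulting process.

    With probability p_burst the state is all ones (a burst); otherwise
    the state is drawn from the spontaneous distribution s.
    """
    n_bits = len(next(iter(s)))
    ones = (1,) * n_bits
    p_spont = 1.0 - p_burst
    p = {x: p_spont * sx for x, sx in s.items()}
    p[ones] += p_burst  # the all-ones state is produced by bursts as well
    return p

s = spiking_distribution(n_bits=3, spike_prob=0.3)
p = one_time_probabilities(p_burst=0.2, s=s)
print(sum(p.values()))   # total probability, should be close to 1
print(p[(1, 1, 1)])      # all-ones state: bursts plus system-wide spikes
```

The all-ones entry shows the ambiguity that matters later: a one-time observation of all ones cannot tell a burst from a system-wide simultaneous spike.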
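To describe two-time joint probabilities for $x$ and $y$, consider a joint state $xy$ which is a concatenation of bits in $x$ and $y$. The spontaneous activity is assumed to be uncorrelated in time, which leads to the factorization
$$s_{xy} = s_x s_y,$$
where $s_x$ denotes the probability of the state $x$ in spontaneous activity.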
The time correlations of the dichotomous component are described by a
matrix
whose components are joint probabilities to observe the respective spiking (index “
s”) and/or bursting (index “
b”) states in
x and
y. (In a neural network these correlations are conditioned by burst duration [
19]; e.g., if this (in general, random) duration mostly exceeds
, then the correlation is positive.) The probabilities obey
(due to stationarity),
,
, thereby allowing one to express all one- and two-time probabilities describing the dichotomous component in terms of two independent quantities, which for example, can be a pair
; then
or
as in [
19], where
is the Pearson correlation coefficient defined by
In
Section 4 we justify the use of another effective parameter
(
13) instead of
to determine time correlations in the dichotomous component.
The two-time joint probabilities for the resulting process are then expressed as
Note that the above notation can be applied to any subsystem instead of the whole system (with the same dichotomous component, as it is system-wide anyway).
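The construction of the two-time joint probabilities can be illustrated by conditioning on the four possible pairs of dichotomous states (spiking or bursting at each of the two times). The sketch below is an assumption-laden illustration, not a transcription of the equations of this section: the function names are hypothetical, the dichotomous component is parametrized by the burst probability `p_burst` and the Pearson correlation coefficient `rho` of the binary burst indicator, and the spontaneous activity is taken independent across bits.

```python
from itertools import product

def spiking_distribution(n_bits, spike_prob):
    """Spontaneous-activity distribution for independent, identical bits."""
    return {x: spike_prob ** sum(x) * (1 - spike_prob) ** (n_bits - sum(x))
            for x in product((0, 1), repeat=n_bits)}

def dichotomous_table(p_burst, rho):
    """Joint probabilities (p_ss, p_sb, p_bb) of the dichotomous component.

    For a stationary binary indicator with mean p_burst, the Pearson
    coefficient rho fixes p_bb = p_burst**2 + rho * p_burst * (1 - p_burst).
    """
    p_spont = 1.0 - p_burst
    p_bb = p_burst ** 2 + rho * p_spont * p_burst
    p_sb = p_burst - p_bb          # equals p_bs by stationarity
    p_ss = p_spont - p_sb
    return p_ss, p_sb, p_bb

def two_time_table(p_burst, rho, s):
    """Joint distribution {(x, y): p} of the resulting process."""
    ones = (1,) * len(next(iter(s)))
    p_ss, p_sb, p_bb = dichotomous_table(p_burst, rho)
    joint = {}
    for x, sx in s.items():
        for y, sy in s.items():
            p = p_ss * sx * sy               # spiking at both times
            if y == ones:
                p += p_sb * sx               # spiking, then bursting
            if x == ones:
                p += p_sb * sy               # bursting, then spiking
            if x == ones and y == ones:
                p += p_bb                    # bursting at both times
            joint[(x, y)] = p
    return joint

s = spiking_distribution(n_bits=2, spike_prob=0.3)
joint = two_time_table(p_burst=0.2, rho=0.5, s=s)
print(sum(joint.values()))  # should be close to 1
```

Summing the table over `y` recovers the one-time probabilities, and the table is symmetric under the exchange of `x` and `y`, as expected for a stationary process with a symmetric dichotomous component.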
The mentioned probabilities can be interpreted in terms of the underlying biological system as follows (see details in [
19]):
is the probability of observing the astrocytic subsystem in the excited (high calcium concentration) state, which induces global bursting activity in all neurons, within a specific time discretization bin;
is the probability of observing the mentioned state in two time bins separated by the time lag
, and
is the respective time-delayed Pearson correlation coefficient of the astrocytic activity;
is the probability of observing a specific spatial pattern of spiking
x within one time bin in spontaneous neuronal activity (in the absence of an astrocyte-induced burst), and in particular
is the probability that all neurons fire a spike within one time bin in spontaneous activity. In this sense
measures the overall strength of spontaneous activity of the neuronal subsystem. When spiking activity is independent across neurons, the set of parameters
fully determines the “whole minus sum” II in the spiking–bursting model. In [
19] these parameters were fitted to match (in the least-squares sense) the two-time probability distribution (11) to the respective “empirical” (numerically obtained) probabilities for the biologically realistic model of the neuron–astrocyte network. This fitting produced the dependence of the spiking–bursting parameters
upon the biological parameters; see Figure 7 in [
19].
4. Model Parameter Scaling
The spiking–bursting stochastic model, as described in Section 3, is redundant in the following sense. In terms of the model definition, there are two distinct states of the model which equally lead to observing the same one-time state of the resultant process with 1s in all bits: a burst, and a system-wide simultaneous spike in the absence of a burst. The two are indistinguishable by one-time observations. Two-time observations reveal a difference between system-wide spikes on one hand and bursts on the other, because the latter are assumed to be correlated in time, unlike the former. That said, the “labeling” of bursts versus system-wide spikes exists in the model (by the state of the dichotomous component), but not in the realizations. Proceeding from the realizations, it must be possible to relabel a certain fraction of system-wide spikes into bursts (more precisely, into a time-uncorrelated portion thereof). Such relabeling would change both components of the model (dichotomous and spiking processes), in particular diluting the time correlations of bursts, without changing the actual realizations of the resultant process. This implies the existence of a transformation of model parameters which keeps realizations (i.e., the stochastic process as such) invariant. The derivation of this transformation is presented in Appendix A and leads to the following scaling.
where
is a positive scaling parameter, and all other probabilities are updated according to Equation (9).
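The invariance argument can be checked numerically. The sketch below uses one relabeling transformation with the required property, worked out here for illustration; its form and notation are assumptions and may differ from Equations (12a–d). With `p_s` the one-time probability of the spontaneous state, `p_ss` the probability of the spontaneous state at both times, `s` the spontaneous-activity distribution and `alpha` a positive factor, the assumed transformation is: `p_s -> alpha * p_s`, `p_ss -> alpha**2 * p_ss`, `s[x] -> s[x] / alpha` for every state other than all ones, with the all-ones entry fixed by normalization.

```python
from itertools import product

def two_time_table(p_s, p_ss, s):
    """Two-time joint distribution from (p_s, p_ss) and the spiking table s."""
    ones = (1,) * len(next(iter(s)))
    p_sb = p_s - p_ss                  # spiking at one time, bursting at the other
    p_bb = 1.0 - p_ss - 2.0 * p_sb     # bursting at both times
    joint = {}
    for x, sx in s.items():
        for y, sy in s.items():
            p = p_ss * sx * sy
            if y == ones:
                p += p_sb * sx
            if x == ones:
                p += p_sb * sy
            if x == ones and y == ones:
                p += p_bb
            joint[(x, y)] = p
    return joint

def rescale(p_s, p_ss, s, alpha):
    """Assumed relabeling transformation (hypothetical form, see lead-in)."""
    ones = (1,) * len(next(iter(s)))
    new_s = {x: sx / alpha for x, sx in s.items() if x != ones}
    new_s[ones] = 1.0 - (1.0 - s[ones]) / alpha   # keeps the table normalized
    return alpha * p_s, alpha ** 2 * p_ss, new_s

# Illustrative parameter values (not taken from the source).
s = {x: 0.25 for x in product((0, 1), repeat=2)}
p_s, p_ss = 0.8, 0.72

original = two_time_table(p_s, p_ss, s)
new_p_s, new_p_ss, new_s = rescale(p_s, p_ss, s, alpha=0.9)
rescaled = two_time_table(new_p_s, new_p_ss, new_s)

max_diff = max(abs(original[k] - rescaled[k]) for k in original)
print(max_diff)  # expected to be at floating-point noise level
```

Choosing `alpha` below 1 relabels part of the system-wide spikes as bursts, so the burst probability grows while the observable two-time statistics stay the same. Only values of `alpha` that keep all probabilities non-negative are admissible.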
The mentioned invariance in particular implies that any characteristic of the process must be invariant to the scaling (12a–d). This suggests a natural choice of a scaling-invariant effective parameter
defined by
to determine time correlations in the dichotomous component. In conjunction with a second independent parameter of the dichotomous process, for which a straightforward choice is
, and with full one-time probability table for spontaneous activity
, these constitute a natural full set of model parameters
.
The two-time probability table (
8) can be expressed in terms of
and
by substituting Equation (
13) into Equation (9):
The requirement of non-negativeness of probabilities imposes simultaneous constraints
and
or equivalently,
Comparing the off-diagonal term
in (
14) to the definition of the Pearson’s correlation coefficient
in (
10), we get
thus, the sign of
has the same meaning as that of
. Hereinafter we limit ourselves to non-negative correlations
.
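As a worked illustration of how a scaling-invariant correlation parameter can relate to the Pearson coefficient, consider the following sketch. The parametrization is an assumption and is not confirmed by the text above: $p_s$ and $p_b = 1 - p_s$ are the one-time probabilities of the spiking and bursting states, $p_{ss}$, $p_{sb}$, $p_{bb}$ are the two-time joint probabilities, $\rho$ is the Pearson coefficient of the burst indicator, and $\varepsilon$ is introduced through $p_{ss} = p_s^2 (1 + \varepsilon)$. Under these assumptions:

```latex
\begin{align}
p_{sb} &= p_s - p_{ss} = p_s p_b - \varepsilon p_s^2, \\
p_{bb} &= p_b - p_{sb} = p_b^2 + \varepsilon p_s^2, \\
\rho &= \frac{p_{bb} - p_b^2}{p_s p_b} = \varepsilon\,\frac{p_s}{p_b}
\quad\Longleftrightarrow\quad
\varepsilon = \rho\,\frac{p_b}{p_s}.
\end{align}
```

In this parametrization $\varepsilon$ and $\rho$ always have the same sign, and at fixed $p_b$ they are proportional. Non-negativity of $p_{ss}$, $p_{sb}$ and $p_{bb}$ gives $\varepsilon \ge -1$, $\varepsilon \le p_b/p_s$ and $\varepsilon \ge -(p_b/p_s)^2$, respectively.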
6. Comparison of Integrated Information Measures
In this Section we compare the outcome of two versions of empirical integrated information measures available in the literature: the “whole minus sum” effective information (3) from [13], which is used elsewhere in this study, and the “decoder based” information $\Phi^*$ introduced in [16] and expressed by Equations (5a–c). We calculate both measures by their respective definitions using the one- and two-time probabilities from Equations (6a,b) and (11a–d) for the spiking–bursting model with $N$ bits, assuming no spatial correlations among bits in spiking activity, with the same spike probability $P$ in each bit. In this case the probability of a state $x$ in spontaneous activity is
$$s_x = P^{m(x)} (1 - P)^{N - m(x)},$$
where $m(x)$ is the number of ones in the binary word $x$.
We consider only a symmetric bipartition with subsystems $A$ and $B$ consisting of $N/2$ bits each. Since the spike probabilities are assumed equal in all bits and spiking is spatially uncorrelated, the two subsystems are completely equivalent. In particular, in the notations of
Section 5 we get
This choice of the bipartition is firstly due to the fact that the sign of effective information for this bipartition determines the sign of the resultant “whole minus sum” II (although the actual value of II is determined by the minimal information bipartition, which may be different). This has been established in
Section 5 (see reasoning behind Equations (
27)–(30) and further on); moreover, the function
introduced in Equation (
30b) expresses effective information for this particular bipartition
thus the analysis of effective information sign in
Section 5 applies to this symmetric bipartition.
Moreover, the choice of the symmetric bipartition is consistent with available comparative studies of II measures [18], where it was substantiated by the conceptual requirement that highly asymmetric partitions should be excluded [2], and by the lack of a generally accepted specification of the minimum information bipartition; for further discussion, see [18].
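The calculation for the symmetric bipartition can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the computation behind the figures: the number of bits (`n_bits=4`), the parameter values, the parametrization of the dichotomous component by burst probability and Pearson coefficient `rho`, and all function names are hypothetical choices.

```python
from itertools import product
from math import log2

def model_joint(n_bits, spike_prob, p_burst, rho):
    """Two-time joint distribution of the spiking-bursting process."""
    ones = (1,) * n_bits
    states = list(product((0, 1), repeat=n_bits))
    s = {x: spike_prob ** sum(x) * (1 - spike_prob) ** (n_bits - sum(x))
         for x in states}
    p_spont = 1.0 - p_burst
    p_bb = p_burst ** 2 + rho * p_spont * p_burst
    p_sb = p_burst - p_bb
    p_ss = p_spont - p_sb
    joint = {}
    for x in states:
        for y in states:
            p = p_ss * s[x] * s[y]
            if y == ones:
                p += p_sb * s[x]
            if x == ones:
                p += p_sb * s[y]
            if x == ones and y == ones:
                p += p_bb
            joint[(x, y)] = p
    return joint

def marginal(joint, key):
    out = {}
    for (x, y), p in joint.items():
        k = key(x, y)
        out[k] = out.get(k, 0.0) + p
    return out

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    return (entropy(marginal(joint, lambda x, y: x))
            + entropy(marginal(joint, lambda x, y: y))
            - entropy(joint))

def symmetric_effective_information(joint, n_bits):
    """Whole-minus-sum effective information for the half/half bipartition."""
    half = n_bits // 2
    parts = [range(0, half), range(half, n_bits)]
    total = mutual_information(joint)
    for nodes in parts:
        pick = lambda state, nodes=nodes: tuple(state[i] for i in nodes)
        sub = marginal(joint, lambda x, y, pick=pick: (pick(x), pick(y)))
        total -= mutual_information(sub)
    return total

# Scan the spike probability at fixed burst parameters (illustrative values).
for spike_prob in (0.1, 0.3, 0.5, 0.7, 0.9):
    joint = model_joint(n_bits=4, spike_prob=spike_prob, p_burst=0.2, rho=0.5)
    print(spike_prob, symmetric_effective_information(joint, n_bits=4))
```

When spiking is weak, an all-ones pattern in either half of the system identifies a burst almost as reliably as the whole system does, so the two parts carry nearly the same information twice and the effective information comes out negative. When the time correlation of the bursting component is zero, all mutual informations vanish and so does the effective information.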
We have studied the dependence of the mentioned effective information measures
and
upon spiking activity, which is controlled by
, at different fixed values of the parameters
and
characterizing the bursting component. Typical dependence of
and
upon
, taken at
with several values of
, is shown in
Figure 5, panel (a).
The behavior of the “whole minus sum” effective information
(
41) (blue lines in
Figure 5) is found to agree with the analytical findings of
Section 5:
To verify the scaling observation, we plot the scaled values of both information measures
,
in the panels (b)–(d) of
Figure 5 for several fixed values of
and
. Expectedly, the scaling fails at
,
in panel (d), as (
36b) is not fulfilled in this case.
Furthermore, the “decoder based” information
(plotted with red lines in
Figure 5) behaves mostly the same way, apart from being always non-negative (which was one of key motivations for introducing this measure in [
16]). At the same time, the sign transition point
of the “whole minus sum” information associates with a rapid growth of the “decoder based” information. When
is increased towards 1, the two measures converge. Remarkably, the scaling as
is found to be shared by both effective information measures.
7. Discussion
In general, the spiking–bursting model is completely specified by the combination of a full single-time probability table (consisting of $2^N$ probabilities of all possible outcomes, where $N$ is the number of bits) for the time-uncorrelated spontaneous activity, along with two independent parameters for the dichotomous component. This combination is, however, redundant in that it admits a one-parameter scaling (12) which leaves the resultant stochastic process invariant.
Condition (30) was derived assuming that spiking activity in individual bits (i.e., nodes, or neurons) constituting the system is independent among the bits, which implies that the probability table
is fully determined by
N spike probabilities for individual nodes. The condition is formulated in terms of
,
and a single parameter
(system-wide spike probability) for the spontaneous activity, agnostic of the “internal structure” of the system, i.e., the spike probabilities for individual nodes. This condition provides that the “whole minus sum” effective information is positive for any bipartition, regardless of the mentioned internal structure. Moreover, in the limit (36) of weak correlations in time, the inequality (
30a) can be explicitly solved in terms of
, producing the solution (
33), (
38).
In this way, the inequality (
33) together with the asymptotic estimate (
38) supplemented by its applicability range (36) specifies the region in the parameter space of the system, where the “whole minus sum” II is positive regardless of the internal system structure (sufficient condition). The internal structure (though still without spike correlations across the system) is taken into account by the necessary and sufficient condition (
27) for positive II.
The mentioned conditions were derived under the assumption of absent correlation between spontaneous activity in individual bits (
24). If correlation exists and is positive, then
, or
. Then comparing the expressions for
(
23) (general case) to (25) (space-uncorrelated case), and taking into account that
is an increasing function, we find
, cf. (
25a). This implies that any necessary condition for positive II remains as such. Likewise, in the case of negative correlations we get
, implying that a sufficient condition remains as such.
8. Conclusions
The present study substantiates, refines and quantifies qualitative observations in regard to II in the spiking–bursting model which were initially made in [
19]. The existence of lower bounds in spiking activity (characterized by
) required for positive “whole minus sum” II which was noticed in [
19] is now expressed in the form of an explicit inequality (
33) with the estimate (
38) for the bound
. The observation of [
19] that typically
is mostly determined by burst probability and weakly depends upon time correlations of bursts also becomes supported by the quantitative result (
33), (
38). In particular, there is a range of spiking activity intensity
, where the “whole minus sum” information is positive regardless of other system parameters, provided the spiking activity is spatially uncorrelated or negatively correlated across the system. When the burst probability is decreased (which implies less frequent activation of the astrocyte subsystem), the threshold value for spiking activity
also decreases.
We found that II scales as
, where
is proportional (as per Equation (
17)) to the Pearson’s time delayed correlation coefficient of the bursting component (which essentially characterizes the duration of bursts), for
small (namely, within (36)), when other parameters (i.e.,
and spiking probability table
) are fixed. For the “whole minus sum” information, this is an analytical result. Note that the reasoning behind this result does not rely upon the assumption of spatial non-correlation of spiking activity (between bits), and thus applies generally to arbitrary spiking–bursting systems. According to a numerical calculation, this scaling approximately holds for the “decoder based” information as well.
Remarkably, II cannot exceed the time-delayed mutual information for the system as a whole, which in the case of the spiking–bursting model in its present formulation is no greater than 1 bit.
The model provides a basis for possible modifications in order to apply integrated information concepts to systems exhibiting similar, but more complicated behavior (in particular, to neuronal [26,27,28,29] and neuron–astrocyte [24,30] networks). Such modifications might incorporate non-trivial spatial patterns in bursting, and causal interactions within and between the spiking and bursting subsystems.
The model can also be of interest as a new discrete-state test bench for different formalizations of integrated information, while available comparative studies of II measures mainly focus on Gaussian autoregressive models [17,18].