Article

A Partial Information Decomposition for Multivariate Gaussian Systems Based on Information Geometry

School of Mathematics and Statistics, University of Glasgow, Glasgow G12 8QQ, UK
Entropy 2024, 26(7), 542; https://doi.org/10.3390/e26070542
Submission received: 22 May 2024 / Revised: 17 June 2024 / Accepted: 18 June 2024 / Published: 25 June 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

There is much interest in the topic of partial information decomposition, both in developing new algorithms and in developing applications. An algorithm, based on standard results from information geometry, was recently proposed by Niu and Quinn (2019). They considered the case of three scalar random variables from an exponential family, including both discrete distributions and a trivariate Gaussian distribution. The purpose of this article is to extend their work to the general case of multivariate Gaussian systems having vector inputs and a vector output. By making use of standard results from information geometry, explicit expressions are derived for the components of the partial information decomposition for this system. These expressions depend on a real-valued parameter which is determined by performing a simple constrained convex optimisation. Furthermore, it is proved that the theoretical properties of non-negativity, self-redundancy, symmetry and monotonicity, which were proposed by Williams and Beer (2010), are valid for the decomposition I ig derived herein. Application of these results to real and simulated data shows that the I ig algorithm does produce the results expected when clear expectations are available, although in some scenarios it can overestimate the levels of the synergy and shared information components of the decomposition, and correspondingly underestimate the levels of unique information. Comparisons of the I ig and I dep (Kay and Ince, 2018) methods show that they can both produce very similar results, but interesting differences are also observed. The same may be said about comparisons between the I ig and I mmi (Barrett, 2015) methods.

1. Introduction

Williams and Beer [1] introduced a new method for the decomposition of information in a probabilistic system termed partial information decomposition (PID). This allows the joint mutual information between a number of input sources and a target (output) to be decomposed into components which quantify different aspects of the transmitted information in the system: the unique information that each source conveys about the target, the shared information that all sources possess about the target, and the synergistic information that the sources in combination possess regarding the target. An additional achievement was to prove that the interaction information [2] is actually the difference between the synergy and redundancy in a system. Thus, a positive value for interaction information signifies that there is more synergy than redundancy in the system, while a negative value indicates the opposite. The work by Williams and Beer has led to many new methods for defining a PID, mainly for discrete probabilistic systems [3,4,5,6,7,8,9,10,11,12,13], spawning a variety of applications [14,15,16,17,18,19].
There has been considerable interest in PID methods for Gaussian systems. The case of static and dynamic Gaussian systems with two scalar sources and a scalar target was considered in [20], which applied the minimum mutual information PID, I mmi . Further insights were developed in [21] regarding synergy. A PID for Gaussian systems based on common surprisal was published in [7]. Barrett's work [20] was extended to multivariate Gaussian systems with two vector sources and a vector target in [22] using the I dep method, which was introduced for discrete systems in [8]. Further work based on the concept of statistical deficiency is reported in [23]. PID for Gaussian systems has been used in a range of applications [18,24,25,26,27,28,29,30].
We focus in particular here on the method proposed by Niu and Quinn [3]. They applied standard results from information geometry [31,32,33] in order to define a PID for three scalar random variables which follow an exponential family distribution, including a trivariate Gaussian distribution.
Here, we extend this work in two ways: (a) we provide general formulae for a PID involving multivariate Gaussian systems which have two vector sources and a vector target by making use of the same standard methods from information geometry as in [3] and (b) we prove that the Williams–Beer properties of non-negativity, self-redundancy, symmetry and monotonicity are valid for this PID. We also provide some illustrations of the resulting algorithm using real and simulated data. The PID developed herein is based on some of the probability models in the same partially ordered lattice on which the I dep algorithm is based. Therefore, we also compare the results obtained with those obtained by using the I dep method. The I ig results are also compared with those obtained using the I mmi algorithm.

2. Methods

2.1. Notation

A generic ‘p’ will be used to denote an absolutely continuous probability density function (pdf), with the arguments of the function signifying which distribution is intended. Bold capital letters are used to denote random vectors, with their realised values appearing in bold lowercase—so that p ( x 1 , x 2 , x 3 ) denotes the joint pdf of the random vectors, X 1 , X 2 , X 3 , while p ( x 1 , x 3 | x 2 ) is the conditional pdf of ( X 1 , X 3 ) given a value for X 2 .
We consider the case where random vectors X 1 , X 2 , X 3 , of dimensions n 1 , n 2 , n 3 , respectively, have partitioned mean vectors equal to zero vectors of lengths n 1 , n 2 , n 3 , respectively, and a conformably partitioned covariance matrix. We stack these random vectors into the random vector Z , so that Z has dimension m = n 1 + n 2 + n 3 , and assume that Z has a positive definite multivariate Gaussian distribution with pdf p ( x 1 , x 2 , x 3 ) , mean vector 0 and covariance matrix given by
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{12}^T & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{13}^T & \Sigma_{23}^T & \Sigma_{33} \end{pmatrix},$$
where the covariance matrices $\Sigma_{11}, \Sigma_{22}, \Sigma_{33}$ of $X_1, X_2, X_3$, respectively, are of sizes $n_1 \times n_1$, $n_2 \times n_2$, $n_3 \times n_3$, and $\Sigma_{12}, \Sigma_{13}, \Sigma_{23}$ are the pairwise cross-covariance matrices between the three vectors $X_1, X_2, X_3$. We also denote the conformably partitioned precision (or concentration) matrix K by
$$K = \begin{pmatrix} K_{11} & K_{12} & K_{13} \\ K_{12}^T & K_{22} & K_{23} \\ K_{13}^T & K_{23}^T & K_{33} \end{pmatrix},$$
where $K = \Sigma^{-1}$. The pdf of Z is
$$f(\mathbf{z}\,|\,K) = \frac{|K|^{1/2}}{(2\pi)^{m/2}}\exp\left(-\tfrac{1}{2}\,\mathbf{z}^T K\,\mathbf{z}\right) = \frac{|K|^{1/2}}{(2\pi)^{m/2}}\exp\left(-\tfrac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3}\mathbf{x}_i^T K_{ij}\,\mathbf{x}_j\right), \qquad \mathbf{z}\in\mathbb{R}^m.$$

2.2. Some Information Geometry

We now describe some standard results from information geometry [32,33] as applied to zero-mean, partitioned multivariate Gaussian probability distributions. The fact that there is no loss of generality in making this zero-mean assumption will be justified by Lemma 1 in Section 3. The multivariate Gaussian pdf defined in (2) may be written in the form
$$\frac{|K|^{1/2}}{(2\pi)^{m/2}}\exp\left(-\tfrac{1}{2}\left(\mathbf{x}_1^T K_{11}\mathbf{x}_1 + \mathbf{x}_2^T K_{22}\mathbf{x}_2 + \mathbf{x}_3^T K_{33}\mathbf{x}_3\right) - \left(\mathbf{x}_2^T K_{12}^T\mathbf{x}_1 + \mathbf{x}_3^T K_{13}^T\mathbf{x}_1 + \mathbf{x}_3^T K_{23}^T\mathbf{x}_2\right)\right),$$
which may be written in terms of the Frobenius inner product as
$$\exp\left(-\tfrac{1}{2}\langle K_{11}, \mathbf{x}_1\mathbf{x}_1^T\rangle - \tfrac{1}{2}\langle K_{22}, \mathbf{x}_2\mathbf{x}_2^T\rangle - \tfrac{1}{2}\langle K_{33}, \mathbf{x}_3\mathbf{x}_3^T\rangle - \langle K_{12}, \mathbf{x}_1\mathbf{x}_2^T\rangle - \langle K_{13}, \mathbf{x}_1\mathbf{x}_3^T\rangle - \langle K_{23}, \mathbf{x}_2\mathbf{x}_3^T\rangle - \psi(\theta)\right),$$
where
$$\psi(\theta) = -\tfrac{1}{2}\log\frac{|K|}{(2\pi)^m}.$$
This is of exponential family form [33] (p. 34) and [34] with natural parameter
$$\theta = \left\{-\tfrac{1}{2}K_{11},\ -\tfrac{1}{2}K_{22},\ -\tfrac{1}{2}K_{33},\ -K_{12},\ -K_{13},\ -K_{23}\right\},$$
and expectation parameter $\eta = \{\eta_{ij},\ i = 1,2,3;\ j = 1,2,3;\ i \le j\}$, where $\eta_{ij} = E(X_iX_j^T) = \Sigma_{ij}$. We note that there is something of a terminological ambiguity here, since a 'parameter' is usually a real number. It is convenient to use the more compact notation provided by matrices, since this enables all of the elements of a matrix natural parameter to be set to zero simultaneously.
The exponential family of distributions defined by (2) forms a dually flat manifold [31], which we denote by M.
We define the following e-flat submanifolds of M:
$$\begin{aligned}
S_7 &= \{p \in M : K_{12} = 0\}\\
S_6 &= \{p \in M : K_{13} = 0\}\\
S_5 &= \{p \in M : K_{23} = 0\}\\
S_4 &= \{p \in M : K_{12} = 0 \text{ and } K_{13} = 0\}\\
S_3 &= \{p \in M : K_{12} = 0 \text{ and } K_{23} = 0\}\\
S_2 &= \{p \in M : K_{13} = 0 \text{ and } K_{23} = 0\}\\
S_1 &= \{p \in M : K_{12} = 0,\ K_{13} = 0 \text{ and } K_{23} = 0\}
\end{aligned}$$
which may be conveniently pictured as the partially ordered lattice in Figure 1. The submanifolds S 5 and S 6 are necessary for the definition of the information-geometric PID [3] and the others will be considered in the sequel. Lattices similar to that in Figure 1 appear in [8,35,36] in relation to information decomposition, and in [37] who consider dually flat manifolds on posets. See also [38], and references therein, for the use of a variety of lattices of models in statistical work.
Hierarchical chains of submanifolds were considered in [31] but here the submanifolds are not all in a hierarchical chain due to the presence of two antichains: { S 2 , S 3 , S 4 } and { S 5 , S 6 , S 7 } . There are, however, several useful chains within the lattice. Of particular relevance here are the chains { S 2 , S 5 , M } , { S 2 , S 6 , M } and { S 2 , M } . Application of Amari’s mixed-cut coordinates [31] and calculation of divergences produces measures of mutual information that are of direct relevance in PID (as was noted by [3] for three scalar random variables) in that the equations
$$I[X_1, X_2; X_3] = I[X_1; X_3] + I[X_2; X_3 \mid X_1] = I[X_2; X_3] + I[X_1; X_3 \mid X_2]$$
are obtained—and they are standard results in information theory based on the chain rule for mutual information [39]. These are nice illustrations of Amari’s method.
We now consider m-projections from the pdf p ∈ M to each of the submanifolds S 1 – S 7 [31]. It is easy to find the pdf in each submanifold that is closest, in terms of Kullback–Leibler (KL) divergence ([40] Ch. 4), to the given pdf p in M. They are given in Figure 1. We know [34,40] that setting a block of the inverse covariance of a multivariate Gaussian distribution to zero expresses a conditional independence between the variables involved. For example, consider S 5 . On this submanifold K 23 = 0 and so X 2 and X 3 are conditionally independent given a value for X 1 . Therefore, this pdf, which we denote by p 5 , has the form p 5 ( x 1 , x 2 , x 3 ) = p ( x 2 | x 1 ) p ( x 3 | x 1 ) p ( x 1 ) . On submanifold S 2 , there are two conditional independences and so X 3 and the pair ( X 1 , X 2 ) are independent, and the closest pdf in S 2 to the pdf p has the form p 2 ( x 1 , x 2 , x 3 ) = p ( x 1 , x 2 ) p ( x 3 ) .
The probability distributions defined by these information projections could also have been obtained by the method of maximum entropy, subject to constraints on model interactions [31], and they were obtained in this manner in [22] by making use of Gaussian graphical models [34,40].
We now mention important results from information geometry which are crucial for defining a PID [3]. Consider the pdfs p 5 , p 6 , p belonging to the submanifolds S 5 , S 6 , and to manifold M, and the e-geodesic passing through p 5 and p 6 . Then, any pdf on this e-geodesic path is also a zero-mean multivariate Gaussian pdf ([41] Ch. 1). We denote such a pdf by p t . It has covariance matrix Σ t , defined by
$$\Sigma_t^{-1} = (1-t)\,\Sigma_5^{-1} + t\,\Sigma_6^{-1},$$
provided that $\Sigma_t^{-1}$ is positive definite. We consider also the m-projection of p onto this e-geodesic. By standard results [31,33], the corresponding m-geodesic meets the e-geodesic through p 5 and p 6 at a unique pdf p_{t*} such that generalized Pythagorean relationships hold in terms of the KL divergence:
$$D(p\,\|\,p_5) = D(p\,\|\,p_{t^*}) + D(p_{t^*}\,\|\,p_5)$$
$$D(p\,\|\,p_6) = D(p\,\|\,p_{t^*}) + D(p_{t^*}\,\|\,p_6)$$
The pdf p_{t*} minimizes, over pdfs p t lying on the e-geodesic which passes through p 5 and p 6 , the KL divergence D ( p | | p t ) between the pdf p in M and p t .

2.3. The Partial Information Decomposition

Williams and Beer [1] introduce a framework called the partial information decomposition (PID) which decomposes the joint mutual information between a target and a set of multiple predictor variables into a series of terms reflecting information which is shared, unique or synergistically available within and between subsets of predictors. The joint mutual information, conditional mutual information and bivariate mutual information are defined as follows.
$$I[X_1, X_2; X_3] = \int p(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)\,\log\frac{p(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)}{p(\mathbf{x}_1,\mathbf{x}_2)\,p(\mathbf{x}_3)}\; d\mathbf{x}_1\,d\mathbf{x}_2\,d\mathbf{x}_3,$$
$$I[X_1; X_3 \mid X_2] = \int p(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)\,\log\frac{p(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)\,p(\mathbf{x}_2)}{p(\mathbf{x}_1,\mathbf{x}_2)\,p(\mathbf{x}_2,\mathbf{x}_3)}\; d\mathbf{x}_1\,d\mathbf{x}_2\,d\mathbf{x}_3,$$
$$I[X_1; X_3] = \int p(\mathbf{x}_1,\mathbf{x}_3)\,\log\frac{p(\mathbf{x}_1,\mathbf{x}_3)}{p(\mathbf{x}_1)\,p(\mathbf{x}_3)}\; d\mathbf{x}_1\,d\mathbf{x}_3.$$
Here, we focus on the case of two vector sources, X 1 , X 2 , and a vector target X 3 . Adapting the notation of [42], we express the joint mutual information in four terms as follows:
Unq1 ≡ I_unq[X_1; X_3 | X_2] denotes the unique information that X 1 conveys about X 3 ;
Unq2 ≡ I_unq[X_2; X_3 | X_1] is the unique information that X 2 conveys about X 3 ;
Shd ≡ I_shd[X_1, X_2; X_3] gives the common (or redundant or shared) information that both X 1 and X 2 have about X 3 ;
Syn ≡ I_syn[X_1, X_2; X_3] is the synergy, or information that the joint vector ( X 1 , X 2 ) has about X 3 that cannot be obtained by observing X 1 and X 2 separately.
It is possible to make deductions about a PID by using the following four equations which give a link between the components of a PID and certain classical Shannon measures of mutual information. The following are from ([42] Equations (4) and (5)), with amended notation; see also [1].
I [ X 1 ; X 3 ] = Unq 1 + Shd ,
I [ X 2 ; X 3 ] = Unq 2 + Shd ,
I [ X 1 ; X 3 | X 2 ] = Unq 1 + Syn ,
I [ X 2 ; X 3 | X 1 ] = Unq 2 + Syn .
Also, the joint mutual information may be written as
I [ X 1 , X 2 ; X 3 ] = Syn + Unq 1 + Unq 2 + Shd .
The Equations (6)–(9) are of rank 3 and so it is necessary to provide a value for any one of the components, and then the remaining terms can be easily calculated. The initial formulation of [1] was based on quantifying the shared information and deriving the other quantities, but others have focussed on quantifying unique information or synergy directly [4,5,8]. Also, the following form [16] of the interaction information [2] will be useful. It was shown [1] to be equal to the difference Syn − Shd.
$$II[X_1; X_2; X_3] = I[X_1; X_2 \mid X_3] - I[X_1; X_2].$$
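As an illustration of this bookkeeping, the following minimal R sketch (not code from the paper; the function and argument names are ours) shows how, once a value for the shared component is supplied, Equations (6)–(9) determine the remaining components:

pid_from_shared <- function(I_13, I_23, I_13_given_2, Shd) {
  # Equations (6)-(8): Unq1 = I(X1;X3) - Shd, Unq2 = I(X2;X3) - Shd,
  # Syn = I(X1;X3|X2) - Unq1; by (9) this also equals I(X2;X3|X1) - Unq2.
  Unq1 <- I_13 - Shd
  Unq2 <- I_23 - Shd
  Syn  <- I_13_given_2 - Unq1
  c(Unq1 = Unq1, Unq2 = Unq2, Shd = Shd, Syn = Syn)
}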

3. Results

3.1. A PID for Gaussian Vector Sources and a Gaussian Vector Target

We now apply the results from the previous two sections in order to derive a partial information decomposition by making use of the method defined in [3]. The following lemma will confirm that without any loss of generality, we may assume, for all of the multivariate normal distributions considered herein, that the mean vector can be taken to be 0 and the covariance matrix of Z , defined on R m , where m = n 1 + n 2 + n 3 , can have the form
$$\Sigma = \begin{pmatrix} I_{n_1} & P & Q \\ P^T & I_{n_2} & R \\ Q^T & R^T & I_{n_3} \end{pmatrix},$$
where the matrices P , Q , R are of size n 1 × n 2 , n 1 × n 3 , n 2 × n 3 , respectively, and are the cross-covariance (correlation) matrices between the three pairings of the three random vectors X 1 , X 2 , X 3 , and so
E ( X 1 X 2 T ) = P , E ( X 1 X 3 T ) = Q , E ( X 2 X 3 T ) = R .
The calculation of the partial information coefficients will involve the computation of KL divergences [43] between two multivariate Gaussian distributions associated with two submanifolds in the lattice, defined in Figure 1; see Lemma 1, with proof in Appendix C. These probability distributions will have two features in common: they each have the same partitioned mean vector and also the same variance–covariance matrices for the random vectors X 1 , X 2 and X 3 , but different cross covariance matrices for each pair of the random vectors X 1 , X 2 and X 3 .
Lemma 1.
Consider two multivariate Gaussian pdfs, f 1 and f 2 , which have the same partitioned mean vector, μ = [ μ 1 T , μ 2 T , μ 3 T ] T , and conformably partitioned m × m covariance matrices
$$\Phi = \begin{pmatrix} \Sigma_{11} & \Phi_{12} & \Phi_{13} \\ \Phi_{12}^T & \Sigma_{22} & \Phi_{23} \\ \Phi_{13}^T & \Phi_{23}^T & \Sigma_{33} \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} \Sigma_{11} & \Lambda_{12} & \Lambda_{13} \\ \Lambda_{12}^T & \Sigma_{22} & \Lambda_{23} \\ \Lambda_{13}^T & \Lambda_{23}^T & \Sigma_{33} \end{pmatrix},$$
respectively, where the diagonal blocks $\Sigma_{11}$, $\Sigma_{22}$ and $\Sigma_{33}$ are square.
Then, the Kullback–Leibler divergence D ( f 1 | | f 2 ) does not depend on the mean vector μ, nor does it depend directly on the variance–covariance matrices $\Sigma_{11}, \Sigma_{22}, \Sigma_{33}$. The divergence is equal to
$$D(f_1\,\|\,f_2) = \tfrac{1}{2}\left(\log\frac{|\Lambda_1|}{|\Phi_1|} - m + \mathrm{Tr}(\Lambda_1^{-1}\Phi_1)\right),$$
where
$$\Phi_1 = \begin{pmatrix} I_{n_1} & P_{12} & P_{13} \\ P_{12}^T & I_{n_2} & P_{23} \\ P_{13}^T & P_{23}^T & I_{n_3} \end{pmatrix} \quad\text{and}\quad \Lambda_1 = \begin{pmatrix} I_{n_1} & L_{12} & L_{13} \\ L_{12}^T & I_{n_2} & L_{23} \\ L_{13}^T & L_{23}^T & I_{n_3} \end{pmatrix}$$
with
$$P_{ij} = \Sigma_{ii}^{-1/2}\,\Phi_{ij}\,\Sigma_{jj}^{-1/2}, \qquad L_{ij} = \Sigma_{ii}^{-1/2}\,\Lambda_{ij}\,\Sigma_{jj}^{-1/2}, \qquad (i, j = 1, 2, 3;\ i < j),$$
which are the respective cross-correlation matrices among X 1 , X 2 , X 3 . The KL divergence depends only on these cross-correlation matrices.
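A small numerical companion to Lemma 1 (an illustrative R sketch under the zero-mean assumption, not the paper's code; the function name kl_gauss is ours) computes the KL divergence between Gaussians with covariances Phi (first argument of D) and Lambda (second argument) from the standard closed form:

kl_gauss <- function(Phi, Lambda) {
  # D(f1 || f2) for zero-mean Gaussians, f1 ~ N(0, Phi), f2 ~ N(0, Lambda)
  m <- nrow(Phi)
  0.5 * as.numeric(determinant(Lambda)$modulus - determinant(Phi)$modulus -
                   m + sum(diag(solve(Lambda, Phi))))
}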

3.2. Covariance Matrices

Table 1 gives the covariance matrices corresponding to each of the projected distributions p 1 p 7 on the submanifolds. It is known from Gaussian graphical models [34,40] that the probability distributions associated with submanifolds S 5 and S 6 are defined by setting K 23 = 0 and K 13 = 0 , respectively, in the precision matrix K. These conditions were shown in [22] to be equivalent to the equations R = P T Q and Q = P R , respectively. From Table 1, we see that the covariance matrices for pdfs p 5 and p 6 have the following form.
$$\Sigma_5 = \begin{pmatrix} I_{n_1} & P & Q \\ P^T & I_{n_2} & P^TQ \\ Q^T & Q^TP & I_{n_3} \end{pmatrix} \quad\text{and}\quad \Sigma_6 = \begin{pmatrix} I_{n_1} & P & PR \\ P^T & I_{n_2} & R \\ R^TP^T & R^T & I_{n_3} \end{pmatrix}.$$
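In R, these covariance matrices can be assembled directly from the blocks P, Q, R (an illustrative sketch; the helper names build_sigma, sigma_5 and sigma_6 are ours):

build_sigma <- function(P, Q, R) {
  # assemble the standardised covariance matrix (12) from its off-diagonal blocks
  n1 <- nrow(P); n2 <- ncol(P); n3 <- ncol(Q)
  rbind(cbind(diag(n1), P,        Q),
        cbind(t(P),     diag(n2), R),
        cbind(t(Q),     t(R),     diag(n3)))
}
sigma_5 <- function(P, Q) build_sigma(P, Q, R = t(P) %*% Q)  # K23 = 0: R replaced by P'Q
sigma_6 <- function(P, R) build_sigma(P, Q = P %*% R, R)     # K13 = 0: Q replaced by PR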
The following lemma, which is proved in Appendix D, gives some useful results on determinants that will be used in the sequel.
Lemma 2.
The determinants of the matrices Σ 5 , Σ 6 are given by
$$|\Sigma_5| = |I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ| \quad\text{and}\quad |\Sigma_6| = |I_{n_2} - P^TP|\,|I_{n_3} - R^TR|.$$
Also,
$$\Sigma_5 = \Sigma_6 \iff Q = 0 \text{ and } R = 0 \iff I(X_1, X_2; X_3) = 0.$$

3.3. Feasible Values for the Parameter t

From (3), the m-projection from manifold M onto the e-geodesic passing through the pdfs p 5 and p 6 is, in general, located at a pdf p t which has covariance matrix Σ t defined by
$$\Sigma_t^{-1} = (1-t)\,\Sigma_5^{-1} + t\,\Sigma_6^{-1},$$
and Σ t must be positive definite. Therefore, when finding the optimal pdf p_{t*}, we require to constrain the values of the parameter t to be such that Σ t is positive definite. We define the set of feasible values for t as
$$F = \{t \in \mathbb{R} : \Sigma_t \text{ is positive definite}\}.$$
F is a closed interval in $\mathbb{R}$ of the form $[-a, 1+b]$, where $a, b > 0$. The interior of F, the open interval $(-a, 1+b)$, is an open convex set. To enable the derivation of explicit results, it is useful to define the matrix $\tilde{\Sigma}_t$ by
$$\tilde{\Sigma}_t^{-1} = (1-t)\,\Sigma_6 + t\,\Sigma_5.$$
We also require the value of t to be feasible when working with the matrix $\tilde{\Sigma}_t$, and so we define the set G of feasible values as follows:
$$G = \{t \in \mathbb{R} : \tilde{\Sigma}_t \text{ is positive definite}\}.$$
It turns out that the sets of feasible values F, G are actually the same set, as stated in the following lemma, which is proved in Appendix E, and this fact allows us to infer that $\tilde{\Sigma}_t$ is positive definite when $\Sigma_t$ is.
Lemma 3.
If the parameter t belongs to the closed interval [0,1], then the matrices $\Sigma_t$ and $\tilde{\Sigma}_t$ are both positive definite. Also, the two feasible sets F and G defined above in (16) and (18) are equal.
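One simple way to locate the feasible interval F numerically is to track the smallest eigenvalue of $\Sigma_t^{-1}$ and find where it reaches zero on either side of [0, 1]. The R sketch below illustrates the idea (the function names and the bracketing width `span` are our assumptions; `span` must be wide enough to capture a sign change):

min_eig_t <- function(t, S5inv, S6inv) {
  # smallest eigenvalue of Sigma_t^{-1} = (1 - t) Sigma_5^{-1} + t Sigma_6^{-1}
  min(eigen((1 - t) * S5inv + t * S6inv, symmetric = TRUE,
            only.values = TRUE)$values)
}
feasible_interval <- function(S5, S6, span = 50) {
  S5inv <- solve(S5); S6inv <- solve(S6)
  lo <- uniroot(min_eig_t, c(-span, 0), S5inv = S5inv, S6inv = S6inv)$root
  hi <- uniroot(min_eig_t, c(1, 1 + span), S5inv = S5inv, S6inv = S6inv)$root
  c(lo, hi)
}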

3.4. A Convex Optimisation Problem

The optimal value t* of the parameter t is defined by
$$t^* = \arg\min_{t \in F}\ D(p\,\|\,p_t).$$
The following lemma, with proof in Appendix F, provides details of the optimisation required to find t*.
Lemma 4.
For $t \in F$, we define the real-valued function g by $g(t) = D(p\,\|\,p_t)$. Then,
$$\mathrm{Tr}(\Sigma_t^{-1}\Sigma) = m,$$
and
$$g(t) = \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|^2\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|\Sigma|} - \tfrac{1}{2}\log\left|(1-t)\,\Sigma_6 + t\,\Sigma_5\right|.$$
Provided that the joint mutual information is positive, the minimization of g subject to the constraint that t lies in the interior of F, an open convex set, is a strictly convex problem, and the optimal value t* is unique.
The minimum value of g is equal to
$$\tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|\Sigma|} + \tfrac{1}{2}\log\frac{1}{d(t^*)},$$
where the determinant d ( t ) is defined by
$$d(t) = \left|I_{n_3} - (1-t)^2\,Q^TQ - t(1-t)\left(R^TP^TQ + Q^TPR\right) - t^2\,R^TR\right|, \qquad (t \in F).$$
Alternatively, the minimum could occur at either endpoint of F.
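A direct R transcription of this one-dimensional optimisation is sketched below (illustrative only, building on the helpers above; by Lemma 2, |Σ5||Σ6| equals the numerator appearing in g(t)):

g_fun <- function(t, Sigma, S5, S6) {
  # g(t) = (1/2) log( |Sigma_5||Sigma_6| / ( |Sigma| |(1-t) Sigma_6 + t Sigma_5| ) )
  0.5 * as.numeric(determinant(S5)$modulus + determinant(S6)$modulus -
                   determinant(Sigma)$modulus -
                   determinant((1 - t) * S6 + t * S5)$modulus)
}
t_star <- function(Sigma, S5, S6, F_int) {
  # one-dimensional strictly convex problem over the feasible interval F
  optimize(g_fun, interval = F_int, Sigma = Sigma, S5 = S5, S6 = S6)$minimum
}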
We now define the PID components.

3.5. Definition of the PID Components

Following the proposal in [3], we define the synergy of the system to be
$$\mathrm{Syn} = D(p\,\|\,p_{t^*}),$$
and by Lemma 1 and (20) the expression for the synergy is
$$\mathrm{Syn} = \tfrac{1}{2}\log\frac{|\Sigma_{t^*}|}{|\Sigma|} - \tfrac{1}{2}m + \tfrac{1}{2}\mathrm{Tr}(\Sigma_{t^*}^{-1}\Sigma) = \tfrac{1}{2}\log\frac{|\Sigma_{t^*}|}{|\Sigma|}.$$
Before defining the other PID terms, we require the following lemma, with proof in Appendix G.
Lemma 5.
The trace terms required in the definitions of the unique information are both equal to m:
$$\mathrm{Tr}(\Sigma_5^{-1}\Sigma_{t^*}) = m \quad\text{and}\quad \mathrm{Tr}(\Sigma_6^{-1}\Sigma_{t^*}) = m.$$
From (4), we know that
$$D(p\,\|\,p_5) = D(p\,\|\,p_{t^*}) + D(p_{t^*}\,\|\,p_5)$$
and we define the unique information in the system that is due to source X 2 to be
$$\mathrm{Unq2} = D(p_{t^*}\,\|\,p_5),$$
as in [3]. By (5), we also have that
$$D(p\,\|\,p_6) = D(p\,\|\,p_{t^*}) + D(p_{t^*}\,\|\,p_6)$$
and we define the unique information in the system that is due to source X 1 to be
$$\mathrm{Unq1} = D(p_{t^*}\,\|\,p_6),$$
as in [3]. Finding the optimal point, t*, of minimisation of the KL divergence D ( p | | p t ) , and the orthogonality provided by the generalised Pythagorean theorems, define a clear connection between the geometry of the tangent space to manifold M and the definition of the information-geometric PID developed herein.
By using two of the defining equations of a PID (6) and (7), there are two possible expressions for the shared information, Shd, in the system:
$$I(X_1; X_3) - \mathrm{Unq1} \quad\text{or}\quad I(X_2; X_3) - \mathrm{Unq2}.$$
Using the result in Lemma 1, we may write the unique information terms as follows. The unique information provided by X 2 is defined to be
$$\mathrm{Unq2} = D(p_{t^*}\,\|\,p_5) = \tfrac{1}{2}\log\frac{|\Sigma_5|}{|\Sigma_{t^*}|} - \tfrac{1}{2}m + \tfrac{1}{2}\mathrm{Tr}(\Sigma_5^{-1}\Sigma_{t^*}) = \tfrac{1}{2}\log\frac{|\Sigma_5|}{|\Sigma_{t^*}|},$$
by Lemma 5.
The unique information provided by X 1 is defined to be
$$\mathrm{Unq1} = D(p_{t^*}\,\|\,p_6) = \tfrac{1}{2}\log\frac{|\Sigma_6|}{|\Sigma_{t^*}|} - \tfrac{1}{2}m + \tfrac{1}{2}\mathrm{Tr}(\Sigma_6^{-1}\Sigma_{t^*}) = \tfrac{1}{2}\log\frac{|\Sigma_6|}{|\Sigma_{t^*}|},$$
by Lemma 5.

3.6. The I ig  PID

Explicit expressions for the PID components are given in Proposition 1, with proof in Appendix H.
Proposition 1.
The partial information decomposition I i g for the zero-mean multivariate Gaussian system defined in (12) has the following components.
$$\begin{aligned}
\mathrm{Syn} &= \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|\Sigma|} + \tfrac{1}{2}\log\frac{1}{d(t^*)}\\
\mathrm{Unq1} &= \tfrac{1}{2}\log\frac{1}{|I_{n_3} - Q^TQ|} - \tfrac{1}{2}\log\frac{1}{d(t^*)}\\
\mathrm{Unq2} &= \tfrac{1}{2}\log\frac{1}{|I_{n_3} - R^TR|} - \tfrac{1}{2}\log\frac{1}{d(t^*)}\\
\mathrm{Shd} &= \tfrac{1}{2}\log\frac{1}{d(t^*)},
\end{aligned}$$
where the determinant d ( t ) is defined by
$$d(t) = \left|I_{n_3} - (1-t)^2\,Q^TQ - t(1-t)\left(R^TP^TQ + Q^TPR\right) - t^2\,R^TR\right|, \qquad (t \in F),$$
and F is the interval of real values of t for which $\Sigma_t$ is positive definite.
  • The two possible expressions for the shared information in (31) are equal.
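For concreteness, the components of Proposition 1 can be coded directly from P, Q, R and t* (an R sketch, not the published implementation; it assumes the standardised form (12) and the helper functions introduced earlier):

d_det <- function(t, P, Q, R) {
  # the determinant d(t) appearing in Proposition 1
  M <- (1 - t)^2 * crossprod(Q) +
       t * (1 - t) * (t(R) %*% t(P) %*% Q + t(Q) %*% P %*% R) +
       t^2 * crossprod(R)
  det(diag(ncol(Q)) - M)
}
pid_ig <- function(P, Q, R, Sigma, t_opt) {
  n2 <- ncol(P); n3 <- ncol(Q)
  Shd  <- 0.5 * log(1 / d_det(t_opt, P, Q, R))
  Unq1 <- 0.5 * log(1 / det(diag(n3) - crossprod(Q))) - Shd
  Unq2 <- 0.5 * log(1 / det(diag(n3) - crossprod(R))) - Shd
  Syn  <- 0.5 * log(det(diag(n2) - crossprod(P)) *
                    det(diag(n3) - crossprod(Q)) *
                    det(diag(n3) - crossprod(R)) / det(Sigma)) + Shd
  c(Unq1 = Unq1, Unq2 = Unq2, Shd = Shd, Syn = Syn)
}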
Theoretical properties of the I ig PID are presented in Proposition 2, with proof in Appendix I.2.
Proposition 2.
The PID defined in Proposition 1 possesses the Williams–Beer properties of non-negativity, self-redundancy, symmetry and monotonicity.

3.7. Some Examples and Illustrations

Example 1.
Prediction of calcium contents.
This dataset was considered in [22]. The I ig PID developed here, along with the I dep PID [22] and I mmi PID [20], was applied using data on 73 women involving one set of predictors X 1 (Age, Weight, Height), another set of two predictors X 2 (diameter of os calcis, diameter of radius and ulna), and target X 3 (calcium content of heel and forearm). The following results were obtained.
PID      t*       Unq1     Unq2     Shd      Syn
I ig     0.2408   0.3581   0.0304   0.0728   0.1904
I dep    -        0.4077   0.0800   0.0232   0.1408
I mmi    -        0.3277   0        0.1032   0.2209
A plot of the ‘synergy’ function g ( t ) is shown in Figure 2a. All three PIDs indicate the presence of synergy and a large component of unique information due to the variables in X 1 . The I ig PID suggests the transmission of more of the joint mutual information as shared and synergistic information, and correspondingly less unique information due to either source vector, than does the I dep PID. This is true also for the results from the I mmi PID, but it has higher values for synergistic and shared information and a lower value for Unq1 than those produced by the I ig PID. It was shown in [22] that pdf p in manifold M provides a better fit to these data than any of the submanifold distributions. This pdf contains pairwise cross-correlation between the vectors X 1 and X 3 , and between X 2 and X 3 . Hence, it is no surprise to find a relatively large Unq1 component. One might also anticipate a large value for Unq2. That this is not the case is explained, at least partly, by the presence of unique information asymmetry, in that the mutual information between X 1 and X 3 (0.4309) is much larger than that between X 2 and X 3 (0.1032), and also by bearing in mind the constraints imposed by (6)–(10).
The PIDs were also computed with the same X 1 and X 3 but taking X 2 to be another set of four predictors (surface area, strength of forearm, strength of leg, area of os calcis). The following results were obtained.
PID      t*       Unq1     Unq2     Shd      Syn
I ig     0.0027   0.3522   0.0000   0.0787   0.0186
I dep    -        0.3708   0.0186   0.0601   0
I mmi    -        0.3522   0        0.0787   0.0186
A plot of the ‘synergy’ function g ( t ) is shown in Figure 2b. In this case, the PIDs obtained from all three methods are very similar, with the main component being unique information due to the variables in X 1 . The PIDs indicate almost zero synergy and almost zero unique information due to the variables in X 2 . In [22], it was shown that the best-fitting of the pdfs is p 5 , associated with submanifold S 5 . If this model were to hold exactly, then a PID must have Syn and Unq2 components that are equal to zero. Therefore, all three PIDs perform very well here, and the fact that the Unq1 component is much larger than the Shd component is due to unique information asymmetry, since the mutual information between X 2 and X 3 is only 0.0787. In this dataset, the I ig PID suggests the transmission of just a little more of the joint mutual information as shared and synergistic information, and correspondingly less unique information due to either source vector, than does the I dep PID. The I ig and I mmi PIDs produce identical results (to 4 d.p.).
When working with real or simulated data, it is important to use the correct covariance matrix. In order to use the results given in Proposition 1, it is essential that the input covariance matrix has the structure of Σ , as given in (12). Further detail is provided in Appendix J.
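A minimal sketch of this pre-processing in R (assuming zero means; the block sizes n1, n2, n3 must be supplied, and the function names are ours): pre- and post-multiply the sample covariance matrix by the block-diagonal matrix of inverse square roots of its diagonal blocks, in the spirit of the reduction used in the proof of Lemma 1.

inv_sqrtm <- function(S) {
  # symmetric inverse square root via the eigendecomposition
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(S)) %*% t(e$vectors)
}
standardise <- function(Sigma, n1, n2, n3) {
  idx <- list(1:n1, n1 + (1:n2), n1 + n2 + (1:n3))
  D <- matrix(0, nrow(Sigma), ncol(Sigma))
  for (b in idx) D[b, b] <- inv_sqrtm(Sigma[b, b, drop = FALSE])
  D %*% Sigma %*% D   # has identity diagonal blocks, as in (12)
}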
Example 2.
PID expectations and exact results.
Since there is no way to know the true PID for any given dataset, it is useful to consider situations under which some values of the PID components can be predicted, and this approach has been used in developments of the topic. Here, we consider such expectations provided by the pdfs associated with the submanifolds S 3 – S 7 , defined in Figure 1. In submanifold S 3 , the source X 2 is independent of both the other source X 1 and the target X 3 . Hence, we expect only unique information due to source X 1 to be transmitted. Submanifold S 4 is similar, but we expect only unique information due to source X 2 to be transmitted. In submanifold S 5 , X 2 and X 3 are conditionally independent given a value for X 1 . Hence, from (9), we expect the Unq2 and Syn components to be zero. Similarly, for S 6 , we expect the Unq1 and Syn components to be equal to zero, by (8). On submanifold S 7 , the sources X 1 , X 2 are conditionally independent given a value for the target X 3 (which does not mean that the sources are marginally independent). Since the target X 3 interacts with both source vectors, one might expect some shared information as well as unique information from both sources, and also perhaps some synergy. Here, from (11), the interaction information must be negative or zero, and so we can expect to see transmission of more shared information than synergy.
We will examine these expectations by using the following multivariate Gaussian distribution (which was used in [22]). The matrices P , Q , R are given an equi-cross-correlation structure in which all the entries are equal within each matrix:
$$P = p\,\mathbf{1}_{n_1}\mathbf{1}_{n_2}^T, \qquad Q = q\,\mathbf{1}_{n_1}\mathbf{1}_{n_3}^T, \qquad R = r\,\mathbf{1}_{n_2}\mathbf{1}_{n_3}^T,$$
where p , q , r denote here the constant cross correlations within each matrix and 1 n denotes an n-dimensional vector whose entries are each equal to unity.
The values of ( p , q , r ) are taken to be ( 0.15 , 0.15 , 0.15 ) , with n 1 = 3 , n 2 = 4 , n 3 = 3 . Covariance matrices for pdfs p 3 – p 7 were computed using the results in Table 1. Thus, we have the exact covariance matrices which can be fed into the I ig , I dep and I mmi algorithms. The PID results are displayed in Table 2.
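The covariance matrices used here can be reproduced with the sketches given earlier (illustrative R; build_sigma, sigma_5 and sigma_6 were defined above):

n1 <- 3; n2 <- 4; n3 <- 3
p <- 0.15; q <- 0.15; r <- 0.15
P <- p * rep(1, n1) %o% rep(1, n2)   # equi-cross-correlation blocks
Q <- q * rep(1, n1) %o% rep(1, n3)
R <- r * rep(1, n2) %o% rep(1, n3)
Sigma   <- build_sigma(P, Q, R)      # covariance for pdf p
Sigma_5 <- sigma_5(P, Q)             # exact covariance under S5 (cf. Table 1)
Sigma_6 <- sigma_6(P, R)             # exact covariance under S6 (cf. Table 1)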
From Table 2, we see that all three PIDs meet the expectations exactly for pdfs p 3 – p 6 , with only unique information transmitted when the pdfs p 3 , p 4 are true, respectively, and zero unique information for the relevant component and zero synergy when the models p 5 , p 6 are true, respectively. When the pdf p is the true model, we find that the I ig and I dep PIDs produce virtually identical results: the joint mutual information is transmitted almost entirely as synergistic information. The I mmi PID is slightly different, with less unique information transmitted about the variables in X 2 , and more shared and synergistic information transmitted than with the other two PIDs. The PIDs produce very different results for pdf p 7 , although, as expected, they do express more shared information than synergy. When this model is satisfied, I dep sets the synergy to 0, even if there is no compelling reason to support this. This curiosity is mentioned and illustrated in [22]. On the other hand, the I ig PID suggests that each of the four components contributes to the transmission of the joint mutual information, with unique information due to X 2 and shared information making more of a contribution than the other two components. The I mmi PID transmits a higher percentage of the joint information as shared and synergistic information, and a smaller percentage due to the variables in X 2 , than is found with I ig ; these differences are much stronger when comparison is made with the corresponding I dep components. As with model p, it appears that setting the Unq1 component of I ig to zero, with its percentage subtracted from the Unq2 component and added to both the Shd and Syn components of I ig , approximately produces the I mmi decomposition.
Example 3.
Some simulations.
Taking the same values of p , q , r and n 1 , n 2 , n 3 as in the previous example, a small simulation study was conducted. From each of the pdfs p 3 – p 7 , p, a simple random sample of size 1000 was generated from the 10-dimensional distribution, a covariance matrix was estimated from the data, and the I ig , I dep and I mmi algorithms were applied. This procedure was repeated 1000 times. In order to make the PID results from the sample of 1000 datasets comparable, each PID was normalized by dividing each of its components by the joint mutual information; see (10). A summary of the results is provided in Table 3. We focus here on the comparison of I ig and I dep , and also I ig and I mmi , since I dep has been compared with I mmi for Gaussian systems [22].
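One replication of this procedure might be sketched in R as follows (illustrative only; MASS::mvrnorm is one convenient multivariate normal sampler, and the normalisation by the joint mutual information follows (10)):

one_rep <- function(Sigma_true, n1, n2, n3, n = 1000) {
  Z     <- MASS::mvrnorm(n, mu = rep(0, nrow(Sigma_true)), Sigma = Sigma_true)
  S_hat <- standardise(cov(Z), n1, n2, n3)      # estimated, standardised covariance
  P_hat <- S_hat[1:n1, n1 + (1:n2)]
  Q_hat <- S_hat[1:n1, n1 + n2 + (1:n3)]
  R_hat <- S_hat[n1 + (1:n2), n1 + n2 + (1:n3)]
  S5 <- sigma_5(P_hat, Q_hat); S6 <- sigma_6(P_hat, R_hat)
  t_opt <- t_star(S_hat, S5, S6, feasible_interval(S5, S6))
  pid   <- pid_ig(P_hat, Q_hat, R_hat, S_hat, t_opt)
  pid / sum(pid)                                # normalise by the joint mutual information
}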
  • I ig  vs.  I dep
For pdf p, the I ig and I dep PIDs produce very similar results in terms of both median and range, and the median results are very close indeed to the corresponding exact values in Table 2. For pdf p 7 , the differences between the PID components found in Table 2 persist here, although each PID, respectively, produces median values of its components that are close to the exact results in Table 2. For the other four pdfs, there are some small but interesting differences among the results produced by the two PID methods. The I ig method has median values for synergy and shared information that are higher, and median values for unique information that are lower, than the corresponding exact values in Table 2. In particular, the values of unique information given by I ig are much lower than expected for pdfs p 3 , p 4 , p 6 , and the levels of synergy are larger than expected, particularly for pdfs p 3 and p 5 . On the other hand, the I dep PID tends to have larger values for the unique information, and lower values for synergy, especially for datasets generated from pdfs p 3 , p 4 and p 5 . For models p 3 – p 6 , I dep has median values of synergy that are closer to the corresponding exact values than those produced by I ig . The suggestion that the I ig method can produce more synergy and shared information than the I dep method, given the same dataset, is supported by the fact that, for all the pdfs and all 6000 datasets considered, the I ig method produced greater levels of synergy and shared information and smaller values of the unique information in every dataset. This raises the question of whether such a finding is generally the case and whether there is this type of a systematic difference between the methods. In the case of scalar variables, it is easy to derive general analytic formulae for the I ig PID components, and such a systematic difference is present in this case.
  • I ig  vs.  I mmi
The I ig and I mmi PIDs produce similar results for the datasets generated from pdf p, although the I mmi PID suggests the transmission of more shared and synergistic information and less unique information than does I ig . For pdf p 7 , the differences between the PID results are much more dramatic, with the I mmi PID allocating an additional 15% of the joint mutual information to the shared and synergistic information, and correspondingly 15% less to the unique information. Both methods produce almost identical summary statistics on the datasets generated from pdfs p 3 – p 6 . Since the same patterns are present for all four distributions, we discuss the results for pdf p 5 as an exemplar and compare them with the corresponding exact values in Table 2. The results for component Unq1 show that both methods produce an underestimate of approximately 7%, on average, of the joint mutual information. The median values of Unq2 are close to those expected. The underestimates of the Unq1 component are coupled with overestimates, on average, for the shared and synergistic components; they are 2.6% and 4.3%, respectively, with the I ig method, and 3.1% and 4.7%, respectively, with I mmi .
As is to be expected with percentage data, the variation in results for each component tends to be larger for values that are not extreme and much smaller for the extreme values. Also, the optimal values t* are shown in Table 3. They were all found to be in the range [0, 1], except for 202 of the datasets generated from pdf p 5 or p 6 .

4. Discussion

For the case of multivariate Gaussian systems with two vector inputs and a vector output, results have been derived using standard theorems from information geometry in order to develop simple, almost exact formulae for the I ig PID, thus extending the scope of the work of [3] on scalar inputs and output. The formulae require one parameter to be determined by a simple, constrained convex optimisation. In addition, it has been proved that this I ig PID algorithm satisfies the desirable theoretical properties of non-negativity, self-redundancy, symmetry and monotonicity, first postulated by Williams and Beer [1]. These results strengthen the confidence that one might have in using the I ig method to separate the joint mutual information in a multivariate Gaussian system into shared, unique and synergistic components. The examples demonstrate that the I ig method is simple to use, and a small simulation study reveals that it is fairly robust, although in some of the scenarios considered the I ig method produced more synergy and shared information than expected, and correspondingly less unique information; in some other scenarios, it performed as expected. Comparison of the I ig and I dep algorithms reveals that they can produce exactly the same, or very similar, results in some scenarios, but in other situations, it is clear that the I ig method tends to have larger levels of shared information and synergy, and correspondingly, lower levels of unique information when compared with the results from the I dep method.
For datasets generated from pdfs p 3 – p 6 , the PIDs produced using the I ig and I mmi methods are, on average, very similar indeed, and both methods overestimate synergy and shared information and underestimate unique information. The extent of these biases, as a percentage of the joint mutual information, is fairly small, on average, when pdf p 4 or p 6 is the true pdf, but larger, on average, when p 3 or p 5 is the true pdf. When pdf p 7 or p is the true pdf, the I mmi algorithm produces even more shared and synergistic information than is obtained with the I ig method. This effect is particularly dramatic in the case of p 7 , where on average with I mmi 82% of the joint mutual information is transmitted as shared or synergistic information, as compared with 51.5% for I ig . It appears that the fact that the I mmi method forces one of the unique informations to be zero leads to an underestimation of the other unique information and an overestimation of both the shared and synergistic information, especially when p 7 or p is the true pdf and both unique information components are expected to be non-zero.
While some numerical support is presented here for the hypothesis that there might be a systematic difference of this type between the I ig and I dep methods, further research would be required to investigate this possibility. Also, the I ig PID developed here is a bivariate PID, and it would be of interest to explore whether the method could be extended to deal with more than two source vectors.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in Example 1 is available from https://github.com/JWKay/PID (accessed on 17 June 2024).

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
d.p.    decimal places
I dep   Idep PID for Gaussian systems
I ig    Information-geometric PID
I mmi   Immi PID for Gaussian systems
KL      Kullback–Leibler
pdf     probability density function
PID     Partial information decomposition

Appendix A. Two Matrix Lemmas from [22]

Lemma A1.
Suppose that a symmetric matrix M is partitioned as
$$M = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix},$$
where A and C are symmetric and square. Then,
(i)
The matrix M is positive definite if and only if A and $C - B^TA^{-1}B$ are positive definite.
(ii)
The matrix M is positive definite if and only if C and $A - BC^{-1}B^T$ are positive definite.
(iii)
$|M| = |A|\,|C - B^TA^{-1}B|.$
Lemma A2.
When the covariance matrix Σ in (12) is positive definite then the following matrices are also positive definite, and hence nonsingular:
$$I_{n_2} - P^TP, \quad I_{n_1} - PP^T, \quad I_{n_3} - R^TR, \quad I_{n_2} - RR^T, \quad I_{n_3} - Q^TQ, \quad I_{n_1} - QQ^T.$$
Also, the determinant of each of these matrices is positive and bounded above by unity, and it is equal to unity if, and only if, the corresponding cross-correlation matrix (P, Q or R) is the zero matrix. Furthermore,
$$\begin{vmatrix} I_{n_1} & P \\ P^T & I_{n_2} \end{vmatrix} = |I_{n_2} - P^TP|.$$

Appendix B. Some Formulae from [22]

The following equations provide the relevant information-theoretic terms with a slight change of notation: Σ Z is now Σ , the indices on the vectors run from 1 to 3 rather than from 0 to 2, and Y is replaced by X 3 .
$$I(X_1; X_3) = \tfrac{1}{2}\log\frac{1}{|I_{n_3} - Q^TQ|},$$
$$I(X_2; X_3) = \tfrac{1}{2}\log\frac{1}{|I_{n_3} - R^TR|},$$
$$I(X_1; X_3 \mid X_2) = \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,|I_{n_3} - R^TR|}{|\Sigma|},$$
$$I(X_2; X_3 \mid X_1) = \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|}{|\Sigma|},$$
$$I(X_1, X_2; X_3) = \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|}{|\Sigma|},$$
$$II(X_1; X_2; X_3) = I(X_1, X_2; X_3) - I(X_1; X_3) - I(X_2; X_3)$$
$$= \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|\Sigma|}.$$
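These expressions translate directly into R (a sketch using the blocks P, Q, R of the standardised covariance matrix; the function and component names are ours):

info_measures <- function(P, Q, R, Sigma) {
  n2 <- ncol(P); n3 <- ncol(Q)
  dP <- det(diag(n2) - crossprod(P))
  dQ <- det(diag(n3) - crossprod(Q))
  dR <- det(diag(n3) - crossprod(R))
  c(I_13    = 0.5 * log(1 / dQ),                 # I(X1;X3)
    I_23    = 0.5 * log(1 / dR),                 # I(X2;X3)
    I_13_g2 = 0.5 * log(dP * dR / det(Sigma)),   # I(X1;X3|X2)
    I_23_g1 = 0.5 * log(dP * dQ / det(Sigma)),   # I(X2;X3|X1)
    I_joint = 0.5 * log(dP / det(Sigma)),        # I(X1,X2;X3)
    II      = 0.5 * log(dP * dQ * dR / det(Sigma)))  # interaction information
}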

Appendix C. Proof of Lemma 1

From [43] (p. 189), the KL divergence is
$$D(f_1\,\|\,f_2) = \tfrac{1}{2}\left(\log\frac{|\Lambda|}{|\Phi|} - m + \mathrm{Tr}(\Lambda^{-1}\Phi)\right),$$
since the mean vectors are equal. Each of the diagonal blocks in (13) is positive definite, by Lemma A1, and so possesses a positive definite square root [44] (p. 472). We define the positive definite block diagonal matrix
$$D = \mathrm{diag}\left(\Sigma_{11}^{1/2},\ \Sigma_{22}^{1/2},\ \Sigma_{33}^{1/2}\right).$$
Hence, we may write
$$\Phi = D\begin{pmatrix} I_{n_1} & P_{12} & P_{13} \\ P_{12}^T & I_{n_2} & P_{23} \\ P_{13}^T & P_{23}^T & I_{n_3} \end{pmatrix}D \quad\text{and}\quad \Lambda = D\begin{pmatrix} I_{n_1} & L_{12} & L_{13} \\ L_{12}^T & I_{n_2} & L_{23} \\ L_{13}^T & L_{23}^T & I_{n_3} \end{pmatrix}D,$$
where
$$P_{12} = \Sigma_{11}^{-1/2}\,\Phi_{12}\,\Sigma_{22}^{-1/2}, \quad P_{13} = \Sigma_{11}^{-1/2}\,\Phi_{13}\,\Sigma_{33}^{-1/2}, \quad P_{23} = \Sigma_{22}^{-1/2}\,\Phi_{23}\,\Sigma_{33}^{-1/2},$$
and
$$L_{12} = \Sigma_{11}^{-1/2}\,\Lambda_{12}\,\Sigma_{22}^{-1/2}, \quad L_{13} = \Sigma_{11}^{-1/2}\,\Lambda_{13}\,\Sigma_{33}^{-1/2}, \quad L_{23} = \Sigma_{22}^{-1/2}\,\Lambda_{23}\,\Sigma_{33}^{-1/2}.$$
Therefore, using standard properties of determinants, we have from (A8) that
$$|\Phi| = |\Sigma_{11}|\,|\Sigma_{22}|\,|\Sigma_{33}|\,\begin{vmatrix} I_{n_1} & P_{12} & P_{13} \\ P_{12}^T & I_{n_2} & P_{23} \\ P_{13}^T & P_{23}^T & I_{n_3} \end{vmatrix} = |\Sigma_{11}|\,|\Sigma_{22}|\,|\Sigma_{33}|\,|\Phi_1|.$$
Use of a similar argument provides a similar expression for the determinant of Λ :
| Λ | = | Σ 11 | | Σ 22 | | Σ 33 | | Λ 1 | .
It follows that
$$\frac{|\Lambda|}{|\Phi|} = \frac{|\Lambda_1|}{|\Phi_1|}.$$
We now consider the trace term. We have from (A8) that
$$\Phi = D\,\Phi_1\,D \quad\text{and}\quad \Lambda = D\,\Lambda_1\,D,$$
and so
$$\Lambda^{-1}\Phi = D^{-1}\Lambda_1^{-1}D^{-1}\,D\,\Phi_1\,D = D^{-1}\Lambda_1^{-1}\Phi_1\,D.$$
Since $\mathrm{Tr}(ABC) = \mathrm{Tr}(BCA)$ for any three conformable matrices, the required result follows.

Appendix D. Proof of Lemma 2

Making use of Lemma A1(iii), we may write the determinant of the covariance matrix Σ as
$$|\Sigma| = \begin{vmatrix} I_{n_1} & P & Q \\ P^T & I_{n_2} & R \\ Q^T & R^T & I_{n_3} \end{vmatrix} = |A|\,\left|I_{n_3} - \begin{pmatrix} Q^T & R^T \end{pmatrix} A^{-1}\begin{pmatrix} Q \\ R \end{pmatrix}\right|,$$
where
$$A = \begin{pmatrix} I_{n_1} & P \\ P^T & I_{n_2} \end{pmatrix}.$$
Since
$$A^{-1} = \begin{pmatrix} I_{n_1} + P(I_{n_2} - P^TP)^{-1}P^T & -P(I_{n_2} - P^TP)^{-1} \\ -(I_{n_2} - P^TP)^{-1}P^T & (I_{n_2} - P^TP)^{-1} \end{pmatrix},$$
it can be shown that
$$\begin{pmatrix} Q^T & R^T \end{pmatrix} A^{-1}\begin{pmatrix} Q \\ R \end{pmatrix} = Q^TQ + (P^TQ - R)^T (I_{n_2} - P^TP)^{-1} (P^TQ - R).$$
From Lemma A1(iii), we have that
$$|A| = |I_{n_2} - P^TP|,$$
and putting these results together gives
$$|\Sigma| = |I_{n_2} - P^TP|\,\left|I_{n_3} - Q^TQ - (P^TQ - R)^T (I_{n_2} - P^TP)^{-1} (P^TQ - R)\right|.$$
We now apply this result to obtain expressions for the determinants of Σ 5 and Σ 6 , which are defined in Table 1. In Σ 5 , R is replaced by P T Q and making this substitution in (A13) gives
$$|\Sigma_5| = |I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|.$$
Similarly, replacing Q by P R in (A13) gives the determinant of Σ 6 , after some manipulation, as
$$|\Sigma_6| = |I_{n_2} - P^TP|\,|I_{n_3} - R^TR|.$$
For the final result, we have that
$$\Sigma_5 - \Sigma_6 = \begin{pmatrix} 0 & 0 & Q - PR \\ 0 & 0 & P^TQ - R \\ Q^T - R^TP^T & Q^TP - R^T & 0 \end{pmatrix},$$
from which it is clear that Q = 0 , R = 0 implies that Σ 5 = Σ 6 . Conversely,
$$\Sigma_5 = \Sigma_6 \implies Q = PR \text{ and } R = P^TQ \implies (I_{n_2} - P^TP)R = 0 \text{ and } (I_{n_1} - PP^T)Q = 0 \implies Q = 0 \text{ and } R = 0,$$
by Lemma A2. For the last equivalence, we note by (A13) that Q = 0 and R = 0 implies that I ( X 1 , X 2 ; X 3 ) = 0 . Conversely, by (A4) and (A13) and the non-negativity of mutual information, I ( X 1 , X 2 ; X 3 ) = 0 implies that I ( X 1 ; X 3 ) = 0 and I ( X 2 ; X 3 ) = 0 , which imply from (A1) and (A2) and Lemma A2 that Q = 0 and R = 0 .

Appendix E. Proof of Lemma 3

$\Sigma_t$ and $\tilde{\Sigma}_t$ are each positive definite if and only if their inverses are positive definite. We work with the inverses. When t = 0 or t = 1, clearly both inverses are positive definite, since the inverses of Σ 5 and Σ 6 are positive definite. Suppose that 0 < t < 1 and consider, for $\Sigma_t^{-1}$ and for any $\mathbf{x} \in \mathbb{R}^m$, the quadratic form
$$\mathbf{x}^T\left((1-t)\Sigma_5^{-1} + t\Sigma_6^{-1}\right)\mathbf{x} = (1-t)\,\mathbf{x}^T\Sigma_5^{-1}\mathbf{x} + t\,\mathbf{x}^T\Sigma_6^{-1}\mathbf{x}.$$
This term is non-negative for every $\mathbf{x} \in \mathbb{R}^m$ since the inverses of Σ 5 and Σ 6 are positive definite and 0 < t < 1. Clearly, for the same reasons, this quadratic form is equal to zero if and only if $\mathbf{x}$ is the zero vector in $\mathbb{R}^m$, hence the result. A similar argument shows that $\tilde{\Sigma}_t^{-1}$ is positive definite when 0 < t < 1.
For t ∈ F, we know that the matrix $\Sigma_t^{-1} = (1-t)\Sigma_5^{-1} + t\Sigma_6^{-1}$ is positive definite. The matrix $\tilde{\Sigma}_t^{-1} = (1-t)\Sigma_6 + t\Sigma_5$ may be written in two ways as
$$(1-t)\Sigma_6 + t\Sigma_5 = \Sigma_6\left((1-t)\Sigma_5^{-1} + t\Sigma_6^{-1}\right)\Sigma_5 = \Sigma_5\left((1-t)\Sigma_5^{-1} + t\Sigma_6^{-1}\right)\Sigma_6,$$
which is a product of three positive definite matrices. Since the last two expressions are transposes of each other, it follows that this product is symmetric. Hence, from a result by E. P. Wigner [45], we deduce that $\tilde{\Sigma}_t^{-1}$ is positive definite, so that t ∈ G. A similar argument shows that t ∈ G implies t ∈ F. Thus, we have proved that the feasible sets F and G are equal.

Appendix F. Proof of Lemma 4

For the first part, we consider (3):
$$\Sigma_t^{-1}\Sigma = (1-t)\,\Sigma_5^{-1}\Sigma + t\,\Sigma_6^{-1}\Sigma,$$
and apply the trace operator, which is linear, to obtain
$$\mathrm{Tr}(\Sigma_t^{-1}\Sigma) = (1-t)\,\mathrm{Tr}(\Sigma_5^{-1}\Sigma) + t\,\mathrm{Tr}(\Sigma_6^{-1}\Sigma).$$
Consider $\mathrm{Tr}(\Sigma_5^{-1}\Sigma)$. We may write this as
$$\mathrm{Tr}(\Sigma_5^{-1}\Sigma) = \mathrm{Tr}\left(\Sigma_5^{-1}\left(\Sigma_5 + (\Sigma - \Sigma_5)\right)\right) = \mathrm{Tr}(I_m) + \mathrm{Tr}\left(\Sigma_5^{-1}(\Sigma - \Sigma_5)\right).$$
Now, from (12) and the form of Σ 5 in Table 1, we have that
$$\Sigma - \Sigma_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & R - P^TQ \\ 0 & R^T - Q^TP & 0 \end{pmatrix}.$$
The pdf p 5 is defined by the constraint K 23 = 0 in the inverse covariance matrix K. Hence, Σ 5 1 has the form
$$\Sigma_5^{-1} = \begin{pmatrix} * & * & * \\ * & * & 0 \\ * & 0 & * \end{pmatrix}.$$
Performing the multiplication of these block matrices shows that the diagonal blocks of $\Sigma_5^{-1}(\Sigma - \Sigma_5)$ are all equal to a zero matrix, and so have trace equal to zero. Hence, the trace of this matrix is equal to zero. Since $\mathrm{Tr}(I_m) = m$, it follows from (A18) that $\mathrm{Tr}(\Sigma_5^{-1}\Sigma) = m$.
Now consider the trace of Σ 6 1 Σ . From (12) and the form of Σ 6 in Table 1, we have that
$$\Sigma - \Sigma_6 = \begin{pmatrix} 0 & 0 & Q - PR \\ 0 & 0 & 0 \\ Q^T - R^TP^T & 0 & 0 \end{pmatrix}.$$
The pdf p 6 is defined by the constraint K 13 = 0 in the inverse covariance matrix K. Hence, Σ 6 1 has the form
$$\Sigma_6^{-1} = \begin{pmatrix} * & * & 0 \\ * & * & * \\ 0 & * & * \end{pmatrix}.$$
By adopting a similar argument to that given above for Σ 5 , it follows that Tr ( Σ 6 1 Σ ) = m . The required result follows from (A17). Now for the second part of the proof, we use Lemma 1 and the trace result just derived to write
$$D(p\,\|\,p_t) = \tfrac{1}{2}\log\frac{|\Sigma_t|}{|\Sigma|} - \tfrac{1}{2}m + \tfrac{1}{2}\mathrm{Tr}(\Sigma_t^{-1}\Sigma) = \tfrac{1}{2}\log\frac{|\Sigma_t|}{|\Sigma|}.$$
$\Sigma_t^{-1}$ may be written as
$$\Sigma_t^{-1} = \Sigma_5^{-1}\left((1-t)\Sigma_6 + t\Sigma_5\right)\Sigma_6^{-1},$$
and so, by Lemma 2, we have
$$|\Sigma_t| = \frac{|\Sigma_5|\,|\Sigma_6|}{|(1-t)\Sigma_6 + t\Sigma_5|} = \frac{|I_{n_2} - P^TP|^2\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|(1-t)\Sigma_6 + t\Sigma_5|}.$$
Hence,
$$g(t) = \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|^2\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|\Sigma|} - \tfrac{1}{2}\log\left|(1-t)\Sigma_6 + t\Sigma_5\right|.$$
We wish to minimize g ( t ) with respect to t under the constraint that t ∈ F. We set $H(t) = (1-t)\Sigma_6 + t\Sigma_5$, which is positive definite by Lemma 3, and apply Jacobi's formula and the chain rule. Differentiating (A22) with respect to t, we obtain
$$g'(t) = -\tfrac{1}{2}\,\frac{1}{|H(t)|}\times|H(t)|\times\mathrm{Tr}\left(H(t)^{-1}(\Sigma_5 - \Sigma_6)\right) = -\tfrac{1}{2}\,\mathrm{Tr}\left(H(t)^{-1}(\Sigma_5 - \Sigma_6)\right).$$
Further differentiation yields
$$g''(t) = \tfrac{1}{2}\,\mathrm{Tr}\left(H(t)^{-1}(\Sigma_5 - \Sigma_6)\,H(t)^{-1}(\Sigma_5 - \Sigma_6)\right) = \tfrac{1}{2}\,\mathrm{Tr}\left(\left[H(t)^{-1}(\Sigma_5 - \Sigma_6)\right]^2\right) = \tfrac{1}{2}\sum_{i=1}^{m}\lambda_i^2,$$
where the $\lambda_i$ are the eigenvalues of the matrix $H(t)^{-1}(\Sigma_5 - \Sigma_6)$. Since H ( t ) is positive definite, it possesses a positive definite square root, K ( t ) say. Since the eigenvalues of $H(t)^{-1}(\Sigma_5 - \Sigma_6)$ are the same as those of $K(t)^{-1}(\Sigma_5 - \Sigma_6)K(t)^{-1}$, which is symmetric and so has real eigenvalues, and since the joint mutual information is positive (and so, by Lemma 2, $\Sigma_5 \neq \Sigma_6$), it follows that at least one of the eigenvalues must be non-zero. Hence, $g''(t) > 0$ for all t ∈ F. Therefore, g ( t ) is strictly convex on the convex set F and the minimum is unique, and given by the value t* which satisfies the necessary condition $g'(t^*) = 0$. (This t* turns out to be exactly the value of t that is required for the two forms of shared information to be equal.) Of course, the minimum could occur at an endpoint of the interval F, in which case the minimum value of g ( t ) is the value taken at the endpoint.
Finally, we write
$$\left|(1-t)\Sigma_6 + t\Sigma_5\right| = \begin{vmatrix} A & B(t) \\ B(t)^T & I_{n_3} \end{vmatrix} = |A|\,\left|I_{n_3} - B(t)^TA^{-1}B(t)\right|,$$
where A is defined in (A10), with inverse given by (A11), and
$$B(t) = \begin{pmatrix} (1-t)\,Q + t\,PR \\ (1-t)\,P^TQ + t\,R \end{pmatrix}.$$
Some matrix calculation then gives
$$A^{-1}B(t) = \begin{pmatrix} (1-t)\,Q \\ t\,R \end{pmatrix}$$
and also
$$B(t)^TA^{-1}B(t) = (1-t)^2\,Q^TQ + t(1-t)\left(R^TP^TQ + Q^TPR\right) + t^2\,R^TR.$$
Now, from (A12) and (A23), it follows that $|(1-t)\Sigma_6 + t\Sigma_5|$ is equal to
$$|I_{n_2} - P^TP|\,\left|I_{n_3} - (1-t)^2\,Q^TQ - t(1-t)\left(R^TP^TQ + Q^TPR\right) - t^2\,R^TR\right|.$$
Making use of this equation and also (A22), and replacing t by the optimal t*, we have that the minimum of g ( t ), when t ∈ F, is
$$g(t^*) = \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|\,|I_{n_3} - R^TR|}{|\Sigma|} + \tfrac{1}{2}\log\frac{1}{d(t^*)},$$
where
$$d(t^*) = \left|I_{n_3} - (1-t^*)^2\,Q^TQ - t^*(1-t^*)\left(R^TP^TQ + Q^TPR\right) - (t^*)^2\,R^TR\right|.$$

Appendix G. Proof of Lemma 5

From (4) we know that
$$D(p\,\|\,p_5) = D(p\,\|\,p_{t^*}) + D(p_{t^*}\,\|\,p_5).$$
Then, by Lemmas 1 and 4,
$$D(p\,\|\,p_5) = \tfrac{1}{2}\log\frac{|\Sigma_5|}{|\Sigma|} - \tfrac{1}{2}m + \tfrac{1}{2}\mathrm{Tr}(\Sigma_5^{-1}\Sigma) = \tfrac{1}{2}\log\frac{|\Sigma_5|}{|\Sigma|}.$$
Also,
$$D(p_{t^*}\,\|\,p_5) = \tfrac{1}{2}\log\frac{|\Sigma_5|}{|\Sigma_{t^*}|} - \tfrac{1}{2}m + \tfrac{1}{2}\mathrm{Tr}(\Sigma_5^{-1}\Sigma_{t^*})$$
and
$$D(p\,\|\,p_{t^*}) = \tfrac{1}{2}\log\frac{|\Sigma_{t^*}|}{|\Sigma|},$$
by (20) with t = t*. By substituting these expressions into (A28), it follows that $\mathrm{Tr}(\Sigma_5^{-1}\Sigma_{t^*}) = m$. The proof that $\mathrm{Tr}(\Sigma_6^{-1}\Sigma_{t^*}) = m$ is obtained by using a very similar argument, starting with (5).

Appendix H. Proof of Proposition 1

The formula for synergy is the minimum value of g ( t* ) as provided in (A26). From (30), Lemmas 1 and 5, we have that
$$\begin{aligned}
\mathrm{Unq1} &= \tfrac{1}{2}\log\frac{|\Sigma_6|}{|\Sigma_{t^*}|} = \tfrac{1}{2}\log\frac{|(1-t^*)\Sigma_6 + t^*\Sigma_5|}{|\Sigma_5|}, && \text{by (A21)},\\
&= \tfrac{1}{2}\log\frac{|I_{n_2} - P^TP|\,d(t^*)}{|I_{n_2} - P^TP|\,|I_{n_3} - Q^TQ|}, && \text{by (A25), (A27) and Lemma 2},\\
&= \tfrac{1}{2}\log\frac{1}{|I_{n_3} - Q^TQ|} - \tfrac{1}{2}\log\frac{1}{d(t^*)}.
\end{aligned}$$
By making use of a similar calculation, we find that
$$\mathrm{Unq2} = \tfrac{1}{2}\log\frac{|\Sigma_5|}{|\Sigma_{t^*}|} = \tfrac{1}{2}\log\frac{1}{|I_{n_3} - R^TR|} - \tfrac{1}{2}\log\frac{1}{d(t^*)}.$$
The two expressions for the shared information are
$$I(X_1; X_3) - \mathrm{Unq1} \quad\text{or}\quad I(X_2; X_3) - \mathrm{Unq2}.$$
By using (6), (7), (A29) and (A30), it is easy to see that both expressions for the shared information, Shd, are equal to $\tfrac{1}{2}\log\frac{1}{d(t^*)}$.

Appendix I. Proof of Proposition 2

Appendix I.1. Non-Negativity

Since the Syn, Unq1 and Unq2 components in the I ig PID have been defined to be KL divergences, they are non-negative. It remains to show that Shd is non-negative. From Appendix H, we see that this is the case if and only if $0 < d(t^*) \le 1$. From (A23), (A24) and (A27) we have, with t = t*, that
$$\left|(1-t^*)\Sigma_6 + t^*\Sigma_5\right| = \begin{vmatrix} A & B(t^*) \\ B(t^*)^T & I_{n_3} \end{vmatrix} = |A|\,\left|I_{n_3} - B(t^*)^TA^{-1}B(t^*)\right|,$$
and that
$$d(t^*) = \left|I_{n_3} - B(t^*)^TA^{-1}B(t^*)\right|.$$
Since t* ∈ F, we know by Lemma 3 that t* ∈ G, and so the matrix $(1-t^*)\Sigma_6 + t^*\Sigma_5$ is positive definite. It then follows from Lemma A1(i) that A and $I_{n_3} - B(t^*)^TA^{-1}B(t^*)$ are positive definite, as is $A^{-1}$. Therefore, $A^{-1}$ possesses a (symmetric) positive definite square root, C say, and we may write
$$I_{n_3} - B(t^*)^TA^{-1}B(t^*) = I_{n_3} - X^TX, \quad\text{with } X = C\,B(t^*).$$
Then the matrix $X^TX$ is positive semi-definite and so has non-negative eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_{n_3}$. We denote the eigenvalues of $I_{n_3} - X^TX$ by $\{1 - \lambda_i : i = 1, 2, \ldots, n_3\}$. Since $I_{n_3} - X^TX$ is positive definite, we know that $1 - \lambda_i > 0$ for $i = 1, 2, \ldots, n_3$. It follows that $0 < 1 - \lambda_i \le 1$ for $i = 1, 2, \ldots, n_3$. Since the determinant of a square matrix is the product of its eigenvalues, we have that
$$d(t^*) = \left|I_{n_3} - X^TX\right| = \prod_{i=1}^{n_3}(1 - \lambda_i),$$
and so $0 < d(t^*) \le 1$, hence the result.

Appendix I.2. Self-Redundancy

The property of self-redundancy considers the case of X 1 = X 2 , i.e., both sources are the same, and it requires us to show that
$$\mathrm{Shd} \equiv I_{shd}[X_1, X_1; X_3] = I[X_1; X_3].$$
When the sources are the same, we have $n_1 = n_2$, $Q = R$ and $P = I_{n_1}$, which results in a singular covariance matrix. Therefore, we take $P = (1-\epsilon)I_{n_1}$, for very small ϵ such that 0 < ϵ < 1, and let $\epsilon \to 0^+$. Using this information in (A27), we have that
$$d(t^*) = \left|I_{n_3} - (1-t^*)^2Q^TQ - 2(1-\epsilon)\,t^*(1-t^*)\,Q^TQ - (t^*)^2Q^TQ\right| = \left|I_{n_3} - Q^TQ + 2\epsilon\,t^*(1-t^*)\,Q^TQ\right| \to |I_{n_3} - Q^TQ|, \quad\text{as } \epsilon \to 0^+.$$
Therefore $\mathrm{Shd} \to \tfrac{1}{2}\log\frac{1}{|I_{n_3} - Q^TQ|}$, which is equal to $I[X_1; X_3]$ by (A1).
Out of interest, it seems worthwhile to check the limits for the classical information measures given in (A3)–(A7). From (A3) and (A13), after some cancellation, we have that
$$I[X_1; X_3 \mid X_2] = \tfrac{1}{2}\log\frac{|I_{n_3} - Q^TQ|}{\left|I_{n_3} - Q^TQ - (P^TQ - R)^T(I_{n_2} - P^TP)^{-1}(P^TQ - R)\right|} = \tfrac{1}{2}\log\frac{|I_{n_3} - Q^TQ|}{\left|I_{n_3} - Q^TQ - \frac{\epsilon}{2-\epsilon}\,Q^TQ\right|} \to 0, \quad\text{as } \epsilon \to 0^+.$$
Also, since Q = R ,
$$I[X_2; X_3 \mid X_1] = \tfrac{1}{2}\log\frac{|I_{n_3} - R^TR|}{\left|I_{n_3} - R^TR - \frac{\epsilon}{2-\epsilon}\,R^TR\right|} \to 0, \quad\text{as } \epsilon \to 0^+.$$
Similarly, for the joint mutual information and the interaction information:
$$I[X_1, X_2; X_3] \to \tfrac{1}{2}\log\frac{1}{|I_{n_3} - Q^TQ|},$$
and
$$II[X_1; X_2; X_3] \to -\tfrac{1}{2}\log\frac{1}{|I_{n_3} - Q^TQ|},$$
as expected, as $\epsilon \to 0^+$.

Appendix I.3. Symmetry

To validate the symmetry property we require to prove that I s h d [ X 1 , X 2 ; X 3 ] is equal to I s h d [ X 2 , X 1 ; X 3 ] . Swapping X 1 and X 2 means that the sources are now in order X 2 , X 1 , X 3 and the covariance matrix in (12) becomes
$$\begin{pmatrix} I_{n_2} & P^T & R \\ P & I_{n_1} & Q \\ R^T & Q^T & I_{n_3} \end{pmatrix}.$$
The switching of X 1 and X 2 means also that the probability distributions on S 5 and S 6 swap, since
$$S_5:\ p(\mathbf{x}_2|\mathbf{x}_1)\,p(\mathbf{x}_3|\mathbf{x}_1)\,p(\mathbf{x}_1) \to p(\mathbf{x}_1|\mathbf{x}_2)\,p(\mathbf{x}_3|\mathbf{x}_2)\,p(\mathbf{x}_2), \qquad S_6:\ p(\mathbf{x}_1|\mathbf{x}_2)\,p(\mathbf{x}_3|\mathbf{x}_2)\,p(\mathbf{x}_2) \to p(\mathbf{x}_2|\mathbf{x}_1)\,p(\mathbf{x}_3|\mathbf{x}_1)\,p(\mathbf{x}_1),$$
and the corresponding covariance matrices are
$$\Sigma_5 = \begin{pmatrix} I_{n_2} & P^T & R \\ P & I_{n_1} & PR \\ R^T & R^TP^T & I_{n_3} \end{pmatrix} \quad\text{and}\quad \Sigma_6 = \begin{pmatrix} I_{n_2} & P^T & P^TQ \\ P & I_{n_1} & Q \\ Q^TP & Q^T & I_{n_3} \end{pmatrix}.$$
We now apply a similar argument to that used in the proof of Lemma 4.
$$\left|(1-t)\Sigma_6 + t\Sigma_5\right| = \begin{vmatrix} A & B(t) \\ B(t)^T & I_{n_3} \end{vmatrix} = |A|\,\left|I_{n_3} - B(t)^TA^{-1}B(t)\right|,$$
where A and $A^{-1}$ are
$$A = \begin{pmatrix} I_{n_2} & P^T \\ P & I_{n_1} \end{pmatrix},$$
$$A^{-1} = \begin{pmatrix} I_{n_2} + P^T(I_{n_1} - PP^T)^{-1}P & -P^T(I_{n_1} - PP^T)^{-1} \\ -(I_{n_1} - PP^T)^{-1}P & (I_{n_1} - PP^T)^{-1} \end{pmatrix},$$
and
$$B(t) = \begin{pmatrix} (1-t)\,P^TQ + t\,R \\ (1-t)\,Q + t\,PR \end{pmatrix}.$$
Some matrix calculation then gives
$$A^{-1}B(t) = \begin{pmatrix} t\,R \\ (1-t)\,Q \end{pmatrix}$$
and also
$$B(t)^TA^{-1}B(t) = (1-t)^2\,Q^TQ + t(1-t)\left(R^TP^TQ + Q^TPR\right) + t^2\,R^TR.$$
Hence,
$$d(t) = \left|I_{n_3} - (1-t)^2\,Q^TQ - t(1-t)\left(R^TP^TQ + Q^TPR\right) - t^2\,R^TR\right|,$$
which is identical to the expression for d ( t ) obtained in (A27). It follows that the value of d ( t* ), and hence the shared information, is unchanged by swapping X 1 and X 2 .

Appendix I.4. Monotonicity on the Redundancy Lattice

We use the term ‘redundancy’ here for convenience rather than ‘shared information’, since both terms mean exactly the same thing. A redundancy lattice is defined in [1]. When there are two sources and a target, there are four terms of interest that are usually denoted by $\{1\}\{2\}, \{1\}, \{2\}, \{12\}$, which are the terms in the redundancy lattice. For monotonicity it is required that the redundancy values for these four terms satisfy the inequalities $\{1\}\{2\} \le \{1\} \le \{12\}$ and $\{1\}\{2\} \le \{2\} \le \{12\}$.
The redundancy value for term { 12 } is the self-redundancy I s h d [ ( X 1 , X 2 ) , ( X 1 , X 2 ) ; X 3 ] which is, by the self-redundancy property, equal to the joint mutual information I [ X 1 , X 2 ; X 3 ] . Similarly, the redundancy values for the terms { 1 } and { 2 } are, by self-redundancy, the mutual information I [ X 1 ; X 3 ] and I [ X 2 ; X 3 ] , respectively. The final term { 1 } { 2 } is equal to the shared information defined in Proposition 1.
Since the PID defined in Proposition 1 possesses the non-negativity property it follows from (6)–(10) that the redundancy measure is monotonic on the redundancy lattice.

Appendix J. Computation

Code was written using R [46] in RStudio [46] to compute the I ig PID. This code together with R code to compute the I dep and I mmi PIDs is available from https://github.com/JWKay/PID (accessed on 17 June 2024). The I ig code first checks whether the input covariance matrix is positive-definite. If so, the feasible region F is computed by defining a function whose value indicates whether or not the matrix Σ t is positive definite, and then applying the uniroot root-finding algorithm. The constrained optimisation is performed by using the base R function optim. The code produces a plot of g ( t ) and returns the numerical results. Details of the pre-processing employed to make use of the I ig formulae presented in Proposition 1 are the same as used with the I dep PID and are available from Appendix D [22].
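Putting the earlier sketches together, a call sequence in the spirit of the description above might look like the following (illustrative only; the published code at the repository differs in detail, for example in using optim rather than optimize):

# given blocks P, Q, R of a standardised, positive definite covariance matrix
Sigma <- build_sigma(P, Q, R)
S5    <- sigma_5(P, Q)
S6    <- sigma_6(P, R)
F_int <- feasible_interval(S5, S6)     # interval on which Sigma_t is positive definite
t_opt <- t_star(Sigma, S5, S6, F_int)  # constrained convex optimisation for t*
pid_ig(P, Q, R, Sigma, t_opt)          # Unq1, Unq2, Shd, Syn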

References

  1. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515. [Google Scholar]
  2. McGill, W.J. Multivariate Information Transmission. Psychometrika 1954, 19, 97–116. [Google Scholar] [CrossRef]
  3. Niu, X.; Quinn, C.J. A measure of Synergy, Redundancy, and Unique Information using Information Geometry. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019. [Google Scholar]
  4. Griffith, V.; Koch, C. Quantifying synergistic mutual information. In Guided Self-Organization: Inception; Emergence, Complexity and Computation; Springer: Berlin/Heidelberg, Germany, 2014; Volume 9, pp. 159–190. [Google Scholar]
  5. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying Unique Information. Entropy 2014, 16, 2161–2183. [Google Scholar] [CrossRef]
  6. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130. [Google Scholar] [CrossRef] [PubMed]
  7. Ince, R.A.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy 2017, 19, 318. [Google Scholar] [CrossRef]
  8. James, R.G.; Emenheiser, J.; Crutchfield, J.P. Unique Information via Dependency Constraints. J. Phys. Math. Theor. 2018, 52, 014002. [Google Scholar] [CrossRef]
  9. Finn, C.; Lizier, J.T. Pointwise Information Decomposition using the Specificity and Ambiguity Lattices. arXiv 2018, arXiv:1801.09010. [Google Scholar]
  10. Lizier, J.T.; Bertschinger, N.; Jost, J.; Wibral, M. Information Decomposition of Target Effects from Multi-Source Interactions: Perspectives on Previous, Current and Future Work. Entropy 2018, 20, 307. [Google Scholar] [CrossRef]
  11. Makkeh, A.; Gutknecht, A.J.; Wibral, M. Introducing a differentiable measure of pointwise shared information. Phys. Rev. E 2021, 103, 032149. [Google Scholar] [CrossRef]
  12. Gutknecht, A.J.; Wibral, M.; Makkeh, A. Bits and pieces: Understanding information decomposition from part-whole relationships and formal logic. Proc. R. Soc. A 2021, 477, 0110. [Google Scholar] [CrossRef]
  13. Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24, 403. [Google Scholar] [CrossRef] [PubMed]
  14. Wibral, M.; Finn, C.; Wollstadt, P.; Lizier, J.T.; Priesemann, V. Quantifying Information Modification in Developing Neural Networks via Partial Information Decomposition. Entropy 2017, 19, 494. [Google Scholar] [CrossRef]
  15. Finn, C.; Lizier, J.T. Quantifying Information Modification in Cellular Automata Using Pointwise Partial Information Decomposition. In Artificial Life Conference Proceedings; MIT Press: Cambridge, MA, USA, 2018; pp. 386–387. [Google Scholar]
  16. Timme, N.; Alford, W.; Flecker, B.; Beggs, J.M. Synergy, redundancy, and multivariate information measures: An experimentalist’s perspective. J. Comput. Neurosci. 2014, 36, 119–140. [Google Scholar] [CrossRef] [PubMed]
  17. Sherrill, S.P.; Timme, N.M.; Beggs, J.M.; Newman, E.L. Partial information decomposition reveals that synergistic neural integration is greater downstream of recurrent information flow in organotypic cortical cultures. PLoS Comput. Biol. 2021, 17, e1009196. [Google Scholar] [CrossRef] [PubMed]
  18. Pinto, H.; Pernice, R.; Silva, M.E.; Javorka, M.; Faes, L.; Rocha, A.P. Multiscale partial information decomposition of dynamical processes with short and long-range correlations: Theory and application to cardiovascular control. Physiol. Meas. 2022, 43, 085004. [Google Scholar] [CrossRef]
  19. Ince, R.A.A.; Giordano, B.L.; Kayser, C.; Rousselet, G.A.; Gross, J.; Schyns, P.G. A Statistical Framework for Neuroimaging Data Analysis Based on Mutual Information Estimated via a Gaussian Copula. Hum. Brain Mapp. 2017, 38, 1541–1573. [Google Scholar] [CrossRef] [PubMed]
  20. Barrett, A.B. An exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802. [Google Scholar] [CrossRef] [PubMed]
  21. Olbrich, E.; Bertschinger, N.; Rauh, J. Information decomposition and synergy. Entropy 2015, 17, 3501–3517. [Google Scholar] [CrossRef]
  22. Kay, J.W.; Ince, R.A.A. Exact partial information decompositions for Gaussian systems based on dependency constraints. Entropy 2018, 20, 240. [Google Scholar] [CrossRef]
  23. Venkatesh, P.; Schamberg, G. Partial Information Decomposition via Deficiency for Multivariate Gaussians. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 2892–2897. [Google Scholar]
  24. Niu, X.; Quinn, C.J. Synergy and Redundancy Duality Between Gaussian Multiple Access and Broadcast Channels. In Proceedings of the 2020 International Symposium on Information Theory and Its Applications (ISITA), Kapolei, HI, USA, 24–27 October 2020; pp. 6–10. [Google Scholar]
  25. Faes, L.; Marinazzo, D.; Stramaglia, S. Multiscale Information Decomposition: Exact Computation for Multivariate Gaussian Processes. Entropy 2017, 19, 408. [Google Scholar] [CrossRef]
  26. Stramaglia, S.; Wu, G.-R.; Pellicoro, M.; Marinazzo, D. Expanding the transfer entropy to identify information circuits in complex systems. Phys. Rev. E 2012, 86, 066211. [Google Scholar] [CrossRef] [PubMed]
  27. Daube, C.; Ince, R.A.A.; Gross, J. Simple acoustic features can explain phoneme-based predictions of cortical responses to speech. Curr. Biol. 2019, 29, 1924–1937. [Google Scholar] [CrossRef] [PubMed]
  28. Park, H.; Ince, R.A.A.; Schyns, P.G.; Thut, G.; Gross, J. Representational interactions during audiovisual speech entrainment: Redundancy in left posterior superior temporal gyrus and synergy in left motor cortex. PLoS Biol. 2018, 16, e2006558. [Google Scholar] [CrossRef]
  29. Schulz, J.M.; Kay, J.W.; Bischofberger, J.; Larkum, M.E. GABAB Receptor-Mediated Regulation of Dendro-Somatic Synergy in Layer 5 Pyramidal Neurons. Front. Cell. Neurosci. 2021, 15, 718413. [Google Scholar] [CrossRef] [PubMed]
  30. Newman, E.L.; Varley, T.F.; Parakkattu, V.K.; Sherrill, S.P.; Beggs, J.M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 2022, 24, 930. [Google Scholar] [CrossRef] [PubMed]
  31. Amari, S.-I. Information geometry on hierarchy of probability distributions. IEEE Trans. Inf. Theory 2001, 47, 1701–1711. [Google Scholar] [CrossRef]
  32. Amari, S.-I. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2020. [Google Scholar]
  33. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000. [Google Scholar]
  34. Lauritzen, S.L. Graphical Models; Oxford University Press: Oxford, UK, 1996. [Google Scholar]
  35. Zwick, M. An overview of reconstructability analysis. Kybernetes 2004, 33, 877–905. [Google Scholar] [CrossRef]
  36. Ay, N.; Polani, D.; Virgo, N. Information Decomposition Based on Cooperative Game Theory. Kybernetika 2020, 56, 979–1014. [Google Scholar] [CrossRef]
  37. Sugiyama, M.; Nakahara, H.; Tsuda, K. Information Decomposition on Structured Space. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 575–579. [Google Scholar]
  38. Bailey, R.A. Hasse diagrams as a visual aid for linear models and analysis of variance. Commun. Stat. Theory Methods 2021, 50, 5034–5067. [Google Scholar] [CrossRef]
  39. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: New York, NY, USA, 1991. [Google Scholar]
  40. Whittaker, J. Graphical Models in Applied Multivariate Statistics; Wiley: Chichester, UK, 2008. [Google Scholar]
  41. Eguchi, S.; Komori, O. Minimum Divergence Methods in Statistical Machine Learning; Springer: Tokyo, Japan, 2022. [Google Scholar]
  42. Wibral, M.; Priesemann, V.; Kay, J.W.; Lizier, J.T.; Phillips, W.A. Partial information decomposition as a unified approach to the specification of neural goal functions. Brain Cogn. 2017, 112, 25–38. [Google Scholar] [CrossRef]
  43. Kullback, S. Information Theory and Statistics; Courier Corporation: Gloucester, MA, USA, 1978. [Google Scholar]
  44. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: New York, NY, USA, 1985. [Google Scholar]
  45. Wigner, E.P. On weakly positive matrices. Can. J. Math. 1963, 15, 313–318. [Google Scholar] [CrossRef]
  46. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: https://www.R-project.org/ (accessed on 20 May 2024).
Figure 1. A partially ordered lattice of the manifold M and submanifolds S 1 S 7 . The form of the pdf that is shown for each submanifold is that obtained by m-projection of the distribution p ( x 1 , x 2 , x 3 ) onto the submanifold.
Figure 2. Plots of the ‘synergy’ function g ( t ) for the two calcium datasets. (a) First calcium dataset. The feasible range for t is (−1.13, 1.56), with t = 0.2408 . (b) Second calcium dataset. The feasible range for t is (−1.67, 1.69), with t = 0.0027 .
Table 1. Submanifold probability distributions with corresponding covariance matrices (modified from [22]).
$S_1$: $p(x_1)\,p(x_2)\,p(x_3)$, $\;\Sigma_1 = \begin{bmatrix} I_{n_1} & 0 & 0 \\ 0 & I_{n_2} & 0 \\ 0 & 0 & I_{n_3} \end{bmatrix}$
$S_2$: $p(x_1, x_2)\,p(x_3)$, $\;\Sigma_2 = \begin{bmatrix} I_{n_1} & P & 0 \\ P^T & I_{n_2} & 0 \\ 0 & 0 & I_{n_3} \end{bmatrix}$
$S_3$: $p(x_1, x_3)\,p(x_2)$, $\;\Sigma_3 = \begin{bmatrix} I_{n_1} & 0 & Q \\ 0 & I_{n_2} & 0 \\ Q^T & 0 & I_{n_3} \end{bmatrix}$
$S_4$: $p(x_2, x_3)\,p(x_1)$, $\;\Sigma_4 = \begin{bmatrix} I_{n_1} & 0 & 0 \\ 0 & I_{n_2} & R \\ 0 & R^T & I_{n_3} \end{bmatrix}$
$S_5$: $p(x_2 \mid x_1)\,p(x_3 \mid x_1)\,p(x_1)$, $\;\Sigma_5 = \begin{bmatrix} I_{n_1} & P & Q \\ P^T & I_{n_2} & P^T Q \\ Q^T & Q^T P & I_{n_3} \end{bmatrix}$
$S_6$: $p(x_1 \mid x_2)\,p(x_3 \mid x_2)\,p(x_2)$, $\;\Sigma_6 = \begin{bmatrix} I_{n_1} & P & P R \\ P^T & I_{n_2} & R \\ R^T P^T & R^T & I_{n_3} \end{bmatrix}$
$S_7$: $p(x_1 \mid x_3)\,p(x_2 \mid x_3)\,p(x_3)$, $\;\Sigma_7 = \begin{bmatrix} I_{n_1} & Q R^T & Q \\ R Q^T & I_{n_2} & R \\ Q^T & R^T & I_{n_3} \end{bmatrix}$
$M$: $p(x_1, x_2, x_3)$, $\;\Sigma = \begin{bmatrix} I_{n_1} & P & Q \\ P^T & I_{n_2} & R \\ Q^T & R^T & I_{n_3} \end{bmatrix}$
Table 2. PID results for exact pdfs, reported as a percentage of the joint mutual information.
pdf | PID | Unq1 | Unq2 | Shd | Syn
p | I_ig | 4.3 | 6.6 | 1.5 | 87.7
p | I_dep | 4.3 | 6.6 | 1.4 | 87.6
p | I_mmi | 0.0 | 2.3 | 5.8 | 91.9
p_7 | I_ig | 15.0 | 33.2 | 31.7 | 20.1
p_7 | I_dep | 35.1 | 53.3 | 11.6 | 0.0
p_7 | I_mmi | 0.0 | 18.2 | 46.7 | 35.1
p_6 | I_ig | 0.0 | 75.9 | 24.1 | 0.0
p_6 | I_dep | 0.0 | 75.9 | 24.1 | 0.0
p_6 | I_mmi | 0.0 | 75.9 | 24.1 | 0.0
p_5 | I_ig | 75.2 | 0.0 | 24.8 | 0.0
p_5 | I_dep | 75.2 | 0.0 | 24.8 | 0.0
p_5 | I_mmi | 75.2 | 0.0 | 24.8 | 0.0
p_4 | I_ig | 0.0 | 100.0 | 0.0 | 0.0
p_4 | I_dep | 0.0 | 100.0 | 0.0 | 0.0
p_4 | I_mmi | 0.0 | 100.0 | 0.0 | 0.0
p_3 | I_ig | 100.0 | 0.0 | 0.0 | 0.0
p_3 | I_dep | 100.0 | 0.0 | 0.0 | 0.0
p_3 | I_mmi | 100.0 | 0.0 | 0.0 | 0.0
Table 3. PID results for simulated datasets from the pdfs p_3–p_7 and p, reported as the median followed by the range (in parentheses) of the sample of percentages of the joint mutual information; the column t gives the median and range of the actual values of t obtained using the I_ig algorithm.
pdf | PID | t | Unq1 | Unq2 | Shd | Syn
p | I_ig | 0.55 (0.44, 0.63) | 4.4 (2.6, 6.6) | 6.7 (3.7, 9.0) | 1.6 (1.1, 2.1) | 87.3 (85.4, 89.2)
p | I_dep | | 4.5 (2.8, 6.8) | 6.8 (3.9, 9.3) | 1.4 (1.0, 1.9) | 87.2 (85.3, 89.0)
p | I_mmi | | 0.0 (0.0, 2.9) | 2.3 (3.2, 6.2) | 6.0 (3.9, 8.0) | 91.7 (89.2, 93.6)
p_7 | I_ig | 0.59 (0.47, 0.71) | 15.1 (7.6, 26.5) | 33.0 (19.9, 48.0) | 31.3 (23.4, 37.3) | 20.2 (15.2, 24.9)
p_7 | I_dep | | 34.3 (23.6, 46.4) | 52.3 (38.9, 64.5) | 11.8 (7.8, 20.3) | 0.2 (0.0, 8.6)
p_7 | I_mmi | | 0.0 (0.0, 11.4) | 18.0 (0.0, 45.9) | 46.4 (30.4, 57.7) | 35.6 (21.3, 45.9)
p_6 | I_ig | 0.97 (0.82, 1.06) | 0.1 (0.0, 2.4) | 72.5 (55.6, 84.7) | 25.0 (11.4, 38.4) | 2.4 (0.2, 8.1)
p_6 | I_dep | | 2.5 (0.2, 10.4) | 74.9 (60.4, 88.6) | 22.6 (7.6, 35.3) | 0.0 (0.0, 0.0)
p_6 | I_mmi | | 0.0 (0.0, 0.0) | 72.3 (45.7, 83.6) | 25.1 (13.5, 45.4) | 2.6 (0.3, 8.9)
p_5 | I_ig | 0.06 (−0.06, 0.20) | 68.0 (50.0, 81.7) | 0.3 (0.0, 2.9) | 27.4 (12.5, 43.3) | 4.3 (0.9, 12.1)
p_5 | I_dep | | 72.3 (54.9, 87.3) | 4.7 (0.9, 13.3) | 22.8 (6.8, 38.4) | 0.0 (0.0, 0.0)
p_5 | I_mmi | | 67.5 (45.7, 82.6) | 0.0 (0.0, 0.0) | 27.9 (13.9, 45.5) | 4.7 (0.9, 14.2)
p_4 | I_ig | 0.97 (0.91, 1.0) | 0.06 (0.0, 0.7) | 95.0 (83.0, 99.5) | 2.5 (1.9, 7.8) | 2.4 (0.3, 8.3)
p_4 | I_dep | | 2.2 (0.2, 8.4) | 97.2 (90.7, 99.7) | 0.3 (0.0, 2.6) | 0.2 (0.0, 2.7)
p_4 | I_mmi | | 0.0 (0.0, 0.0) | 94.9 (83.1, 99.3) | 2.5 (0.1, 9.4) | 2.6 (0.4, 8.6)
p_3 | I_ig | 0.05 (0.01, 0.14) | 90.9 (75.7, 98.4) | 0.2 (0.0, 1.8) | 4.5 (0.7, 11.5) | 4.5 (0.9, 11.5)
p_3 | I_dep | | 94.8 (86.2, 99.1) | 4.2 (0.7, 12.6) | 0.3 (0.0, 3.1) | 0.3 (0.0, 3.5)
p_3 | I_mmi | | 90.6 (76.3, 97.6) | 0.0 (0.0, 0.0) | 4.6 (1.0, 11.9) | 4.8 (1.2, 13.0)