Article

Synergy as the Failure of Distributivity

Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 7610001, Israel
* Author to whom correspondence should be addressed.
Entropy 2024, 26(11), 916; https://doi.org/10.3390/e26110916
Submission received: 15 September 2024 / Revised: 16 October 2024 / Accepted: 26 October 2024 / Published: 28 October 2024

Abstract

The concept of emergence, or synergy in its simplest form, is widely used but lacks a rigorous definition. Our work connects information and set theory to uncover the mathematical nature of synergy as the failure of distributivity. For the trivial case of discrete random variables, we explore whether and how it is possible to get more information out of lesser parts. The approach is inspired by the role of set theory as the fundamental description of part–whole relations. If taken unaltered, synergistic behavior is forbidden by the set-theoretic axioms. However, random variables are not a perfect analogy of sets: we formalize the distinction, highlighting a single broken axiom—union/intersection distributivity. Nevertheless, it remains possible to describe information using Venn-type diagrams. The proposed multivariate theory resolves the persistent self-contradiction of partial information decomposition and reinstates it as a primary route toward a rigorous definition of emergence. Our results suggest that non-distributive variants of set theory may be used to describe emergent physical systems.

1. Introduction

Reductionism is a standard scientific approach in which a system is studied by breaking it into smaller parts. However, some of the most interesting phenomena in physics and biology appear to resist such disentanglement. In these cases, complexity emerges from intricate interactions between many predominantly simple components [1]. Such synergic systems are typically described as “a whole that is greater than the sum of its parts”. To pour quantitative meaning into this equation-like definition, it is natural to borrow tools from the mathematical theory that describes part–whole relationships, namely set theory. Unfortunately, for finite sets, a simple Venn diagram suffices to demonstrate that the size of the whole ($A \cup B$) can never exceed the sum of the sizes of its parts ($A$ and $B$):
$$|A \cup B| = |A| + |B| - |A \cap B| \le |A| + |B| \tag{1}$$
In fact, the trivial interaction, $A \cap B$, between the two parts of the system decreases the size of the whole rather than increasing it.
To allow for more intricate interactions, one can turn to the realm of random variables. It is well known that measuring the outcome of two random variables can provide more information than the sum of what is obtained when measuring each separately. Moreover, the textbook description of the interactions between random variables often involves set-theoretical-like Venn diagrams [2]. These two facts lead to the intriguing possibility that random variables may lend themselves to a mathematical description of non-trivial whole–part relationships.
Take two discrete variables $W$ and $Z$: the information $W$ contains about $Z$ is determined by the mutual information function $I(W;Z)$ [3]. Cases for which $W$ can be presented as a joint random variable $W = (X, Y)$ allow us to compare the whole against its parts:
$$I((X,Y);Z) \gtrless I(X;Z) + I(Y;Z) \tag{2}$$
In other words, looking at both system parts together can convey either more or less information than their added values. Therefore, and in contrast to Equation (1), this formalism can be used to describe synergy.
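To make inequality (2) concrete, here is a minimal Python sketch (our own illustration, not part of the original text; the helpers H and I are ours) that computes both sides from a joint probability table. An AND gate yields a whole that exceeds the sum of its parts, while a copied coin yields the opposite:

```python
# A minimal sketch (ours, not from the paper) illustrating Equation (2):
# the joint variable (X, Y) can carry more or less information about Z than
# the parts do separately. Distributions are dicts mapping (x, y, z) -> prob.
from itertools import product
from math import log2

def H(p, coords):
    """Shannon entropy (bits) of the marginal over the given coordinates."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(p, a, b):
    """Mutual information I(A; B), with A and B as coordinate index tuples."""
    return H(p, a) + H(p, b) - H(p, a + b)

# More than the sum of parts: Z = X AND Y over independent fair X, Y.
gate = {(x, y, x & y): 0.25 for x, y in product((0, 1), repeat=2)}
# Less than the sum of parts: X = Y = Z, one fair coin copied three times.
copy = {(b, b, b): 0.5 for b in (0, 1)}

for name, p in (("and", gate), ("copy", copy)):
    whole = I(p, (0, 1), (2,))
    parts = I(p, (0,), (2,)) + I(p, (1,), (2,))
    print(f"{name}: whole = {whole:.3f} bit, sum of parts = {parts:.3f} bit")
# and:  whole = 0.811, parts = 0.623  (whole > parts)
# copy: whole = 1.000, parts = 2.000  (whole < parts)
```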
In their seminal paper [4], Williams and Beer proposed the framework of partial information decomposition as a way of assessing the underlying structure of a system of two discrete random variables and quantifying the amount of synergy between its parts. They suggested that, much like a set of elements, each variable can be decomposed into separate information “subsets”. These information atoms are assumed to have non-negative size and represent the information that is shared between the two variables ($R$) or uniquely present in only one of them ($U_X$, $U_Y$):
$$I(X;Z) = R + U_X, \quad I(Y;Z) = R + U_Y, \quad I((X,Y);Z) = R + U_X + U_Y + S, \qquad R, U_X, U_Y, S \ge 0 \tag{3}$$
See Figure 1 for clarification. An additional synergy term (S) was artificially introduced to provide a simple mechanism that allows the whole to be greater than the sum of its parts:
$$I((X,Y);Z) - I(X;Z) - I(Y;Z) = S - R > 0 \iff S > R \tag{4}$$
A series of papers [5,6,7] focused on calculating these atoms’ sizes by fixing the single remaining degree of freedom in Equation (3). No consensus has yet been reached regarding a single physical solution. Meanwhile, the range of applications keeps widening [8,9]. Recent works extend the theory to continuous variables [10,11], introduce causality [12,13], and consider quantum information [14].
Unfortunately, partial information decomposition has a significant drawback that puts the whole approach into question: no extension beyond two variables is possible without a fundamental self-contradiction [15]. Some authors attempted to resolve this by abandoning the basic properties required of information atoms, including their non-negativity [16,17].
In what follows, we reconsider the foundations of partial information decomposition and pinpoint the source of its long-standing self-contradictions. To do this, we follow H. K. Ting [18] to establish a rigorous relation between information and set theories and highlight a fundamental distinction between them: random variables, unlike sets, do not adhere to the union/intersection distributivity axiom [19]. This leads us to study a distributivity-free variant of set theory as a possible self-consistent theory of information atoms. Within this framework, we demonstrate that the presence of synergistic properties is a direct consequence of the broken axiom. In the case of N = 3 random variables, we show that the amount of synergistic information precisely coincides with the extent to which distributivity is breached. The acquired understanding allows us to resolve the contradictions and suggest a coherent multivariate theory, which may provide the foundations for quantifying emergence in large systems.

2. Set-Theoretic Approach to Information

In this section, we formalize the distinction between finite sets and discrete random variables; as we show, this distinction underlies the synergistic behavior of the latter. We first focus on a special illustrative example: the XOR gate. This system contains neither redundant nor unique information, which emphasizes the peculiar properties of synergy. A more general discussion, including arbitrary random variables, is presented in the next section.

2.1. Basic Random Variable Operations

Some set-theoretic operations have straightforward extensions to random variables [18,20,21]. The first of these relies on the similarity between Equations (1) and (2) and identifies taking the joint variable with the union operator (∪). One can now go on to define random variable inclusion as:
$$X \subseteq Y \iff \exists Z : X \cup Z = Y \tag{5}$$
which is equivalent to $X$ being a deterministic function of $Y$.
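As a brief illustration (ours; the function name is hypothetical), this inclusion criterion can be tested on a finite joint distribution by checking that each outcome of $Y$ fixes the outcome of $X$:

```python
# A small sketch (our own, not from the paper) of the inclusion test in
# Equation (5): X ⊆ Y iff X is a deterministic function of Y. We check that
# every Y-outcome with positive probability fixes a single X-outcome.
def is_included(p, x_coords, y_coords):
    """True iff the X marginal is a deterministic function of the Y marginal."""
    seen = {}
    for outcome, prob in p.items():
        if prob == 0:
            continue
        x = tuple(outcome[i] for i in x_coords)
        y = tuple(outcome[i] for i in y_coords)
        if seen.setdefault(y, x) != x:
            return False  # the same y co-occurs with two different x values
    return True

# For the XOR triple of Section 2.2, O3 is included in (O1, O2) but not in O1.
xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(is_included(xor, (2,), (0, 1)))  # True:  O3 ⊆ O1 ∪ O2
print(is_included(xor, (2,), (0,)))    # False: O3 ⊄ O1
```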
The inclusion–exclusion formula ([22], Chapter 3.1) applied to two random variables reveals mutual information as the size of the intersection between two random variables:
$$H(X \cup Y) = H(X) + H(Y) - I(X;Y), \tag{6}$$
where Shannon entropy H is regarded as a measure on the random variable space. Indeed, it complies with many properties required of a mathematical measure ([23], Chapter 1.4): non-negativity, monotonicity, and subadditivity. Furthermore, entropy is zero only for deterministic variables, which play the role of an empty set (Appendix A, Lemma A1):
$$H(X) \ge 0, \qquad X \subseteq Y \Rightarrow H(X) \le H(Y), \qquad H\Big(\bigcup_{i=1}^{N} X_i\Big) \le \sum_{i=1}^{N} H(X_i), \qquad H(X) = 0 \iff X = \emptyset \tag{7}$$
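These measure-like properties are easy to confirm numerically. A quick check (our own sketch, reusing the entropy helper from above) on an arbitrary three-variable distribution:

```python
# A quick numerical check (ours) of the measure-like properties in
# Equation (7), on an arbitrary joint distribution of three binary variables.
from math import log2

def H(p, coords):
    """Shannon entropy (bits) of the marginal over the given coordinates."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

p = {(0, 0, 0): 0.5, (0, 1, 1): 0.25, (1, 1, 0): 0.25}
assert H(p, (0,)) >= 0                          # non-negativity
assert H(p, (0,)) <= H(p, (0, 1))               # monotonicity: X1 ⊆ (X1, X2)
assert H(p, (0, 1, 2)) <= sum(H(p, (i,)) for i in range(3))  # subadditivity
print("all measure-like properties hold on this example")
```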
A rigorous definition of intersection (∩) needs to comply with the inclusion order (5), $X \cap Y \subseteq X$, $X \cap Y \subseteq Y$, in addition to the size constraint. Unfortunately, a random variable satisfying both conditions does not always exist [24]. Nonetheless, a physically sensible intersection may be inferred in several cases:
$$X \cap Y = \emptyset \;\text{ if }\; H(X \cup Y) = H(X) + H(Y), \qquad X \cap Y = X \;\text{ with }\; H(X) = I(X;Y) \;\text{ if }\; X \subseteq Y \tag{8}$$
These simple parallels between information theory and set theory are enough to study information decomposition in a random variable XOR gate.

2.2. The Simplest Synergic System: XOR Gate

Consider three pairwise independent fair coins $O_1, O_2, O_3$ with an additionally imposed higher-order interaction, the parity rule $O_3 = O_1 \oplus O_2$. It fixes the value of the third variable to be 0 whenever the values of $O_1$ and $O_2$ coincide, and 1 otherwise.

Probability   O1   O2   O3
   1/4         0    0    0
    0          0    0    1
    0          0    1    0
   1/4         0    1    1
    0          1    0    0
   1/4         1    0    1
   1/4         1    1    0
    0          1    1    1
One can easily calculate the amount of information $O_1$, $O_2$ and $(O_1, O_2)$ convey about $O_3$. The comparison of these contributions shows that the system is indeed synergic:
$$I(O_1;O_3) = 0 \text{ bit}, \quad I(O_2;O_3) = 0 \text{ bit}, \quad I((O_1,O_2);O_3) = 1 \text{ bit}, \qquad I((O_1,O_2);O_3) > I(O_1;O_3) + I(O_2;O_3) \tag{9}$$
Moreover, by substituting the above result into the decomposition Equation (3), we find that the system contains only a single non-zero information atom S = 1 bit . This allows us to study synergy separately from any other contributions on this example.
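The numbers in (9) can be reproduced directly from the probability table above; the short sketch below (ours, with the same helper conventions as before) evaluates all three mutual information terms:

```python
# A runnable check (our sketch) of Equation (9): for the XOR gate, the parts
# are individually uninformative about O3, yet together they determine it.
from math import log2

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(p, a, b):
    return H(p, a) + H(p, b) - H(p, a + b)

xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(I(xor, (0,), (2,)))     # 0.0 bit
print(I(xor, (1,), (2,)))     # 0.0 bit
print(I(xor, (0, 1), (2,)))   # 1.0 bit: the whole exceeds the sum of parts
# Substituting into Equation (3): R = U1 = U2 = 0, so S = 1 bit.
```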

2.3. Subdistributivity

When taking a closer look at the XOR gate, our set-theoretic intuition for random variables breaks down even further. The pairwise independence dictates $O_2 \cap O_3 = O_1 \cap O_3 = \emptyset$, while the parity rule makes $O_3$ a deterministic function of the joint variable $(O_1, O_2)$:
$$O_3 \subseteq (O_1 \cup O_2) \;\Rightarrow\; (O_1 \cup O_2) \cap O_3 = O_3 \tag{10}$$
A simple conclusion from these facts is that the XOR-gate variables do not comply with the set-theoretic axiom of distributivity:
$$(O_1 \cup O_2) \cap O_3 = O_3 \ne \emptyset = (O_1 \cap O_3) \cup (O_2 \cap O_3) \tag{11}$$
Nevertheless, it can be shown that a weaker relation of subdistributivity holds for any three random variables (Appendix A, Lemma A2):
$$(X \cup Y) \cap Z \supseteq (X \cap Z) \cup (Y \cap Z) \tag{12}$$
Even though it is evident that random variables are quite different from sets, we argue that some of the logic behind partial information decomposition may be recovered by extending set-theoretic notions, such as the inclusion–exclusion principle and Venn diagrams, to non-distributive systems.

2.4. Inclusion–Exclusion Formulas

The inclusion–exclusion formula for the XOR gate can be obtained by repeatedly applying the two-variable Equation (6) and using that $I(X;Y) = H(X \cap Y)$ when the intersection exists:
$$H(O_1 \cup O_2 \cup O_3) = H(O_1 \cup O_2) + H(O_3) - H((O_1 \cup O_2) \cap O_3) = H(O_1) + H(O_2) + H(O_3) - H((O_1 \cup O_2) \cap O_3) \tag{13}$$
It disagrees with the analogous set-theoretic formula (for non-intersecting sets) only in the last term, which is non-zero precisely due to the subdistributivity. Note that while the rest of the terms are symmetric with respect to the permutation of indices, the expression $(O_1 \cup O_2) \cap O_3$ is not, as it explicitly depends on the order of derivation. This essentially leads to three different inclusion–exclusion formulas. Nonetheless, the size of the distributivity-breaking term remains invariant:
$$H((O_1 \cup O_2) \cap O_3) = H((O_1 \cup O_3) \cap O_2) = H((O_2 \cup O_3) \cap O_1) \tag{14}$$
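Since $O_k \subseteq O_i \cup O_j$, the term $(O_i \cup O_j) \cap O_k$ equals $O_k$ and its size coincides with the ordinary quantity $I((O_i, O_j); O_k)$, so the invariance (14) can be verified numerically (our sketch, not from the paper):

```python
# Equation (14) states that the size of the distributivity-breaking term is
# permutation-invariant. Because O_k ⊆ O_i ∪ O_j, that size equals the plain
# mutual information I((O_i, O_j); O_k), computable from Shannon entropies.
from itertools import permutations
from math import log2

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
for i, j, k in permutations(range(3)):
    size = H(xor, (i, j)) + H(xor, (k,)) - H(xor, (i, j, k))  # I((Oi,Oj);Ok)
    print(f"H((O{i+1} ∪ O{j+1}) ∩ O{k+1}) = {size:.0f} bit")  # always 1 bit
```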

2.5. Construction of Venn-Type Diagram for XOR Gate

The non-uniqueness of inclusion–exclusion formulas complicates the construction of Venn diagrams. A way of tackling this as well as some further intuition can be traced via our XOR gate example.
In set theory, Venn diagrams act as graphical representations of the inclusion–exclusion principle ([22], Chapter 3.1). The inclusion–exclusion formula computes the size of a union as a sum of all possible intersections between the participating sets. For correct bookkeeping, this is achieved with alternating signs that account for the covering number—the number of times each intersection is counted as a part of some set. In classical set theory, the covering number of an intersection is trivially the number of sets which are being intersected. However, (13) includes the distributivity-breaking term, which is absent from this classical theory and whose covering number is not evident. It appears with a negative sign, which signifies an even-times covered region. In this three-variable system, the only even alternative is a 2-covered region. From another perspective, in each of the three possible formulas, $O_k$ is covered once by itself and one more time by the union $O_i \cup O_j$ (though not by $O_i$ or $O_j$ individually). As for the size of this region, independent of $k$, it measures 1 bit of information. Denoting this area as $\Pi_s$, we have:
$$\Pi_s[2] = H((O_i \cup O_j) \cap O_k) = H(O_k) = 1 \text{ bit}, \tag{15}$$
where the covering number is indicated in the brackets [ ] . To find the rest of our diagram’s regions, we borrow two properties of set-theoretic diagrams.
First of all, in a system of $N$ arbitrary random variables $X_1, \ldots, X_N$, the total entropy of the system is equal to the sum of all diagram regions $\Pi_i[c_i]$:
$$H(X_1, \ldots, X_N) = \sum_i \Pi_i[c_i] \tag{16}$$
Second, the sum of individual variables’ entropies is equal to the sum of region sizes times their corresponding covering numbers c i :
$$H(X_1) + H(X_2) + \cdots + H(X_N) = \sum_i c_i\, \Pi_i[c_i] \tag{17}$$
These properties may be viewed as the information conservation law: adding new sources should either introduce new information or increase the covering of existing regions.
Let us assume that, in addition to $\Pi_s$, the diagram of the XOR gate contains several more regions $\Pi_{j \ne s}$. To calculate their sizes and coverings, we apply (16) and (17):
$$\sum_{j \ne s} (c_j - 1)\, \Pi_j[c_j] = 0 \text{ bit} \tag{18}$$
We use the fact that information is non-negative and discard meaningless empty regions. The above equation then allows for a single 1-bit region, which is covered once:
$$\Pi_g[1] = 1 \text{ bit} \tag{19}$$
To respect the physical meaning behind the diagram regions as pieces of information, we demand the structure of the diagram to be well-defined. In other words, despite the existence of three different versions of inclusion–exclusion formula (13), they are all assumed to describe the same system. Indeed, our result remains invariant with respect to index permutations in terms of region sizes and covering numbers.
In regard to the shape of the Venn diagram, this assumption, together with (15), dictates that region $\Pi_s$ corresponds to all variables at the same time:
$$\Pi_s = H(O_1) = H(O_2) = H(O_3) \tag{20}$$
One can think of $\Pi_s$ as a 2-covered triple intersection between $O_1$, $O_2$, and $O_3$. This is a drastic divergence from classical set theory, where an intersection between $n$ sets is covered exactly $n$ times. As we shall see, without distributivity, $n$ variables can have multiple intersection regions with different covering numbers $1 \le c \le n$.
Moving on to the second region in this system: $\Pi_g$ appears as a leftover when taking the difference between the whole system and $\Pi_s$; by set-theoretic intuition, it does not intersect with $O_k$ for any $k$. As such, it is not a part of any single variable.
Finally, we combine all findings into a system of equations, which generates the Venn-type diagram of the information distribution inside the XOR gate (Figure 2):
Figure 2. A Venn-type diagram for the XOR gate. Each variable is represented by a primary color circle (red, yellow, blue) while the outer circle outlines the whole system. Of the total 2 bits of the XOR gate, one is covered two times and is represented by the inner disk. Since it is covered twice, this area is colored by pairwise color-blends (orange, purple, and green). Since it is covered by three variables, it includes patches of all three possible blends. A critical difference between this diagram and a set-theoretic one is that even though the three variables have no pairwise intersections, the inner disk representing the mutual content of all three variables is non-empty. The remaining 1 bit is covered once and resides only inside the joint variable. Since this area is covered once, it is colored by primary colors. Patches of all three colors are used since this area does not belong to any single variable.
$$H(O_1) = \Pi_s, \quad H(O_2) = \Pi_s, \quad H(O_3) = \Pi_s, \qquad H(O_1 \cup O_2) = \Pi_s + \Pi_g, \quad H(O_2 \cup O_3) = \Pi_s + \Pi_g, \quad H(O_1 \cup O_3) = \Pi_s + \Pi_g, \qquad H(O_1 \cup O_2 \cup O_3) = \Pi_s + \Pi_g \tag{21}$$
In usual Venn diagrams, intersections represent correlations between different parts. Similarly, in the XOR gate the higher-order parity interaction added on top of the non-correlated variables is responsible for the appearance of a 2-covered triple intersection.

2.6. Synergy as an Information Atom

We can compare our set-theory-inspired results against the expectations of the partial information decomposition. Namely, Equations (3) state that the information $O_1$ and $O_2$ carry about $O_3$ can be described by the atoms $R = U_1 = U_2 = 0$ bit, $S = 1$ bit. The left side of each line in (3) may be rewritten by definition as an intersection of random variables:
$$I(X;Z) = H(X \cap Z), \qquad I(Y;Z) = H(Y \cap Z), \qquad I((X,Y);Z) = H((X \cup Y) \cap Z) \tag{22}$$
For the XOR gate, the former two are empty, while the last line links the original definition of synergistic information to the non-set-theoretic term of the inclusion–exclusion Formula (13) and the peculiar region of the corresponding diagram:
$$S = I((O_1,O_2);O_3) = H((O_1 \cup O_2) \cap O_3) = \Pi_s \tag{23}$$
Curiously, synergistic behavior of mutual information does not contradict the subadditivity of entropy. The synergistic information piece S is not new to the system and is always contained in the variables’ full entropy.
The nature of the ghost atom $G = \Pi_g$ is deeply connected to this outcome, even though it does not explicitly participate in the decomposition. Consider the individual contributions by each of the sources $O_1$, $O_2$:
$$I(O_{i=1,2};O_3) = H(O_i) + H(O_3) - H(O_i \cup O_3) \tag{24}$$
Using (21), we can rewrite this in terms of information atoms:
$$I(O_i;O_3) = \Pi_s + \Pi_s - (\Pi_s + \Pi_g) = S - G = 0 \tag{25}$$
The equality between the synergistic and ghost atoms ensures that the former is exactly canceled from the individual contribution by each source. Synergistic information is, of course, still present in the “whole” (23). This circumstance is responsible for creating the illusion of synergy appearing out of nowhere when sources are combined.

3. General Trivariate Decomposition

The XOR gate studied above is a degenerate example with a sole synergistic information atom. We now expand our description to systems with non-synergistic components, with the aim of characterizing any three variables using information atoms.

3.1. Extended Random Variable Space

The lack of a proper description for information intersections severely limits our ability to decompose the information content of more general random variable systems. Our solution for this issue is inspired by an elegant duality between set theory and information quantities found by H. K. Ting in [18] and further elaborated in [21]. It simply extends the space of random variables to include all elements produced by the operations $\cup$, $\cap$, $\setminus$ ((2), (8) and (28)). Entropy is extended as a (non-negative) measure $\hat{H}$ such that:
$$\hat{H}(X) = 0 \iff X = \emptyset, \qquad X \cap Y = \emptyset \;\Rightarrow\; \hat{H}(X \cup Y) = \hat{H}(X) + \hat{H}(Y) \tag{26}$$
To approach the problem of characterizing information atoms in the trivariate case, we derive the corresponding inclusion–exclusion formula. As stated previously, the bivariate version (6) holds without alterations (Appendix A, Lemma A3). Now, in contrast, we get a distributivity-breaking difference term, which, to make matters even worse, depends on the order of derivation (Appendix A, Theorem A1). One possible variant of this formula is portrayed in Figure 3:
$$\hat{H}(X_1 \cup X_2 \cup X_3) = \hat{H}(X_1) + \hat{H}(X_2) + \hat{H}(X_3) - \hat{H}(X_1 \cap X_2) - \hat{H}(X_1 \cap X_3) - \hat{H}(X_2 \cap X_3) + \hat{H}(X_1 \cap X_2 \cap X_3) - \Delta\hat{H}, \tag{27}$$
where $\Delta\hat{H} = \hat{H}\big(((X_{\sigma(1)} \cup X_{\sigma(2)}) \cap X_{\sigma(3)}) \setminus ((X_{\sigma(1)} \cap X_{\sigma(3)}) \cup (X_{\sigma(2)} \cap X_{\sigma(3)}))\big)$ for any permutation of indices $\sigma$. The difference is defined as:
$$D = X \setminus Y \iff D \cap Y = \emptyset, \;\; D \cup (X \cap Y) = X \tag{28}$$
In general, due to subdistributivity, the difference may not be unique (Appendix A, (A20)). Its size, on the other hand, is fixed as $\hat{H}(X \setminus Y) = \hat{H}(X) - \hat{H}(X \cap Y)$.
Figure 3. A single realization of the inclusion–exclusion principle for three variables. The new region, corresponding to the distributivity-breaking difference is represented via a checkered pattern. Covering numbers are written for each sector and highlighted by the colors. This is not a full Venn-type diagram that defines the information atoms, and hence, its structure is clearly not invariant with respect to variable permutations.

3.2. Set-Theoretic Solution

Before going to arbitrary variables, consider a system in which the distributivity axiom holds. Under this condition, the setup becomes effectively equivalent to set theory. A trivariate system can, therefore, be illustrated by the same Venn diagram as that of three sets:
$$\begin{aligned}
H(X_1) &= \Pi_{\{1\}} + \Pi_{\{1\}\{2\}} + \Pi_{\{1\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}},\\
H(X_2) &= \Pi_{\{2\}} + \Pi_{\{1\}\{2\}} + \Pi_{\{2\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}},\\
H(X_3) &= \Pi_{\{3\}} + \Pi_{\{1\}\{3\}} + \Pi_{\{2\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}},\\
H(X_1, X_2) &= \Pi_{\{1\}} + \Pi_{\{2\}} + \Pi_{\{1\}\{2\}} + \Pi_{\{1\}\{3\}} + \Pi_{\{2\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}},\\
H(X_1, X_3) &= \Pi_{\{1\}} + \Pi_{\{3\}} + \Pi_{\{1\}\{2\}} + \Pi_{\{1\}\{3\}} + \Pi_{\{2\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}},\\
H(X_2, X_3) &= \Pi_{\{2\}} + \Pi_{\{3\}} + \Pi_{\{1\}\{2\}} + \Pi_{\{1\}\{3\}} + \Pi_{\{2\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}},\\
H(X_1, X_2, X_3) &= \Pi_{\{1\}} + \Pi_{\{2\}} + \Pi_{\{3\}} + \Pi_{\{1\}\{2\}} + \Pi_{\{1\}\{3\}} + \Pi_{\{2\}\{3\}} + \Pi_{\{1\}\{2\}\{3\}}
\end{aligned} \tag{29}$$
By calculating the sizes of the atoms, we derive (Appendix B, (A29)) the criterion for their non-negativity: the whole must be less than or equal to the sum of the parts:
$$I(X_1, X_2; X_3) - I(X_1;X_3) - I(X_2;X_3) \le 0 \tag{30}$$
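A quick numeric comparison (our own sketch) makes criterion (30) tangible: a copied coin satisfies it, while the XOR gate violates it, signalling the need for a synergistic atom:

```python
# A check (ours, not from the paper) of criterion (30): in a distributive,
# set-like system the whole cannot exceed the sum of parts.
from math import log2

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(p, a, b):
    return H(p, a) + H(p, b) - H(p, a + b)

def criterion(p):
    return I(p, (0, 1), (2,)) - I(p, (0,), (2,)) - I(p, (1,), (2,))

copy = {(b, b, b): 0.5 for b in (0, 1)}
xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(criterion(copy))  # -1.0 <= 0: consistent with a set-theoretic diagram
print(criterion(xor))   # +1.0  > 0: requires the synergistic atom
```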

3.3. Main Result: Arbitrary Trivariate System

At this point, we have studied two opposite cases: a completely synergic system (the XOR gate) and one without any synergy (the set-theoretic solution). To describe three arbitrary variables, any general decomposition must be able to replicate both of them. It turns out that a combination of the already known atoms (Figure 4) suffices to provide a non-negative decomposition (presented in detail in Appendix B, (A47); for the proof see Lemma A6):
$$H(X_i) = \Pi_s + \sum_{\text{set-theor. atoms}} \Pi, \qquad H(X_i, X_{j \ne i}) = \Pi_s + \Pi_g + \sum_{\text{s.t. atoms}} \Pi, \qquad H(X_1, X_2, X_3) = \Pi_s + \Pi_g + \sum_{\text{s.t. atoms}} \Pi, \qquad \Pi_s = \Pi_g \tag{31}$$
This is the minimal solution to the problem as it contains the smallest set of necessary atoms. The whole and parts are now related by the difference of two terms:
$$I((X_1,X_2);X_3) - I(X_1;X_3) - I(X_2;X_3) = \Pi_s - \Pi_{\{1\}\{2\}\{3\}} \gtrless 0 \tag{32}$$
We can gain major insight by substituting the left side using the inclusion–exclusion formulas (6) and (27):
$$\Delta\hat{H} - \hat{H}(X_1 \cap X_2 \cap X_3) = \Pi_s - \Pi_{\{1\}\{2\}\{3\}} \tag{33}$$
Remember that the only 3-covered area in the system is $X_1 \cap X_2 \cap X_3$. Therefore, the size of $\Pi_s$ is determined by the distributivity-breaking difference:
$$\Pi_{\{1\}\{2\}\{3\}} = \hat{H}(X_1 \cap X_2 \cap X_3), \qquad \Pi_s = \Delta\hat{H} \tag{34}$$
To find the physical meaning behind the recovered solution, we once again compare it to the partial information decomposition of the same system. Only four of the diagram regions (Figure 4) appear in the corresponding equations:
$$I(X_1;X_3) = \Pi_{\{1\}\{2\}\{3\}} + \Pi_{\{1\}\{3\}}, \qquad I(X_2;X_3) = \Pi_{\{1\}\{2\}\{3\}} + \Pi_{\{2\}\{3\}}, \qquad I((X_1,X_2);X_3) = \Pi_{\{1\}\{2\}\{3\}} + \Pi_s + \Pi_{\{1\}\{3\}} + \Pi_{\{2\}\{3\}} \tag{35}$$
The result fully captures the structure behind Williams and Beer’s definitions [4]:
$$\Pi_{\{1\}\{2\}\{3\}} \to \text{Redundancy}, \qquad \Pi_{\{1\}\{3\}} \to \text{Unique information in } X_1, \qquad \Pi_{\{2\}\{3\}} \to \text{Unique information in } X_2, \qquad \Pi_s \to \text{Synergy} \tag{36}$$
We have, thus, shown how information synergy naturally follows from set-theoretic arguments. The synergistic contribution is contained in the entropy of the parts and is precisely equal to the distributivity-breaking difference $\Delta\hat{H}$. The interaction responsible for the synergistic contribution is depicted in the Venn diagram as an intersection $\Pi_s$ with an unconventional covering number. Finally, the illusion of a whole being greater than the sum of its parts comes from the fact that the mutual information terms on the left-hand side of Equation (3) do not account for all regions of the Venn diagram (Figure 4).
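The atom sizes of this universal decomposition follow from Lemma A6 in Appendix B, up to a single free parameter, the redundancy $\Pi_{\{1\}\{2\}\{3\}}$, bounded between $\max(0, I_3)$ and the smallest pairwise mutual information. The sketch below (our illustration; the function and atom-key names are ours) computes all atoms for a given joint distribution:

```python
# A sketch of the universal trivariate decomposition (A47): atom sizes follow
# from Lemma A6 up to one free parameter, the redundancy Π{1}{2}{3}.
from itertools import combinations
from math import log2

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(p, a, b):
    return H(p, a) + H(p, b) - H(p, a + b)

def trivariate_atoms(p, redundancy=None):
    i3 = (sum(H(p, (i,)) for i in range(3))
          - sum(H(p, (i, j)) for i, j in combinations(range(3), 2))
          + H(p, (0, 1, 2)))                       # interaction information I3
    lo = max(0.0, i3)
    hi = min(I(p, (i,), (j,)) for i, j in combinations(range(3), 2))
    r = lo if redundancy is None else redundancy   # pick the free parameter
    assert lo - 1e-12 <= r <= hi + 1e-12, f"redundancy must lie in [{lo}, {hi}]"
    atoms = {"P123": r, "Ps": r - i3, "Pg": r - i3}   # (A51)-(A53)
    for i, j in combinations(range(3), 2):
        atoms[f"P{i+1}{j+1}"] = I(p, (i,), (j,)) - r  # pairwise atoms (A50)
    for i in range(3):
        others = tuple(k for k in range(3) if k != i)
        atoms[f"P{i+1}"] = H(p, (0, 1, 2)) - H(p, others)  # singles (A48)
    return atoms

xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(trivariate_atoms(xor))  # only Ps = Pg = 1 bit survive, as in Section 2
```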

4. Towards a Multivariate Information Decomposition

In this section, we lay the foundation for a consistent theory of multivariate decomposition and resolve the contradictions between partial information decomposition axioms [15].

4.1. Information Atoms Based on Part–Whole Relations

To rigorously define the information atoms, we may think of them as basic pieces of information, which make up all more complex quantities. Previously, we have used the inclusion–exclusion principle to break down the entropy of the whole system into smaller parts step by step. Even without writing the formula for N variables, one can find the general form of the terms participating in this process:
$$\Xi[C] = \bigcap_{j=1}^{C} \bigcup_{k} X_{i_{jk}} \tag{37}$$
The covering number $C$ is defined trivially as the number of intersecting union-brackets in (37) and determines the sign of the associated term by the inclusion–exclusion principle. Similarly to the Möbius inversion used in set theory [25], the decomposition of a non-distributive space will rely on the inclusion order lattice $(L_\Xi, \subseteq)$ of the terms $\Xi$. A general description of the decomposition through part–whole relations was proposed in [26] in the form of the parthood table. It is a matrix with entries 0 or 1, which define whether a given atom $\Pi$ is a part of a particular larger information piece, i.e., the inclusion–exclusion term (37):
$$\hat{H}(\Xi_i[C_i]) = \sum_j f_{ij}\, \Pi_j[c_j], \qquad f_{ij} \in \{0, 1\} \tag{38}$$
The parthood table depends on the initial variables through the monotonicity axiom, or compliance with the inclusion lattice $(L_\Xi, \subseteq)$:
$$\Xi_i \subseteq \Xi_j \;\Rightarrow\; \forall k : f_{ik} \le f_{jk} \tag{39}$$
It relates the table’s entries among themselves by a simple rule: if one $\Xi$ term is included in another, all the atoms from the decomposition of the former should be present in the decomposition of the latter.
The summands $\Pi$ are non-negative functions and represent the sizes of atoms. The covering number $c_j$ of each atom is defined by the coverings of the inclusion–exclusion terms $C_i$:
$$c_j = \max_{i:\, f_{ij} = 1} C_i \tag{40}$$
This rule remains unchanged from the classical set theory.
The information conservation law (17) is the final condition that preserves the physical meaning of the covering numbers—the number of times the same information appears in the system.
The existence of a general solution for $N$ variables is not guaranteed. Besides, the linear system (38) is underdetermined for $N > 2$. For a specific set of degenerate cases it is, however, still possible to calculate the sizes of all atoms. We next list several such examples while specifying how information is distributed among their different parts.
  • Set-Theoretic Solution for N Variables
In a distributive system, the solution is a particular case of Möbius inversion [25] (Appendix B, (A30)). Mutual information as a function of random variables becomes subadditive (Appendix B, Lemma A5), proving that the lack of distributivity is a necessary condition for emergence.
  • XOR Gate
The solution found for the XOR gate is unique in the parthood table formalism (Appendix B, Theorem A2). This reinforces our proposal of synergistic and ghost atoms as physical entities.
  • N-Parity
Generalizing the XOR gate to an arbitrary number of variables yields the N-parity setup. It allows a solution of a similar form (Appendix B, (A42)–(A46)):
$$\Pi_s[2] = 1 \text{ bit}, \qquad \Pi_{g_n}[1] = 1 \text{ bit}, \; n = \overline{1, N-2}, \qquad \forall n \le N-1, \sigma : \; H(X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(n)}) = \Pi_s + \sum_{i=1}^{n-1} \Pi_{g_i} \tag{41}$$
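As a hedged numerical check (ours, not from the paper), the entropies behind Equation (41) can be confirmed by marginalizing the uniform even-parity distribution: any $n$ of the $N$ variables carry $\min(n, N-1)$ bits, matching one shared synergistic atom plus $n-1$ ghost atoms:

```python
# A numeric sketch (ours) of the N-parity entropies behind Equation (41).
from itertools import product
from math import log2

def parity_dist(N):
    """Uniform over all N-bit strings with even parity (the N-parity setup)."""
    states = [s for s in product((0, 1), repeat=N) if sum(s) % 2 == 0]
    return {s: 1.0 / len(states) for s in states}

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

N = 5
p = parity_dist(N)
for n in range(1, N + 1):
    h = H(p, tuple(range(n)))
    assert abs(h - min(n, N - 1)) < 1e-12
    print(f"H(X1..X{n}) = {h:.0f} = min({n}, {N - 1})")
```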

4.2. Resolving the Partial Information Decomposition Self-Contradiction

The existence of any multivariate decomposition was previously believed to be disproved [15] by employing a simple example that could not be solved without discarding one of the partial information decomposition axioms. The information inside three XOR variables, $O_1, O_2, O_3$, about their joint variable $O_4 = (O_1, O_2, O_3)$ was claimed to be grouped into three 1-bit synergistic atoms that, in our notation, correspond to $(O_1 \cup (O_2, O_3)) \cap O_4$, $(O_2 \cup (O_1, O_3)) \cap O_4$, and $(O_3 \cup (O_1, O_2)) \cap O_4$. These were summed up to give three bits of information—more than the total of two bits present in the entire system. The authors of [15] concluded that the non-negativity of information was not respected.
To resolve this discrepancy, first notice that the partial information decomposition atoms are a subset of the full set of atoms $\{\Pi\}$. In a system with $N$ sources of information $X_1, \ldots, X_N$ and target $X_{N+1}$, they lie inside the intersection $I(X_1, \ldots, X_N; X_{N+1}) = \hat{H}((X_1 \cup \ldots \cup X_N) \cap X_{N+1})$ and are defined by the submatrix of the full parthood table $f_{ij} : \Xi_i \subseteq (X_1 \cup X_2 \cup \ldots \cup X_N) \cap X_{N+1}$. In particular, when the output is equal to the joint variable of all inputs, the entropy of the inputs coincides with the mutual information, and hence, all atoms appear in the partial information decomposition (Appendix C, Lemma A7). The set of atoms $\{\Pi\}$ itself is then identical to that of the system $X_1, \ldots, X_N$ alone, with the exception of all covering numbers being increased by one to comply with the additional cover by $X_{N+1}$ (Appendix C, Theorem A3). This is exactly the type of system that was used in [15]. Using the solution of the XOR gate, we find:
$$\Pi_s[3] = 1 \text{ bit}, \quad \Pi_g[2] = 1 \text{ bit}, \qquad I(O_1;O_4) = I(O_2;O_4) = I(O_3;O_4) = \Pi_s, \qquad I((O_1,O_2);O_4) = I((O_1,O_3);O_4) = I((O_2,O_3);O_4) = \Pi_s + \Pi_g, \qquad I((O_1,O_2,O_3);O_4) = \Pi_s + \Pi_g \tag{42}$$
In place of three, there is only one symmetric atom $\Pi_s[3]$. The confusion in [15] occurred because different forms of the inclusion–exclusion principle were considered separately and it was assumed that each version would create its own synergistic atom.
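This resolution is easy to verify numerically (a sketch of ours): with the output equal to the joint variable, every mutual information term in (42) reduces to $\Pi_s$ or $\Pi_s + \Pi_g$:

```python
# Verifying the values in Equation (42) (our sketch): with O4 = (O1, O2, O3),
# each single input shares Π_s = 1 bit with the output, and any pair or the
# triple adds the single ghost bit Π_g.
from itertools import combinations
from math import log2

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(p, a, b):
    return H(p, a) + H(p, b) - H(p, a + b)

xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
out = (0, 1, 2)  # O4 is the joint variable of all three inputs
for r in (1, 2, 3):
    for srcs in combinations(range(3), r):
        print(f"I({srcs}; O4) = {I(xor, srcs, out):.0f} bit")
# r=1 gives 1 bit (= Π_s); r=2 and r=3 give 2 bits (= Π_s + Π_g)
```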

5. Discussion

Previous attempts at studying synergistic information using set-theoretic intuition have led to self-contradictions. In this work, we point out that the non-distributivity of random variables corresponds to a well-defined variant of set theory. We employ our results to construct a Venn-like diagram for an arbitrary three-variable system and demonstrate synergy to be a direct consequence of distributivity breaking.
Our results do not fully solve the problem at hand. First, the precise calculation of atom sizes was left open and might require a more explicit description of information intersections. Another caveat is that although we constructed the equations that describe a self-consistent multivariate information decomposition, the existence of a solution for $N$ arbitrary random variables is yet to be proven.
Nevertheless, this work lays the basis for a self-consistent multivariate theory. Our analysis reestablishes the concept of information decompositions as a foundation for further enquiry in quantifying emergence. In this context, information theory serves as a mere illustration: the mechanism we describe offers an explanation of the nature of synergy which uses solely set-theoretic concepts and can be applied to any emergent physical system.
From the physical standpoint, the synergistic properties of information are a consequence of entropy reordering inside the system of inputs and outputs. However, this is only possible because the mathematical entities under consideration (discrete random variables) possess the property of subdistributivity, whose origin and interpretation in terms of the underlying physical system are yet to be found. One could also take a different function to represent the size of random variables. This might lead to additional positive (synergic) or negative (redundant) contributions and requires further investigation. Examples of measures other than entropy that still obey set-theoretic logic are discussed in [21].

Author Contributions

I.S. and O.F. developed the theory and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 770964), the Israeli Science Foundation grant 1727/20, and the Minerva Foundation. O.F. is the incumbent of the Henry J Leir Professorial chair.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

We wish to thank Rotem Shalev, Amir Shpilka, and Gregory Falkovich for their insightful comments.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
PID   Partial Information Decomposition
XOR   Exclusive OR
RV    Random variable

Appendix A. Properties of the Extended Random Variable (RV) Space

Lemma A1.
In the set-inspired algebra (Section 2.1), all deterministic variables are equal to each other and obey the property of the empty set ([27], Chapter 5): for any random variable $X$:
$$\emptyset \subseteq X \tag{A1}$$
Proof. 
Let $V$ be a deterministic variable. It is, therefore, also a deterministic function of any random variable $X$, which by (5) implies $V \subseteq X$.
Now, for any two deterministic variables $V_1, V_2$, we have $V_1 \subseteq V_2$ and $V_2 \subseteq V_1$; hence, in the set-theoretic view:
$$V_1 = V_2 \tag{A2}$$
□
Corollary A1.
Other properties of the empty set are equivalent to (A1): for any random variable $X$ and deterministic variable $\emptyset$:
$$X \cup \emptyset = X, \qquad X \cap \emptyset = \emptyset \tag{A3}$$
While the postulate of distributivity is independent of the other axioms in set theory, a weaker condition of subdistributivity of union over intersection ought to hold in random variable space even without it:
Lemma A2.
For any three random variables $X, Y, Z$:
$$(X \cap Z) \cup (Y \cap Z) \subseteq (X \cup Y) \cap Z \tag{A4}$$
Proof. 
We start by showing that if variables $X_1, X_2$ are both included in $X_3$, then also $(X_1 \cup X_2) \subseteq X_3$. Indeed, by the definition of inclusion (5), whenever $X_1$ and $X_2$ are deterministic functions of $X_3$, the joint variable $(X_1, X_2)$ is also such.
Now notice that for arbitrary variables $X, Y, Z$ we have:
$$(X \cap Z) \cap ((X \cup Y) \cap Z) = (X \cup Y) \cap (Z \cap (X \cap Z)) = (X \cup Y) \cap (X \cap Z) = (X \cap Z), \tag{A5}$$
therefore, by (8), $(X \cap Z) \subseteq (X \cup Y) \cap Z$. Likewise, $(Y \cap Z) \subseteq (X \cup Y) \cap Z$.
Combining the above results, we get the statement of the lemma:
$$(X \cap Z) \cup (Y \cap Z) \subseteq (X \cup Y) \cap Z \tag{A6}$$
□
Corollary A2.
The subdistributivity (or, more correctly, superdistributivity) of intersection over union also holds:
$$(X \cap Y) \cup Z \subseteq (X \cup Z) \cap (Y \cup Z) \tag{A7}$$
In the extended RV space, the inclusion–exclusion principle for two variables, as expected, remains unaffected by the lack of distributivity:
Lemma A3.
The size of the union of two extended RV space members is related to their own sizes and the size of their intersection as:
$$\hat{H}(X \cup Y) = \hat{H}(X) + \hat{H}(Y) - \hat{H}(X \cap Y) \tag{A8}$$
Proof. 
We will rewrite the union $X \cup Y$ on the left side as a union of two disjoint pieces:
$$X \cup (Y \setminus X) = X \cup (X \cap Y) \cup (Y \setminus X) = X \cup Y \tag{A9}$$
At the same time, by definition, $X \cap (Y \setminus X) = \emptyset$. We may use the additivity of the measure (26) to write the size of the union as a sum of the sizes of its disjoint parts:
$$\hat{H}(X \cup Y) = \hat{H}(X) + \hat{H}(Y \setminus X) \tag{A10}$$
Repeating the same steps in order to decompose the second summand into two more terms concludes the proof:
$$(Y \setminus X) \cap (X \cap Y) = ((Y \setminus X) \cap X) \cap Y = \emptyset, \qquad (Y \setminus X) \cup (X \cap Y) = Y, \qquad \hat{H}(Y) = \hat{H}(Y \setminus X) + \hat{H}(X \cap Y) \tag{A11}$$
□
The inclusion–exclusion principle for three variables, along with all the terms from the set-theoretic version, contains a peculiar extra term related to the failure of distributivity:
Theorem A1.
The size of the triple union is related to the sizes of the individual terms, their intersections, and the distributivity-breaking difference $\Delta\hat{H}$:
$$\hat{H}(X_1 \cup X_2 \cup X_3) = \hat{H}(X_1) + \hat{H}(X_2) + \hat{H}(X_3) - \hat{H}(X_1 \cap X_2) - \hat{H}(X_1 \cap X_3) - \hat{H}(X_2 \cap X_3) + \hat{H}(X_1 \cap X_2 \cap X_3) - \Delta\hat{H}, \tag{A12}$$
where the last term is found as:
$$\Delta\hat{H} = \hat{H}\big(((X_{\sigma(1)} \cup X_{\sigma(2)}) \cap X_{\sigma(3)}) \setminus ((X_{\sigma(1)} \cap X_{\sigma(3)}) \cup (X_{\sigma(2)} \cap X_{\sigma(3)}))\big) \tag{A13}$$
and stays invariant with respect to permutations of indices σ.
Proof. 
We begin by choosing two of the three variables (or extended RV space members) on the left side and grouping them in order to use the result of the previous lemma (A8):
$$\hat{H}((X_1 \cup X_2) \cup X_3) = \hat{H}(X_1 \cup X_2) + \hat{H}(X_3) - \hat{H}((X_1 \cup X_2) \cap X_3) \tag{A14}$$
The first term is easily decomposed further using Lemma A3. In order to proceed with the third term, we define the distributivity-breaking difference as $\Delta X_{123} = ((X_1 \cup X_2) \cap X_3) \setminus ((X_1 \cap X_3) \cup (X_2 \cap X_3))$ and apply the second axiom of the measure:
$$\hat{H}((X_1 \cup X_2) \cap X_3) = \hat{H}\big(((X_1 \cap X_3) \cup (X_2 \cap X_3)) \cup \Delta X_{123}\big) = \hat{H}((X_1 \cap X_3) \cup (X_2 \cap X_3)) + \hat{H}(\Delta X_{123}) \tag{A15}$$
Applying (A8) once again:
$$\hat{H}((X_1 \cap X_3) \cup (X_2 \cap X_3)) = \hat{H}(X_1 \cap X_3) + \hat{H}(X_2 \cap X_3) - \hat{H}(X_1 \cap X_2 \cap X_3) \tag{A16}$$
and combining everything into the final form:
$$\hat{H}(X_1 \cup X_2 \cup X_3) = \hat{H}(X_1) + \hat{H}(X_2) + \hat{H}(X_3) - \hat{H}(X_1 \cap X_2) - \hat{H}(X_1 \cap X_3) - \hat{H}(X_2 \cap X_3) + \hat{H}(X_1 \cap X_2 \cap X_3) - \hat{H}(\Delta X_{123}) \tag{A17}$$
The only term that depends on the order of placing brackets in (A14) is $\hat{H}(\Delta X_{123})$. Due to the associativity and commutativity of both union and intersection, we conclude that it is the only part of the equation that is not symmetric with respect to the permutations of indices:
$$\Delta X_{123} \ne \Delta X_{132} \ne \Delta X_{231}, \tag{A18}$$
Its size is, therefore, bound to be the same in all three cases. Defining a single function equal to this value concludes the proof:
$$\Delta\hat{H} = \hat{H}(\Delta X_{123}) \tag{A19}$$
□
The operation of taking the difference $\setminus$ in the extended RV space may have more than one outcome. This can be shown already on the XOR gate example. Taking the variable that represents the whole system, $W = O_1 \cup O_2 \cup O_3$, we have two candidates, $O_2$ and $O_3$, for the result of the difference $W \setminus O_1$. Substituting them into the definition (28), we find that both are valid, despite being explicitly unequal:
$$O_{i=2,3} \cap O_1 = \emptyset, \qquad O_i \cup (W \cap O_1) = O_i \cup O_1 = W \tag{A20}$$

Appendix B. Information Atoms

A convenient notation of antichains was proposed in the partial information decomposition [4,15] to describe pieces of information. Let us denote each joint variable by the collection of the variables’ indices:
$$(X_{i_1}, X_{i_2}, \ldots, X_{i_m}) \;\leftrightarrow\; \{i_1 i_2 \ldots i_m\} = A \tag{A21}$$
There is a trivial partial order, $A \preceq B \iff (i \in A \Rightarrow i \in B)$, and we can use it to represent the intersections. A set of strong antichains $\alpha \in \mathcal{A}(N)$ is taken on the above poset:
$$\alpha = A_1 A_2 \ldots A_n = \{i_{11} \ldots i_{1m_1}\}\{i_{21} \ldots i_{2m_2}\} \ldots \{i_{n1} \ldots i_{nm_n}\}, \tag{A22}$$
where all indices are chosen from $\overline{1,N}$ and never coincide, $i_{ab} \ne i_{cd}$. The partial order $\preceq$ can be extended to antichains:
$$\alpha \preceq \beta \iff \forall B \in \beta \;\, \exists A \in \alpha : A \preceq B \tag{A23}$$
Now, a general inclusion–exclusion term (37) in an $N$-variable system can be denoted by an antichain $\alpha \in \mathcal{A}(N)$:
$$\Xi_\alpha[C = n] = \bigcap_{j=1}^{n} \bigcup_{k=1}^{m_j} X_{i_{jk}} \tag{A24}$$
The covering $C$ is always equal to the cardinality of the corresponding antichain (the number of brackets $\{\,\}$):
$$C = n = |\alpha| \tag{A25}$$
The inclusion order on $\Xi$-terms follows from the antichain order $\preceq$. The latter is independent of the chosen random variables and holds for every system:
$$\alpha \preceq \beta \;\Rightarrow\; \Xi_\alpha \subseteq \Xi_\beta \tag{A26}$$
The new notation allows us to replace the first index of the parthood table $f_{ij}$ with an antichain and simplify the formulation of the multivariate theory’s axioms:
$$\hat{H}(\Xi_\alpha) = \sum_i f_{\alpha i}\, \Pi_i[c_i], \qquad \Xi_\alpha \subseteq \Xi_\beta \Rightarrow \forall i : f_{\alpha i} \le f_{\beta i}, \qquad c_i = \max_{\alpha :\, f_{\alpha i} = 1} |\alpha|, \qquad \sum_{k=1}^{N} H(X_k) = \sum_i c_i\, \Pi_i[c_i] \tag{A27}$$
Lemma A4.
Two inclusion–exclusion terms that are equal as members of extended RV space have identical parthood matrix rows:
$$\Xi_\alpha = \Xi_\beta \;\Rightarrow\; \forall i : \Pi_i > 0 \Rightarrow f_{\alpha i} = f_{\beta i} \tag{A28}$$
  • Set-Theoretic Solution
This is a complete replica of set theory, fully compliant with the distributivity axiom. For $N = 3$ variables, condition (30) is necessary and sufficient for the non-negativity of all atoms:
$$\Pi_{\{1\}\{2\}\{3\}} = I(X_1;X_3) + I(X_2;X_3) - I(X_1,X_2;X_3) \ge 0, \qquad \Pi_{\{i\}\{j\}} = I(X_i;X_j|X_k) \ge 0, \qquad \Pi_{\{i\}} = H(X_1,X_2,X_3) - H(X_j,X_k) \ge 0 \tag{A29}$$
For an arbitrary number of variables $N$, there is no variability in the inclusion–exclusion formulas, and the atoms are recovered via the Möbius inversion with respect to the antichain order $\preceq$. Let us also denote the atoms by a special subset of antichains $\iota$ with a single index in each bracket:
$$\hat{H}(\Xi_\alpha) = \sum_{\iota \preceq \alpha} \Pi_\iota[n], \qquad \iota = \{i_1\}\{i_2\} \ldots \{i_n\}, \qquad \Pi_\iota = \sum_{m=n}^{N} (-1)^{m-n} \sum_{i_{n+1}, \ldots, i_m} I_m(X_{i_1}; \ldots; X_{i_m}), \tag{A30}$$
where $I_m$ is the $m$-th order interaction information function, defined as the sign-alternating sum of entropies $\sum_{k=1}^{m} (-1)^{k-1} \sum_{j_1, \ldots, j_k} H(X_{j_1}, X_{j_2}, \ldots, X_{j_k})$.
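For concreteness, a direct implementation (ours, not from the paper) of this alternating sum is given below; on the XOR gate it returns $I_3 = -1$ bit, consistent with $\Pi_s = \Pi_{\{1\}\{2\}\{3\}} - I_3 = 1$ bit found in Section 2:

```python
# A direct implementation (our sketch) of the m-th order interaction
# information used in the Möbius inversion (A30): an alternating sum of
# joint entropies over all non-empty subsets of the chosen variables.
from itertools import combinations
from math import log2

def H(p, coords):
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def interaction_information(p, indices):
    """I_m = sum_{k=1}^{m} (-1)^(k-1) * sum over k-subsets of H(joint)."""
    total = 0.0
    for k in range(1, len(indices) + 1):
        sign = (-1) ** (k - 1)
        total += sign * sum(H(p, sub) for sub in combinations(indices, k))
    return total

xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(interaction_information(xor, (0, 1, 2)))  # -1.0 bit, so Π_s = 0-(-1) = 1
```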
Set-theoretic systems never exhibit synergistic properties. The following result can be understood in the sense that the lack of distributivity is a necessary condition for the existence of synergy in any $N$-variable system:
Lemma A5.
In a set-theoretic system, mutual information is always subadditive:
$$I((X_1, \ldots, X_N); X_{N+1}) \le \sum_{i=1}^{N} I(X_i; X_{N+1}) \tag{A31}$$
Proof. 
Let us substitute the atoms (A30) into the inequality:
$$\sum_{\iota \preceq \{12 \ldots N\}\{N+1\}} \Pi_\iota \;\le\; \sum_{k=1}^{N} \;\sum_{\iota' \preceq \{k\}\{N+1\}} \Pi_{\iota'} \tag{A32}$$
For any atom on the left side, we have by (A23):
$$\iota \preceq \{12 \ldots N\}\{N+1\} \;\Rightarrow\; \exists \{i_a\}, \{i_b\} \in \iota : \; \{i_a\} \preceq \{12 \ldots N\}, \;\; \{i_b\} \preceq \{N+1\} \tag{A33}$$
Since $\iota$ is composed of single indices, we have:
$$i_a \in \overline{1,N}, \qquad i_b = N+1 \tag{A34}$$
Then, this term can also be found on the right side of (A32):
$$\exists \iota' \preceq \{i_a\}\{N+1\} : \; \iota' = \iota \tag{A35}$$
The non-negativity of all atoms concludes the proof. □
  • XOR Gate
The XOR gate contains a completely different set of atoms. With three pairwise independent initial variables, the set of inclusion–exclusion terms simplifies to:
$$\Xi_{\{123\}}[1] = O_1 \cup O_2 \cup O_3, \; \hat{H}(\Xi_{\{123\}}) = 2; \qquad \Xi_{\{ij\}}[1] = O_i \cup O_j, \; \hat{H}(\Xi_{\{ij\}}) = 2; \qquad \Xi_{\{i\}}[1] = O_i, \; \hat{H}(\Xi_{\{i\}}) = 1; \qquad \Xi_{\{ij\}\{k\}}[2] = (O_i \cup O_j) \cap O_k, \; \hat{H}(\Xi_{\{ij\}\{k\}}) = 1 \tag{A36}$$
In the extended RV space, $\Xi_{\{ij\}\{k\}} = \Xi_{\{k\}}$ and $\Xi_{\{ij\}} = \Xi_{\{123\}}$, so by Lemma A4 we only need to find the decompositions of $\Xi_{\{i\}}$ and $\Xi_{\{123\}}$. Due to the symmetry of the problem, the decomposition of $\Xi_{\{i\}}$ may contain three types of atoms: three distinct atoms $\Pi_i$, each being a part of only the respective $\Xi_{\{i\}}$; three distinct atoms $\Pi_{i,j}$, each being a part of both specified terms; or one symmetrically shared $\Pi_s$, as we have guessed in (15):
$$\hat{H}(\Xi_{\{ij\}\{k\}}[2]) = \hat{H}(\Xi_{\{k\}}[1]) = \Pi_k[2] + \Pi_{i,k}[2] + \Pi_{j,k}[2] + \Pi_s[2] \tag{A37}$$
The coverings are calculated by definition (A27). For $\Xi_{\{123\}}$, one more atom $\Pi_g$ may be added:
$$\hat{H}(\Xi_{\{ij\}}[1]) = \hat{H}(\Xi_{\{123\}}[1]) = \Pi_1[2] + \Pi_2[2] + \Pi_3[2] + \Pi_{1,2}[2] + \Pi_{2,3}[2] + \Pi_{1,3}[2] + \Pi_s[2] + \Pi_g[1] \tag{A38}$$
The following parthood table contains columns for all atoms discussed above.

f            Π_s  Π_g  Π_1  Π_2  Π_3  Π_{1,2}  Π_{1,3}  Π_{2,3}
{1}{2}{3}     0    0    0    0    0      0        0        0
{1}{2}        0    0    0    0    0      0        0        0
{1}{3}        0    0    0    0    0      0        0        0
{2}{3}        0    0    0    0    0      0        0        0
{12}{3}       1    0    0    0    1      0        1        1
{13}{2}       1    0    0    1    0      1        0        1
{23}{1}       1    0    1    0    0      1        1        0
{1}           1    0    1    0    0      1        1        0
{2}           1    0    0    1    0      1        0        1
{3}           1    0    0    0    1      0        1        1
{12}          1    1    1    1    1      1        1        1
{13}          1    1    1    1    1      1        1        1
{23}          1    1    1    1    1      1        1        1
{123}         1    1    1    1    1      1        1        1
In a symmetric solution, the atom sizes are invariant with respect to index permutations; hence, let:
$$\Pi_s = x, \qquad \Pi_{i,k} = y, \qquad \Pi_i = 1 - 2y - x, \qquad \Pi_g = 2 - x - 3y - 3(1 - 2y - x) = 2x + 3y - 1 \tag{A39}$$
Substituting this into the information conservation law:
$$\sum_i H(X_i) = 3 = 2x + 2 \cdot 3y + 2 \cdot 3(1 - 2y - x) + 2x + 3y - 1 = 5 - 2x - 3y \;\;\Rightarrow\;\; 2x + 3y = 2 \tag{A40}$$
However, we know that all atoms have non-negative sizes, which means that most atoms disappear from the solution (have zero sizes):
$$\Pi_i = 1 - 2y - x = -0.5y \ge 0 \;\;\Rightarrow\;\; y = 0, \; x = 1 \tag{A41}$$
Theorem A2.
The XOR gate has a unique symmetric decomposition:
$$\Pi_s[2] = 1 \text{ bit}, \qquad \Pi_g[1] = 1 \text{ bit}$$
f            Π_s  Π_g
{1}{2}{3}     0    0
{1}{2}        0    0
{1}{3}        0    0
{2}{3}        0    0
{12}{3}       1    0
{13}{2}       1    0
{23}{1}       1    0
{1}           1    0
{2}           1    0
{3}           1    0
{12}          1    1
{13}          1    1
{23}          1    1
{123}         1    1
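A short symbolic sanity check (our own, using sympy) of the algebra in (A39)–(A41) confirms the uniqueness claim:

```python
# A sanity check (ours) of (A39)-(A41): the symmetric ansatz plus the
# conservation law and non-negativity force y = 0, x = 1.
import sympy as sp

x, y = sp.symbols("x y", real=True)
Pi_i = 1 - 2*y - x               # atom Π_i as parametrized in (A39)
Pi_g = 2*x + 3*y - 1             # ghost atom Π_g from H(Ξ_{123}) = 2 bits
conservation = sp.Eq(2*x + 2*3*y + 2*3*Pi_i + Pi_g, 3)   # law (A40)
x_sol = sp.solve(conservation, x)[0]
print(sp.Eq(x, x_sol))                     # x = 1 - 3*y/2, i.e. 2x + 3y = 2
print(sp.simplify(Pi_i.subs(x, x_sol)))    # -y/2: non-negative only at y = 0
# Hence y = 0 and x = 1: the decomposition Π_s = Π_g = 1 bit is unique.
```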
  • N-Parity
A generalization of the XOR gate is the N-parity setup, also a symmetric system, for which one of the variables is fully determined by the combination of all the others:
$$X_{i=\overline{1,N}} = \begin{cases} 0, & 50\% \\ 1, & 50\% \end{cases}, \qquad \forall i \in \overline{1,N} : \; X_i \equiv \sum_{j \ne i} X_j \;(\mathrm{mod}\ 2) \tag{A42}$$
The set of inclusion–exclusion terms is quite simple: 1-covered terms coinciding with the entropies, whose size is equal to the number of participating variables (for all $N$ variables it remains at $N-1$ bits, since the last variable is a deterministic function of the rest):
$$\Xi_{\{i_1 i_2 \ldots i_n\}}[1] = \bigcup_{k=\overline{1,n}} X_{i_k}, \qquad \hat{H}(\Xi_{\{i_1 i_2 \ldots i_n\}}) = \min(n, N-1) \tag{A43}$$
and 1-bit 2-covered intersections between two unions:
$$\Xi_{\{i_1 \ldots i_n\}\{i_{n+1} \ldots i_N\}}[2] = \bigcup_{k=\overline{1,n}} X_{i_k} \;\cap\; \bigcup_{l=\overline{n+1,N}} X_{i_l}, \qquad \hat{H}(\Xi_{\{i_1 \ldots i_n\}\{i_{n+1} \ldots i_N\}}) = 1 \tag{A44}$$
The rest of the $\Xi$-terms are empty. A solution can easily be guessed: a single symmetric 2-covered atom $\Pi_s[2] = 1$ bit and a set of $N-2$ ghost atoms $\Pi_{g_k}[1] = 1$ bit, $k = \overline{1, N-2}$, such that:
$$\hat{H}(\Xi_{\{i_1 \ldots i_n\}\{i_{n+1} \ldots i_N\}}) = \Pi_s, \qquad \hat{H}(\Xi_{\{i_1 i_2 \ldots i_n\}}) = \Pi_s + \sum_{k=1}^{\min(n, N-1)-1} \Pi_{g_k} \tag{A45}$$
We immediately see that the information conservation law is satisfied:
$$\sum_i H(X_i) = N = 2\Pi_s + \sum_{k=1}^{N-2} \Pi_{g_k} \tag{A46}$$
  • Arbitrary Trivariate System
A Venn-type diagram for any three variables can be constructed using the following universal system of equations and parthood table (Table A1):
$$\begin{aligned}
H(X_1) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{2\}}[2] + \Pi_{\{1\}\{3\}}[2] + \Pi_{\{1\}}[1],\\
H(X_2) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{2\}}[2] + \Pi_{\{2\}\{3\}}[2] + \Pi_{\{2\}}[1],\\
H(X_3) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{3\}}[2] + \Pi_{\{2\}\{3\}}[2] + \Pi_{\{3\}}[1],\\
H(X_1,X_2) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{2\}}[2] + \Pi_{\{1\}\{3\}}[2] + \Pi_{\{2\}\{3\}}[2] + \Pi_{\{1\}}[1] + \Pi_{\{2\}}[1] + \Pi_g[1],\\
H(X_1,X_3) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{2\}}[2] + \Pi_{\{1\}\{3\}}[2] + \Pi_{\{2\}\{3\}}[2] + \Pi_{\{1\}}[1] + \Pi_{\{3\}}[1] + \Pi_g[1],\\
H(X_2,X_3) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{2\}}[2] + \Pi_{\{1\}\{3\}}[2] + \Pi_{\{2\}\{3\}}[2] + \Pi_{\{2\}}[1] + \Pi_{\{3\}}[1] + \Pi_g[1],\\
H(X_1,X_2,X_3) &= \Pi_{\{1\}\{2\}\{3\}}[3] + \Pi_s[2] + \Pi_{\{1\}\{2\}}[2] + \Pi_{\{1\}\{3\}}[2] + \Pi_{\{2\}\{3\}}[2] + \Pi_{\{1\}}[1] + \Pi_{\{2\}}[1] + \Pi_{\{3\}}[1] + \Pi_g[1],\\
\Pi_s[2] &= \Pi_g[1]
\end{aligned} \tag{A47}$$
Table A1. Parthood table for a universal trivariate decomposition.
f            Π_{1}{2}{3}  Π_s  Π_{1}{2}  Π_{1}{3}  Π_{2}{3}  Π_{1}  Π_{2}  Π_{3}  Π_g
{1}{2}{3}         1        0      0         0         0        0      0      0     0
{1}{2}            1        0      1         0         0        0      0      0     0
{1}{3}            1        0      0         1         0        0      0      0     0
{2}{3}            1        0      0         0         1        0      0      0     0
{12}{3}           1        1      0         1         1        0      0      0     0
{13}{2}           1        1      1         0         1        0      0      0     0
{23}{1}           1        1      1         1         0        0      0      0     0
{1}               1        1      1         1         0        1      0      0     0
{2}               1        1      1         0         1        0      1      0     0
{3}               1        1      0         1         1        0      0      1     0
{12}              1        1      1         1         1        1      1      0     1
{13}              1        1      1         1         1        1      0      1     1
{23}              1        1      1         1         1        0      1      1     1
{123}             1        1      1         1         1        1      1      1     1
Lemma A6.
Any system of three random variables can be decomposed into a set of non-negative atoms (A47).
Proof. 
One can find the sizes of atoms Π { i } from the last four equations in the system:
$$\Pi_{\{1\}} = H(X_1,X_2,X_3) - H(X_2,X_3) \ge 0, \qquad \Pi_{\{2\}} = H(X_1,X_2,X_3) - H(X_1,X_3) \ge 0, \qquad \Pi_{\{3\}} = H(X_1,X_2,X_3) - H(X_1,X_2) \ge 0 \tag{A48}$$
For the rest of the set-theoretic atoms, we have:
$$I(X_1;X_2) = \Pi_{\{1\}\{2\}\{3\}} + \Pi_{\{1\}\{2\}}, \qquad I(X_1;X_3) = \Pi_{\{1\}\{2\}\{3\}} + \Pi_{\{1\}\{3\}}, \qquad I(X_2;X_3) = \Pi_{\{1\}\{2\}\{3\}} + \Pi_{\{2\}\{3\}} \tag{A49}$$
To satisfy the non-negativity requirement, we need:
$$\Pi_{\{1\}\{2\}} = I(X_1;X_2) - \Pi_{\{1\}\{2\}\{3\}} \ge 0, \qquad \Pi_{\{1\}\{3\}} = I(X_1;X_3) - \Pi_{\{1\}\{2\}\{3\}} \ge 0, \qquad \Pi_{\{2\}\{3\}} = I(X_2;X_3) - \Pi_{\{1\}\{2\}\{3\}} \ge 0, \tag{A50}$$
which is equivalent to:
$$0 \le \Pi_{\{1\}\{2\}\{3\}} \le \min\big(I(X_1;X_2),\, I(X_1;X_3),\, I(X_2;X_3)\big) \tag{A51}$$
The last independent equation can be written using the third-order interaction information function:
$$I_3(X_1;X_2;X_3) = \Pi_{\{1\}\{2\}\{3\}} - \Pi_s, \tag{A52}$$
therefore:
$$\Pi_s = \Pi_g = \Pi_{\{1\}\{2\}\{3\}} - I_3(X_1;X_2;X_3) \ge 0 \tag{A53}$$
The obtained set of conditions is indeed self-consistent, as:
$$\min\big(I(X_1;X_2),\, I(X_1;X_3),\, I(X_2;X_3)\big) \ge I_3(X_1;X_2;X_3) \tag{A54}$$
□

Appendix C. Partial Information Decomposition (PID)

The partial information decomposition atoms are only a subset of all atoms Π . Yet, for some systems, it may be equal to the full set. Indeed, when the output is exactly the joint variable of all inputs, it essentially “covers” the whole diagram of the system of inputs. The entropies of inputs completely turn into mutual information about the output.
Lemma A7.
The partial information decomposition with inputs $X_1, \ldots, X_N$ and their joint variable chosen as the output, $X_{N+1} = (X_1, \ldots, X_N)$, contains all information atoms $\Pi$ of the system $X_1, \ldots, X_{N+1}$.
Proof. 
The PID atoms are by definition the ones contained in the intersection of the form:
$$(X_1 \cup X_2 \cup \ldots \cup X_N) \cap X_{N+1} \tag{A55}$$
By the conditions of the lemma, in the extended random variable space, we have:
$$(X_1 \cup X_2 \cup \ldots \cup X_N) \cap X_{N+1} = (X_1 \cup X_2 \cup \ldots \cup X_N) = (X_1 \cup X_2 \cup \ldots \cup X_N) \cup X_{N+1} \tag{A56}$$
Applying Lemma A4 concludes the proof. □
A stronger statement can be made: the whole structure of the resulting $N+1$ variable decomposition is equivalent to the lesser decomposition of just the inputs $X_1, \ldots, X_N$, with a single extra covering added to each atom to account for the output $X_{N+1}$ covering the whole system one more time.
Theorem A3.
A decomposition for the $N+1$ variable system $X_1, \ldots, X_{N+1}$ with:
$$X_{N+1} = X_1 \cup X_2 \cup \ldots \cup X_N \tag{A57}$$
defined by a set of atoms $\{\Pi_i[c_i]\}_{i \in I}$ and parthood table $f$ can be obtained from the decomposition $\{\tilde{\Pi}_j[\tilde{c}_j]\}_{j \in J}$, $\tilde{f}$ of the $N$ variable system $X_1, \ldots, X_N$ as:
$$\forall \alpha \in \mathcal{A}(N+1), \; i \in I : \qquad f_{\alpha i} = \tilde{f}_{F(\alpha)\, j(i)}, \qquad \Pi_i = \tilde{\Pi}_{j(i)}, \qquad c_i = \tilde{c}_{j(i)} + 1, \tag{A58}$$
where $j(i)$ is a bijection of indices and the (surjective) function $F : \mathcal{A}(N+1) \to \mathcal{A}(N)$ removes a bracket from an antichain if this bracket contains the index $N+1$.
Proof. 
Examining the inclusion–exclusion terms, we find that:
$$\forall \alpha \in \mathcal{A}(N+1) : \; \Xi_\alpha = \Xi_{F(\alpha)} \tag{A59}$$
By Lemma A4, this guarantees the equivalence of the corresponding parthood table rows:
$$\forall i \in I, \; \alpha \in \mathcal{A}(N+1) : \; f_{\alpha i} = f_{F(\alpha) i} \tag{A60}$$
Now, we need to determine the parthood table rows only for $\alpha \in \mathrm{Im}(F) = \mathcal{A}(N)$. Knowing the solution $\{\tilde{\Pi}_j\}_{j \in J}$ for the $N$-variable system, we substitute the same atoms into the larger $N+1$ variable system and define a bijection of indices $j(i)$:
$$\hat{H}(\Xi_\alpha) = \hat{H}(\Xi_{F(\alpha)}) = \sum_{j \in J} \tilde{f}_{F(\alpha) j}\, \tilde{\Pi}_j = \sum_{i \in I} f_{\alpha i}\, \Pi_i, \qquad \forall i \in I, \; \alpha \in \mathcal{A}(N+1) : \; \Pi_i = \tilde{\Pi}_{j(i)}, \; f_{\alpha i} = \tilde{f}_{F(\alpha)\, j(i)} \tag{A61}$$
Finally, to ensure the validity of the new solution, we check its compliance with axioms (A27):
  • Monotonicity:
    $\Xi_\alpha \subseteq \Xi_\beta \;\Rightarrow\; \Xi_{F(\alpha)} \subseteq \Xi_{F(\beta)} \;\Rightarrow\; \forall i \in I : \; f_{\alpha i} = \tilde{f}_{F(\alpha) j(i)} \le \tilde{f}_{F(\beta) j(i)} = f_{\beta i}$
  • Covering numbers:
    $N+1 \in \alpha \Rightarrow |\alpha| = |F(\alpha)| + 1, \qquad N+1 \notin \alpha \Rightarrow |\alpha| = |F(\alpha)|$
    $c_i = \max_{\alpha \in \mathcal{A}(N+1) :\, f_{\alpha i} = 1} |\alpha| = \max_{\beta \in \mathcal{A}(N) :\, \tilde{f}_{\beta j(i)} = 1} |\beta| + 1 = \tilde{c}_{j(i)} + 1$
  • Information conservation law:
    $H(X_{N+1}) = H(X_1, \ldots, X_N) = \sum_{j \in J} \tilde{\Pi}_j, \qquad \sum_{k=1}^{N} H(X_k) = \sum_{j \in J} \tilde{c}_j \tilde{\Pi}_j, \qquad \sum_{k=1}^{N+1} H(X_k) = \sum_{j \in J} \tilde{c}_j \tilde{\Pi}_j + H(X_{N+1}) = \sum_{j \in J} (\tilde{c}_j + 1) \tilde{\Pi}_j = \sum_{i \in I} c_i \Pi_i$
□

References

  1. Artime, O.; De Domenico, M. From the origin of life to pandemics: Emergent phenomena in complex systems. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2022, 380, 20200410. [Google Scholar] [CrossRef] [PubMed]
  2. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley Series in Telecommunications and Signal Processing; Wiley-Interscience: Hoboken, NJ, USA, 2006. [Google Scholar]
  3. Shannon, C. A Mathematical Theory of Communication. Bell Labs Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  4. Williams, P.; Beer, R. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515. [Google Scholar]
  5. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2013, 87, 012130. [Google Scholar] [CrossRef] [PubMed]
  6. Bertschinger, N.; Rauh, J.; Olbrich, E.; Ay, N. Quantifying Unique Information. Entropy 2013, 16, 2161. [Google Scholar] [CrossRef]
  7. Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24, 403. [Google Scholar] [CrossRef] [PubMed]
  8. Mediano, P.; Rosas, F.; Luppi, A.; Jensen, H.; Seth, A.; Barrett, A.; Carhart-Harris, R.; Bor, D. Greater than the parts: A review of the information decomposition approach to causal emergence. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2022, 380, 20210246. [Google Scholar] [CrossRef] [PubMed]
  9. Lizier, J.; Bertschinger, N.; Wibral, M. Information Decomposition of Target Effects from Multi-Source Interactions: Perspectives on Previous, Current and Future Work. Entropy 2018, 20, 307. [Google Scholar] [CrossRef] [PubMed]
  10. Ehrlich, D.A.; Schick-Poland, K.; Makkeh, A.; Lanfermann, F.; Wollstadt, P.; Wibral, M. Partial information decomposition for continuous variables based on shared exclusions: Analytical formulation and estimation. Phys. Rev. E 2024, 110, 014115. [Google Scholar] [CrossRef] [PubMed]
  11. Schick-Poland, K.; Makkeh, A.; Gutknecht, A.; Wollstadt, P.; Sturm, A.; Wibral, M. A partial information decomposition for discrete and continuous variables. arXiv 2021, arXiv:2106.12393. [Google Scholar]
  12. Rosas, F.; Mediano, P.; Jensen, H.; Seth, A.; Barrett, A.; Carhart-Harris, R.; Bor, D. Reconciling emergences: An information-theoretic approach to identify causal emergence in multivariate data. PLoS Comput. Biol. 2020, 16, e1008289. [Google Scholar] [CrossRef] [PubMed]
  13. Balduzzi, D.; Tononi, G. Integrated Information in Discrete Dynamical Systems: Motivation and Theoretical Framework. PLoS Comput. Biol. 2008, 4, e1000091. [Google Scholar] [CrossRef] [PubMed]
  14. van Enk, S.J. Quantum partial information decomposition. Phys. Rev. A 2023, 108, 062415. [Google Scholar] [CrossRef]
  15. Rauh, J.; Bertschinger, N.; Olbrich, E. Reconsidering unique information: Towards a multivariate information decomposition. IEEE Int. Symp. Inf. Theory—Proc. 2014, 2014, 2232–2236. [Google Scholar] [CrossRef]
  16. Finn, C.; Lizier, J. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy 2018, 20, 297. [Google Scholar] [CrossRef] [PubMed]
  17. Ince, R.A.A. The Partial Entropy Decomposition: Decomposing multivariate entropy and mutual information via pointwise common surprisal. arXiv 2017, arXiv:1702.01591. [Google Scholar]
  18. Ting, H.K. On the Amount of Information. Theory Probab. Its Appl. 1962, 7, 439–447. [Google Scholar] [CrossRef]
  19. Tao, T. Special Cases of Shannon Entropy. Blogpost. 2017. Available online: https://terrytao.wordpress.com/2017/03/01/special-cases-of-shannon-entropy/ (accessed on 1 September 2024).
  20. Yeung, R. A new outlook on Shannon’s information measures. IEEE Trans. Inf. Theory 1991, 37, 466–474. [Google Scholar] [CrossRef]
  21. Lang, L.; Baudot, P.; Quax, R.; Forré, P. Information Decomposition Diagrams Applied beyond Shannon Entropy: A Generalization of Hu’s Theorem. arXiv 2022, arXiv:2202.09393. [Google Scholar]
  22. Mazur, D.R. Combinatorics: A Guided Tour; AMS/MAA Textbooks; American Mathematical Society: Providence, RI, USA, 2010. [Google Scholar] [CrossRef]
  23. Tao, T. An Introduction to Measure Theory; American Mathematical Society: Providence, RI, USA, 2011; Volume 126. [Google Scholar]
  24. Wolf, S.; Wullschleger, J. Zero-error information and applications in cryptography. In Proceedings of the Information Theory Workshop, San Antonio, TX, USA, 24–29 October 2004; pp. 1–6. [Google Scholar]
  25. Stanley, R. Enumerative Combinatorics: Volume 1; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
  26. Gutknecht, A.; Wibral, M.; Makkeh, A. Bits and pieces: Understanding information decomposition from part-whole relationships and formal logic. Proc. R. Soc. A Math. Phys. Eng. Sci. 2021, 477, 20210110. [Google Scholar] [CrossRef] [PubMed]
  27. Stoll, R. Set Theory and Logic; Dover Books on Advanced Mathematics; Dover Publications: Mineola, NY, USA, 1979. [Google Scholar]
Figure 1. This diagram illustrates how the information in two variables $X, Y$ about a third variable $Z$ can be decomposed into different information atoms. The amount of such information in $X$, $Y$ and the joint variable $(X, Y)$ is measured using the mutual information $I(X;Z)$, $I(Y;Z)$, and $I((X,Y);Z)$, respectively. Redundant information $R$ is information that is shared between $X$ and $Y$ such that knowing one of them suffices to deduce this information about $Z$. Unique information $U_X$ is found only in $X$; $U_Y$, only in $Y$. The synergistic information $S$ that $X$ and $Y$ hold about $Z$ is only contained in the joint variable, but not in the individual sources on their own.
Figure 4. A graphical illustration for the general solution of the trivariate problem. Compared to the Venn diagram for three sets, two new regions here are the 2-covered part of triple intersection Π s (synergistic atom) and a ghost atom Π g , which is not a part of any single initial variable. Similarly to Figure 3, colors indicate the coverings: three primary colors (red, yellow, blue, or their checkered combination) correspond to 1-covered atoms, the overlay of any two colors (orange, purple, green or their checkered combination) is 2-covered, and the overlay of all three colors (brown) is 3-covered.