Article

The Homological Nature of Entropy †

1 Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, 04103 Leipzig, Germany
2 Université Paris Diderot-Paris 7, UFR de Mathématiques, Équipe Géométrie et Dynamique, Bâtiment Sophie Germain, 5 rue Thomas Mann, 75205 Paris Cedex 13, France
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in Proceedings of the MaxEnt 2014 Conference on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Amboise, France, 21–26 September 2014.
Entropy 2015, 17(5), 3253-3318; https://doi.org/10.3390/e17053253
Submission received: 31 January 2015 / Revised: 3 May 2015 / Accepted: 5 May 2015 / Published: 13 May 2015
(This article belongs to the Special Issue Information, Entropy and Their Geometric Structures)

Abstract: We propose that entropy is a universal co-homological class in a theory associated to a family of observable quantities and a family of probability distributions. Three cases are presented: (1) classical probabilities and random variables; (2) quantum probabilities and observable operators; (3) dynamic probabilities and observation trees. This gives rise to a new kind of topology for information processes, which accounts for the main information functions: entropy, mutual informations at all orders, and the Kullback–Leibler divergence, and generalizes them in several ways. The article is divided into two parts, which can be read independently. In the first part, the introduction, we provide an overview of the results, some open questions, future results and lines of research, and discuss briefly the application to complex data. In the second part we give the complete definitions and proofs of Theorems A, C and E stated in the introduction, which show why entropy is the first homological invariant of a structure of information in four contexts: static classical or quantum probability, and dynamics of classical or quantum strategies of observation of a finite system.

1. Introduction

1.1. What is Information?

“What is information?” is a question that has received several answers according to the different problems investigated. The best known definition was given by Shannon [1], using random variables and a probability law, for the problem of optimal message compression. However, the first definition was given by Fisher, as a metric associated to a smooth family of probability distributions, for optimal discrimination by statistical tests; it is a limit of the Kullback–Leibler divergence, which was introduced to estimate the accuracy of a statistical model of empirical data, and which can also be viewed as a quantity of information. More generally, Kolmogorov considered that the concept of information must precede probability theory (cf. [2]). From a different perspective, Évariste Galois saw the application of group theory to the discrimination of solutions of an algebraic equation as a first step toward a general theory of ambiguity, which was developed further by Riemann, Picard, Vessiot, Lie, Poincaré and Cartan for systems of differential equations; it is also a theory of information. In another direction, René Thom claimed that information must have a topological content (see [3]); he gave the example of the unfolding of the coupling of two dynamical systems, but he had in mind the whole domain of algebraic or differential topology.
All these approaches have in common the definition of secondary objects, either functions, groups or homology cycles, for measuring in what sense a pair of objects departs from independence. For instance, in the case of Shannon, the mutual information is I(X; Y) = H(X) + H(Y) − H(X,Y), where H denotes the usual Gibbs entropy (H(X) = − Σx P(X = x) log2 P(X = x)), and for Galois it is the quotient set IGal(L1; L2|K) = (Gal(L1|K) × Gal(L2|K))/Gal(L|K), where L1, L2 are two fields containing a field K in an algebraic closure Ω of K, where L is the field generated by L1 and L2 in Ω, and where Gal(Li|K) (for i = 1, 2) and Gal(L|K) denote the groups introduced by Galois, made of the field automorphisms of Li (resp. L) fixing the elements of K.
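As a concrete illustration of the Shannon case, the short sketch below computes H(X), H(Y), H(X,Y) and I(X;Y) for a small joint law (the table, names and numbers are ours, purely for illustration):

```python
# A minimal numeric illustration of Shannon's mutual information
# I(X;Y) = H(X) + H(Y) - H(X,Y); the joint table below is made up.
from math import log2

joint = {('a', 0): 0.25, ('a', 1): 0.25,
         ('b', 0): 0.10, ('b', 1): 0.40}

def H(dist):
    """Gibbs/Shannon entropy of a probability dictionary, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

X, Y = marginal(joint, 0), marginal(joint, 1)
I = H(X) + H(Y) - H(joint)   # mutual information, positive here
print(f"H(X)={H(X):.4f}  H(Y)={H(Y):.4f}  H(X,Y)={H(joint):.4f}  I={I:.4f}")
```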
We suggest that all information quantities are of co-homological nature, in a setting which depends on a pair of categories (cf. [4,5]): one for the data on a system, like random variables or functions of solutions of an equation, and one for the parameters of this system, like probability laws or coefficients of equations. The first category generates an algebraic structure like a monoid, or more generally a monad (cf. [4]), and the second category generates a representation of this structure, as conditioning or the adjunction of new numbers do, for instance; then information quantities are co-cycles associated with this module.
We will see that, given a set of random variables on a finite set Ω and a simplicial subset of probabilities on Ω, the entropy appears as the only universal co-homology class of degree one. The higher mutual information functions that were defined by Shannon are co-cycles (or twisted co-cycles for even orders), and they correspond to higher homotopical constructions. In fact this description is equivalent to the theorem of Hu Kuo Ting [6], which gave a set-theoretical interpretation of the mutual information decomposition of the total entropy of a system. Then we can use information co-cycles to describe forms of the information distribution between a set of random data; figures like ordinary links, chains or Borromean links appear in this context, giving rise to a new kind of topology.

1.2. Information Homology

Here we call random variables (r.v.) on a finite set Ω congruent when they define the same partition (recall that a partition of Ω is a family of disjoint non-empty subsets covering Ω, and that the partition associated to a r.v. X is the family of subsets Ωx of Ω defined by the equations X(ω) = x); the join r.v. YZ, also denoted by (Y, Z), corresponds to the least fine partition that refines both Y and Z. This defines a monoid structure on the set Π(Ω) of partitions of Ω, with 1 as a unit, and where each element is idempotent, i.e., ∀X, XX = X. An information category is a set S of r.v. such that, for any Y, Z ∈ S less fine than some U ∈ S, the join YZ belongs to S, cf. [7]. An ordering on S is given by Y ≤ Z when Z refines Y, which also defines the morphisms Z → Y in the category S. In what follows we always assume that 1 belongs to S. The simplex ∆(Ω) is defined as the set of families of numbers {pω; ω ∊ Ω} such that ∀ω, 0 ≤ pω ≤ 1 and Σω pω = 1; it parameterizes all probability laws on Ω. We choose a simplicial sub-complex P in Δ(Ω), which is stable under all the conditioning operations by elements of S. By definition, for N ∊ ℕ, an information N-cochain is a family of measurable functions of ℙ ∈ P, with values in ℝ or ℂ, indexed by the sequences (S1;…;SN) in S dominated by an element of S, whose values depend only on the image law (S1, …, SN)∗ℙ. This condition is natural from a topos point of view, cf. [4]; we interpret it as a “locality” condition. Note that we write (S1; …; SN) for a sequence, because (S1, …, SN) designates the joint variable. For N = 0 this gives only the constants. We denote by C^N the vector space of N-cochains of information. The following formula corresponds to the averaged conditioning of Shannon [1]:
$$S_0.F(S_1;\ldots;S_N;\mathbb{P}) = \sum_j \mathbb{P}(S_0 = v_j)\,F(S_1;\ldots;S_N;\mathbb{P}\,|\,S_0 = v_j),$$
where the sum is taken over all values of S0, and the vertical bar is ordinary conditioning. It satisfies the associativity condition $(S_0 S_0').F = S_0.(S_0'.F)$.
The coboundary operator δ is defined by
$$\delta F(S_0;\ldots;S_N;\mathbb{P}) = S_0.F(S_1;\ldots;S_N;\mathbb{P}) + \sum_{i=0}^{N-1}(-1)^{i+1} F(S_0;\ldots;(S_i,S_{i+1});\ldots;S_N;\mathbb{P}) + (-1)^{N+1}F(S_0;\ldots;S_{N-1};\mathbb{P}).$$
It corresponds to a standard non-homogeneous bar complex (cf. [5]). Another co-boundary operator on C^N is δt (t for twisted or trivial action, or topological complex), defined by the above formula with the first term S0.F(S1;…;SN;ℙ) replaced by F(S1;…;SN;ℙ). The corresponding co-cycles are defined by the equations δF = 0 or δtF = 0, respectively. We easily verify that δδ = 0 and δtδt = 0; then the co-homology $H^*(\mathcal{S};\mathcal{P})$, resp. $H^*_t(\mathcal{S};\mathcal{P})$, is defined by taking co-cycles modulo the elements of the image of δ, resp. δt, called co-boundaries. The fact that the classical entropy H(X; ℙ) = − Σi pi log2 pi is a 1-co-cycle is the fundamental equation H(X,Y) = H(X) + X.H(Y).
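Numerically, this 1-cocycle property of entropy is just the chain rule; the following sketch (our own check, with an arbitrary joint law) verifies H(X,Y) = H(X) + X.H(Y), where X.H(Y) is the averaged conditioning defined above:

```python
# Numerical check, on a made-up joint law, that entropy satisfies the
# 1-cocycle equation H(X,Y) = H(X) + X.H(Y), where X.H(Y) is Shannon's
# averaged conditioning.
from math import log2

joint = {('a', 0): 0.2, ('a', 1): 0.3, ('b', 0): 0.4, ('b', 1): 0.1}

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p

# X.H(Y): average over x of the entropy of the conditioned law P|X=x
XdotH = 0.0
for x, p in px.items():
    cond = {y: q / p for (x2, y), q in joint.items() if x2 == x}
    XdotH += p * H(cond)

assert abs(H(joint) - (H(px) + XdotH)) < 1e-12   # delta H = 0 in degree 1
print(f"H(X,Y)={H(joint):.4f}  H(X)+X.H(Y)={H(px) + XdotH:.4f}")
```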
Theorem A. (cf. Theorem 1 section 2.3, [7]): For the full simplex ∆(Ω), and if S is the monoid generated by a set of at least two variables, such that each pair takes at least four values, then the information co-homology space of degree one is one-dimensional and generated by the classical entropy.
Problem 1. Compute the homology of higher degrees.
We conjecture that for binary variables it is zero, but that in general non-trivial classes appear, deduced from polylogarithms. This could require us to connect with the works of Dupont, Bloch, Goncharov, Elbaz-Vincent, Gangl et al. on motives (cf. [8]), which started from the discovery of Cathelineau (1988) that entropy appears in the computation of the degree one homology of the discrete group SL2 over ℂ with coefficients in the adjoint action (cf. [9]).
Suppose S is the monoid generated by a finite family of partitions. The higher mutual informations were defined by Shannon as alternating sums:
$$I_N(S_1;\ldots;S_N;\mathbb{P}) = \sum_{k=1}^{N}(-1)^{k-1}\sum_{I\subset[N];\,\mathrm{card}(I)=k} H(S_I;\mathbb{P}),$$
where SI denotes the join of the Si such that i ∊ I. We have I1 = H, and I2 = I is the usual mutual information: I(S; T) = H(S) + H(T) − H(S,T).
Theorem B. (cf. section 3, [7]): I2m = δt(δδt)^(m−1) H and I2m+1 = −(δδt)^m H; that is, there are m − 1 factors δ and m factors δt for I2m, and m factors δ and m factors δt for I2m+1.
Thus odd information quantities are information co-cycles, because they are in the image of δ, and even information quantities are twisted (or topological) co-cycles, because they are in the image of δt.
In [7] we show that this description is equivalent to the theorem of Hu Kuo Ting (1962) [6], giving a set-theoretical interpretation of the mutual information decomposition of the total entropy of a system: mutual information, join and averaged conditioning correspond respectively to intersection, union and difference A\B = A ∩ Bᶜ. In special cases we can interpret IN as homotopical algebraic invariants. For instance, for N = 3, suppose that I(X; Y) = I(Y; Z) = I(Z; X) = 0; then I3(X; Y; Z) = −I((X,Y); Z) can be defined as a Milnor invariant for links, generalized by Massey, as they are presented in [10] (cf. page 284), through the 3-ary obstruction to associativity of products in a subcomplex of a differential algebra, cf. [7]. The absolute minima of I3 correspond to Borromean links, interpreted as synergy, cf. [11,12].
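The Borromean configuration can be exhibited concretely: in the sketch below (our own illustration), X and Y are independent fair bits and Z = X xor Y; every pairwise mutual information vanishes while I3 reaches −1 bit, its absolute minimum:

```python
# The XOR triple is the standard "Borromean" configuration: X, Y are
# independent fair bits and Z = X xor Y. All pairwise informations vanish,
# yet I3(X;Y;Z) = -1 bit, interpreted as synergy.
from itertools import product
from math import log2

# uniform law on the 4 points (x, y, x^y)
atoms = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
P = {w: 0.25 for w in atoms}

def H(P, coords):
    """Entropy of the image law of P under the chosen coordinates."""
    img = {}
    for w, p in P.items():
        key = tuple(w[i] for i in coords)
        img[key] = img.get(key, 0.0) + p
    return -sum(p * log2(p) for p in img.values() if p > 0)

I2 = lambda i, j: H(P, (i,)) + H(P, (j,)) - H(P, (i, j))
I3 = (H(P, (0,)) + H(P, (1,)) + H(P, (2,))
      - H(P, (0, 1)) - H(P, (0, 2)) - H(P, (1, 2)) + H(P, (0, 1, 2)))
print(I2(0, 1), I2(0, 2), I2(1, 2), I3)   # 0.0 0.0 0.0 -1.0
```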

1.3. Extension to Quantum Information

Positive hermitian n × n matrices ρ, normalized by Tr(ρ) = 1, are called densities of states (or density operators) and are considered as quantum probabilities on E = ℂn. Real quantum observables are n × n hermitian matrices, and, by definition, the amplitude, or expectation, of the observable Z in the state ρ is given by the formula 𝔼(Z) = Tr(Zρ) (see e.g., [13]). Two real observables Y, Z are said to be congruent if their eigenspaces are the same; thus orthogonal decompositions of E are the quantum analogs of partitions. The join is well defined for commuting observables. An information structure S is given by a subset of observables such that, if Y, Z have a common refined eigenspace decomposition in S, their join (Y, Z) belongs to S. We assume that {E} belongs to S. What plays the role of a probability functor is a map Q from S to sets of positive hermitian forms on E, which behaves naturally with respect to the quantum direct image; thus Q is a covariant functor. We define information N-cochains as in the classical case, starting with the numerical functions on the sets QX; X ∈ S, which behave naturally under direct images.
The restriction of a density ρ by an observable Y is $\rho_Y = \sum_A E_A^*\rho E_A$, where the EA are the spectral projectors of the observable Y. The functor Q is said to match S (or to be complete and minimal with respect to S) if, for each X ∈ S, the set QX is the set of all possible densities of the form ρX.
The action of a variable on the cochains space C Q * is given by the quantum averaged conditioning:
$$Y.F(Y_0;\ldots;Y_m;\rho) = \sum_A \mathrm{Tr}(E_A^*\rho E_A)\,F(Y_0;\ldots;Y_m;E_A^*\rho E_A).$$
From here we define coboundary operators δq and δqt by the formula (22); then the notions of co-cycles, co-boundaries and co-homology classes follow. We have δqδq = 0 and δqtδqt = 0; cf. [7].
When the unitary group Un acts transitively on S and Q, there is a notion of invariant cochains, forming a subcomplex of information cochains, and giving a more computable co-homology than the raw information co-homology. We call it the invariant information co-homology and denote it by $H^*_U(\mathcal{S};\mathcal{Q})$.
The von Neumann entropy of ρ is S(ρ) = 𝔼ρ(−log2(ρ)) = −Tr(ρ log2(ρ)); it defines a 0-cochain SY by restricting S to the sets QX. The classical entropy is $H(Y;\rho) = -\sum_A \mathrm{Tr}(E_A^*\rho E_A)\log_2(\mathrm{Tr}(E_A^*\rho E_A))$. Both these co-chains are invariant. It is well known that S(X,Y)(ρ) = H(X; ρ) + X.SY(ρ) when X, Y commute, cf. [13]. In particular, by taking Y = 1E we see that the classical entropy measures the default of equivariance of the quantum entropy, i.e., H(X; ρ) = SX(ρ) − (X.S)(ρ). But using the case where X refines Y, we obtain that the entropy of Shannon is the co-boundary of (minus) the von Neumann entropy.
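A small numeric sketch (assuming numpy; the density matrix below is made up) illustrates these definitions: it computes S(ρ), the restriction ρY by spectral projectors, and the classical entropy H(Y; ρ):

```python
# Sketch (numpy assumed): von Neumann entropy S(rho) of a density matrix,
# the restriction rho_Y by an observable's spectral projectors, and the
# classical entropy H(Y; rho) built from the traces Tr(E_A rho E_A).
import numpy as np

def S(rho):
    """Von Neumann entropy -Tr(rho log2 rho), via eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rho = np.array([[0.5, 0.2], [0.2, 0.5]])            # made-up density (trace 1, positive)
projs = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]  # spectral projectors of an observable Y

rho_Y = sum(E @ rho @ E for E in projs)             # restriction of rho by Y
probs = [float(np.trace(E @ rho @ E)) for E in projs]
H_Y = -sum(p * np.log2(p) for p in probs if p > 0)  # classical entropy

print(f"S(rho)={S(rho):.4f}  S(rho_Y)={S(rho_Y):.4f}  H(Y;rho)={H_Y:.4f}")
# rho_Y is diagonal here, so S(rho_Y) equals the classical entropy H(Y; rho).
```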
Theorem C. (cf. Theorem 3 section 4.3): For n ≥ 4, when S is generated by at least two decompositions such that each pair has at least four subspaces, and when Q is matching S, the invariant co-homology $H^1_U$ of δq in degree one is zero, and the space $H^0_U$ is of dimension one. In particular, the only invariant 0-cochain S such that δS = −H is the von Neumann entropy.
(This statement, which will be proved below, corrects a similar statement which was made in the announcement [14].)

1.4. Concavity and Convexity Properties of Information Quantities

The simplest classical information structure S is the monoid generated by a family of “elementary” binary variables S1,…,Sn. It is remarkable that in this case the information functions IN,J = IN(Sj1;…;SjN), over all the subsets J = {j1,…,jN} of [n] = {1,…,n} different from [n] itself, give algebraically independent functions on the probability simplex ∆(Ω) of dimension 2^n − 1. They form coordinates on the quotient of ∆(Ω) by a finite group.
Let $\partial_d$ denote the Lie derivative with respect to the vector $d = (1,\ldots,1)$ in the vector space $\mathbb{R}^{2^n}$, and $\Delta$ the Euclidean Laplace operator on $\mathbb{R}^{2^n}$; then $\Delta' = \Delta - 2^{-n}\partial_d\partial_d$ is the Laplace operator on the simplex ∆(Ω) defined by equating the sum of coordinates to 1.
Theorem D. (cf. [15]): On the affine simplex ∆(Ω) the functions IN,J with N odd (resp. even) satisfy the inequality ∆′IN,J ≥ 0 (resp. ∆′IN,J ≤ 0).
In other words, for N odd the IN,J are super-harmonic, which is a kind of weak concavity, and for N even they are sub-harmonic, which is a kind of weak convexity. In particular, when N is even (resp. odd) IN,J has no local maximum (resp. minimum) in the interior of ∆(Ω).
Problem 2. What can be said of the other critical points of IN,J? What can be said of the restriction of one information function on the intersection of levels of other information functions? Information topology depends on the shape of these intersections and on the Morse theory for them.

1.5. Monadic Cohomology of Information

Now we consider the category S* of generalized ordered partitions of Ω over S: these are sequences S = (E1,…,Em) of subsets of Ω such that ⋃jEj = Ω and Ei ∩ Ej = ∅ as soon as i ≠ j. The number m is called the degree of S. Note the important technical point that some of the sets Ej can be empty. In the same spirit we introduce generalized ordered orthogonal decompositions of E for the quantum case; but in this summary, for simplicity, we restrict ourselves to the classical case. From now on in this summary we also omit the word “generalized” before “ordered”. A rooted tree decorated by S* is an oriented finite planar tree Γ, with a marked initial vertex s0, called the root of Γ, where each vertex s is equipped with an element Fs of S*, such that the edges issued from s correspond to the values of Fs. When we want to mention that we restrict to partitions less fine than a partition X we put an index X, as in SX*.
The notation μ(m; n1,…,nm) denotes the operation which associates to an ordered partition S of degree m and to m ordered partitions Si of respective degrees ni the ordered partition obtained by cutting the pieces of S using the pieces of the Si and respecting the order. An evident unit element for this operation is the unique partition π0 of degree 1. The symbol μm denotes the collection of those operations for m fixed. The introduction of empty subsets in ordered partitions ensures that the result of μ(m; n1,…,nm) is a partition of degree n1 + … + nm; thus the μm define what is called an operad; cf. [10,16]. The axioms of unity, associativity and covariance for permutations are satisfied. See [10,16–18] for the definition of operads.
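For concreteness, here is a minimal sketch (our own encoding, with frozensets standing for the pieces; names are ours) of the composition μ(m; n1,…,nm) by cutting and ordering:

```python
# A sketch of the operad composition mu(m; n_1,...,n_m) on generalized
# ordered partitions, represented as lists of (possibly empty) frozensets;
# the representation is ours, not the paper's notation.
def mu(S, parts):
    """Cut each piece E_j of S by the pieces of parts[j], keeping the order.

    S       : ordered partition (E_1,...,E_m) of Omega
    parts   : list of m ordered partitions of Omega
    returns : ordered partition of degree n_1 + ... + n_m
    """
    assert len(S) == len(parts)
    result = []
    for E, Si in zip(S, parts):
        for F in Si:
            result.append(E & F)   # empty pieces are allowed and kept
    return result

S  = [frozenset({0, 1}), frozenset({2, 3})]
S1 = [frozenset({0}), frozenset({1, 2, 3})]
S2 = [frozenset({0, 1, 2}), frozenset({3})]
print(mu(S, [S1, S2]))
# the four singletons, in order: degree 2 + 2 = 4, as the operad grading requires
```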
The most important algebraic object associated to an operad is a monad (cf. [4,16]), i.e., a functor V from a category A to itself, equipped with two natural transformations μ : V∘V → V and η : R → V, which satisfy the following axioms:
$$\mu\circ(V\mu) = \mu\circ(\mu V),\qquad \mu\circ(V\eta) = \mathrm{Id} = \mu\circ(\eta V).$$
In our situation, we can apply the Schur construction (cf. [16]) to the μm to get a monad: take for V the real vector space freely generated by S*; it is naturally graded, so it is the direct sum of spaces V(m), m ≥ 1, on which the symmetric group $\mathfrak{S}_m$ acts naturally on the right; then introduce, for any real vector space W, the real vector space $V(W) = \bigoplus_{m\geq 0} V(m)\otimes_{\mathfrak{S}_m} W^{\otimes m}$; the Schur composition is defined by $V\circ V = \bigoplus_{m\geq 0} V(m)\otimes_{\mathfrak{S}_m} V^{\otimes m}$. It is easy to verify that the collection (μm; m ∊ ℕ) defines a natural transformation V∘V → V, and the trivial partition π0 defines a natural transformation η : R → V, which satisfy the axioms of a monad.
We also fix a functor of probability laws QX over the category SX, and let MX(m) be the vector space freely generated over ℝ by the symbols (P,i,m), where P belongs to QX and 1 ≤ i ≤ m. In the last section of the second part we show how this space arises from the consideration of divided probabilities. This is apparent in the following definition of the right action of the operad V on the family MX(m); m ∊ ℕ*: a sequence S1,…,Sm of ordered partitions in SX* acts on a generator (P,i,m) by giving the vector Σj pj (Pj,(i,j),n), where pj is the probability P(Si = j), Pj is the conditioned probability P|(Si = j), and n = n1 + … + nm. We denote by θm((P,i,m),(S1,…,Sm)) this vector.
Now we consider the Schur functor $M_X(W) = \bigoplus_m M_X(m)\otimes_{\mathfrak{S}_m} W^{\otimes m}$; the operations θm define a natural transformation θ : M∘V → M, which is an action on the right in the sense of monads, i.e., θ∘(Mμ) = θ∘(θV) and θ∘(Mη) = Id. (We forget the index X for simplicity.)
Now we consider the bar resolution of M: $\cdots \to M\circ V^{\circ(k+1)} \to M\circ V^{\circ k} \to \cdots$, as in Beck (triples, …) [19] and Fresse [16], with its simplicial structure deduced from θ and μ, and the complex of natural transformations of V-right modules $C^*(M) = \mathrm{Hom}_V(M\circ V^{\circ *}, R)$, where R is the trivial right module given by R(m) = ℝ. As in the classical case, we restrict ourselves to co-chains that are measurable in the probability (P, i, m).
The co-boundary is defined by the Hochschild formula, extended by MacLane and Beck to monads (see Beck [19]):
$$\delta F = F\circ(\theta V^{\circ k}) - \sum_{i=0}^{k-1}(-1)^{i}\,F\circ(M V^{\circ i}\,\mu\,V^{\circ k-i-1}) - (-1)^{k}\,F\circ(M V^{\circ k}\,\varepsilon).$$
The cochains are described by families of scalar measurable functions $F_X(S_1;\ldots;S_k;(P,i,m))$, where $S_1;\ldots;S_k$ is a forest of m trees of level k labelled by SX*, and where the value on (P,i,m) depends only on the tree $S_1^i; S_2^i;\ldots;S_k^i$.
We now impose the condition, called regularity, that $F_X(S_1;\ldots;S_k;(P,i,m)) = F_X(S_1^i;S_2^i;\ldots;S_k^i;P)$. The regular co-chains form a sub-complex $C^*_r(M)$; by definition, its co-homology is the arborescent information co-homology.
The regular cochains of degree k are determined by their values for m = 1 and decorated trees of level k, where the co-boundary takes the form:
$$\delta F(S; S_1;\ldots;S_k;P) = \sum_i P(S=i)\,F(S_1^i;\ldots;S_k^i;P|(S=i)) + \sum_{i=1}^{k}(-1)^i F(S;\ldots;\mu(S_{i-1}\circ S_i);S_{i+1};\ldots;S_k;P) + (-1)^{k+1}F(S;\ldots;S_{k-1};P).$$
This gives co-homology groups $H^*_\tau(\mathcal{S},\mathcal{P})$, τ for tree. The fact that the entropy H(S∗ℙ) = H(S; ℙ) defines a 1-cocycle is a result of an equation of Faddeev, generalized by Baez, Fritz and Leinster [20], who gave another interpretation, based on the operad structure of the set of all finite probability laws. See also Marcolli and Thorngren [21].
Theorem E. (cf. Theorem 4 section 6.3, [22]): If Ω has more than four points, $H^1_\tau(\Pi(\Omega),\Delta(\Omega))$ is the one-dimensional vector space generated by the entropy.
Another co-boundary δt on $C^*_r(M)$ corresponds to another right action of the monad VX, deduced from the maps θt that send (P,i,m) ⊗ S1 ⊗ … ⊗ Sm to the sum of the vectors (P,(i,j),n), for j = 1,…,ni, associated to the end branches of Si. It gives a twisted version of information co-homology, as in the first paragraph. This allows us to define higher information quantities for strategies: for N = 2M + 1 odd, Iτ,N = −(δδt)^M H, and for N = 2M + 2 even, Iτ,N = δt(δδt)^M H.
This gives for N = 2, a notion of mutual information between a variable S of length m and a collection T of m variables T1,…,Tm:
$$I_\tau(S;T;P) = \sum_{i=1}^{m}\bigl(H(T_i;P) - P(S=i)\,H(T_i;P|S=i)\bigr).$$
When all the Ti are equal we recover the ordinary mutual information of Shannon plus a multiple of the entropy of Ti.
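The following sketch (our own, on a made-up law) evaluates Iτ(S; T; P) from the displayed formula and checks the remark above in the case where all the Ti are equal:

```python
# Hedged sketch of the tree mutual information I_tau(S; T; P) of the
# displayed formula, for a variable S of length m and m variables T_i.
# Variables are encoded as functions on a finite Omega; data is made up.
from math import log2

P = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

def H(P, var):
    img = {}
    for w, p in P.items():
        img[var(w)] = img.get(var(w), 0.0) + p
    return -sum(p * log2(p) for p in img.values() if p > 0)

S = lambda w: w[0]                       # S of length m = 2
T = [lambda w: w[1], lambda w: w[1]]     # here all T_i are equal

def I_tau(P, S, T):
    total = 0.0
    for i, Ti in enumerate(T):
        p_i = sum(p for w, p in P.items() if S(w) == i)
        cond = {w: p / p_i for w, p in P.items() if S(w) == i}
        total += H(P, Ti) - p_i * H(cond, Ti)
    return total

# With all T_i equal (m = 2), I_tau = I(S;T) + H(T), as remarked above.
IST = H(P, S) + H(P, T[0]) - H(P, lambda w: w)   # ordinary mutual information
print(f"I_tau = {I_tau(P, S, T):.4f},  I(S;T) + H(T) = {IST + H(P, T[0]):.4f}")
```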

1.6. The Forms of Information Strategies

A rooted tree Γ decorated by S* can be seen as a strategy to discriminate between points in Ω. For each vertex s there is a minimal set of chained edges α1,…,αk connecting s0 to s; the cardinality k is called the level of s; this chain defines a sequence (F0,v0; F1,v1; …; Fk−1,vk−1) of observables and values of them; then we can associate to s the subset Ωs of Ω where each Fj takes the value vj. At a given level k the sets Ωs form a partition πk of Ω; the first one, π0, is the unit partition of length 1, and πl is finer than πl−1 for any l. By induction over k it is easy to deduce from the orderings of the values of the Fs an embedding in the Euclidean plane of the subtrees Γ(k) at level k, such that the values of the variables issued from each vertex are oriented in the direct trigonometric sense; thus πk has a canonical ordering ωk. Remark that many branches of the tree give the empty set for Ωs after some level; we call them dead branches. It is easy to prove that the set (S)* of ordered partitions that can be obtained as a (πk,ωk) for some tree Γ and some level k is closed under the natural ordered join operation, and, as (S)* contains π0, it forms a monoid, which contains the monoid M(S*) generated by S*.
Complete discrimination of Ω by S* exists when the final partition of Ω by singletons is attainable as a πk; optimal discrimination corresponds to minimal level k. When the set Ω is a subset of the set of words x1,…,xN with letters xi belonging to given sets Mi of respective cardinalities mi, the problem of optimal discrimination by observation strategies Γ decorated by S* is equivalent to a problem of minimal rewriting by words of type (F0,v0), (F1,v1), …, (Fk,vk); it is a variant of optimal coding, where the alphabet is given. The topology of the poset of discriminating strategies can be computed in terms of the free Lie algebra on Ω, cf. [16].
Probabilities ℙ in P correspond to a priori knowledge on Ω. In many problems P is reduced to one element, namely the uniform law. Let s be a vertex in a strategic tree Γ, and let Ps be the set of probability laws obtained by conditioning through the equations Fi = vi, i = 0,…,k − 1, for a minimal chain leading from s0 to s. We can consider that the sets Ps for different s along a branch measure the evolution of knowledge when applying the strategy. The entropy H(F; ℙs), for F in S* and ℙs in Ps, gives a measure of the information we hope to obtain when applying F at s in the state ℙs. The maximum entropy algorithm consists in choosing at each vertex s a variable that has the maximal conditioned entropy H(F; ℙs).
Theorem F. (cf. [22]): To find, in the minimal number of weighings, the one false piece of different weight among N pieces, for N ≥ 3, knowing that the false piece is unique, one can use the maximal entropy algorithm.
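A greedy implementation of the maximal entropy algorithm for this puzzle can be sketched as follows (our own encoding of the states and weighings, not the paper's):

```python
# A sketch of the maximal entropy algorithm for the false-coin puzzle of
# Theorem F (our own encoding of states and weighings, not the paper's).
# A state is (coin index, 'heavy' or 'light'); a weighing compares two
# disjoint groups of equal size; we greedily pick the weighing whose
# outcome partition has maximal entropy under the uniform prior.
from itertools import combinations
from math import log2

def outcome(state, left, right):
    coin, kind = state
    if coin in left:
        return 'left_down' if kind == 'heavy' else 'right_down'
    if coin in right:
        return 'right_down' if kind == 'heavy' else 'left_down'
    return 'balance'

def best_weighing(states, n):
    best, best_h = None, -1.0
    for k in range(1, n // 2 + 1):
        for left in combinations(range(n), k):
            rest = [c for c in range(n) if c not in left]
            for right in combinations(rest, k):
                counts = {}
                for s in states:
                    o = outcome(s, set(left), set(right))
                    counts[o] = counts.get(o, 0) + 1
                probs = [c / len(states) for c in counts.values()]
                h = -sum(p * log2(p) for p in probs)
                if h > best_h:
                    best, best_h = (left, right), h
    return best, best_h

states = [(i, kind) for i in range(3) for kind in ('heavy', 'light')]
print(best_weighing(states, 3))   # best first move: one coin against one, entropy log2(3)
```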
However, we have another measure of the information of the remaining ambiguity at s, by taking for the Galois group Gs the set of permutations of Ωs which globally respect the set Ps and the set of restrictions of elements of S* to Ωs, and which preserve one by one the equations Fi = vi. Along the branches of Γ this gives a decreasing sequence of groups, whose successive quotients measure the evolution of acquired information in an algebraic sense.
Problem 3. Generalize Theorem F. Can we use algorithms based on the Galoisian measure of information? Can we use higher information quantities associated to trees for optimal discrimination?

1.7. Conclusion and Perspective

Concepts of algebraic topology were recently applied to information theory by several researchers. In particular, notions coming from category theory, homological algebra and differential geometry were used for revisiting the nature and scope of entropy; cf. for instance Baez et al. [20], Marcolli and Thorngren [21] and Gromov [23]. In the present note we interpreted entropy and the Shannon information functions as co-cycles in a natural co-homology theory of information, based on categories of observables and complexes of probabilities. This allowed us to associate topological figures, like Borromean links, with particular configurations of mutual dependency of several observable quantities. Moreover, we extended these results to a dynamical setting of system observation, and we connected probability evolutions with the measures of ambiguity given by Galois groups. All those results provide only the first steps toward a developed Information Topology. However, even at this preliminary stage, this theory can be applied to the study of the distribution and evolution of information in concrete physical and biological systems. This kind of approach has already proved its efficiency for detecting collective synergistic dynamics in neural coding [12], in genetic expression [24], in cancer signatures [25], and in signaling pathways [26]. In particular, information topology could provide the principles accounting for the structure of information flows in biological systems, and notably in the central nervous system of animals.

2. Classical Information Topos. Theorem One

2.1. Information Structures and Probability Families

Let Ω be a finite set; the set Π(Ω) of all partitions of Ω constitutes a category with one arrow Y → Z from Y to Z when Y is finer than Z; we also say in this case that Y divides Z. In Π(Ω) we have an initial element, which is the partition by points, denoted ω, and a final element, which is Ω itself and is denoted by 1. The joint partition YZ, or (Y, Z), of two partitions Y, Z of Ω is the least fine partition that divides both Y and Z, i.e., their gcd. For any X we get XX = X, ωX = ω and 1.X = X.
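In computational terms, a partition of a finite Ω can be encoded by the label each point receives, and the joint partition is then computed pointwise; the small sketch below (our own encoding) checks idempotency and the gcd property on a four-point set with two projections:

```python
# Partitions of a finite Omega, encoded by the label each point receives;
# the joint YZ is computed pointwise. We check idempotency (XX = X) and
# that the joint of the two projections is the partition by points.
Omega = ['00', '01', '10', '11']

def blocks(labels):
    """Recover the partition (family of blocks) from a labelling of Omega."""
    out = {}
    for w, l in zip(Omega, labels):
        out.setdefault(l, set()).add(w)
    return sorted(map(sorted, out.values()))

def join(Y, Z):
    """Joint partition (Y, Z): the least fine partition dividing Y and Z."""
    return [(y, z) for y, z in zip(Y, Z)]

S1 = [w[0] for w in Omega]        # first projection
S2 = [w[1] for w in Omega]        # second projection
print(blocks(join(S1, S2)))       # the partition by points: gcd of S1 and S2
assert blocks(join(S1, S1)) == blocks(S1)   # idempotency: XX = X
```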
By definition an information structure S on Ω is a subset of Π(Ω), such that for any element X of S, and any pair of elements Y, Z in S that X refines, the joint partition YZ also belongs to S.
In addition we will always assume that the final partition 1 belongs to S. In terms of observations, it means that at least something is a certitude.
Examples: start with a set Σ = {Si; 1 ≤ i ≤ n} of partitions of Ω. For any subset I = {i1,…,ik} of [n] = {1,…,n}, the joint (Si1,…,Sik), also denoted SI, divides each Sij. The set W = W(Σ) of all the SI, when I describes the subsets of [n], is an information structure. It is even a commutative monoid, because any product of elements of W belongs to W, and the partition associated to Ω itself gives the identity element of W. The product S[n] of all the Si is maximal; it divides all the other elements. Like Π(Ω), the monoid W(Σ) is idempotent, i.e., for any X we have XX = X.
By definition, the faces of the abstract simplex ∆([n]) are the subsets of [n]; its vertices are the singletons. Thus the monoid W(Σ) can be identified with the first barycentric subdivision of the simplex ∆([n]).
Recall that a simplicial subcomplex of ∆([n]) is a subset of faces that contains all faces of any of its elements. Then any simplicial sub-complex K of ∆([n]) gives a simplicial information structure S(K), embedded in W(Σ). In fact, if Y and Z are faces of a simplex X belonging to K, YZ is also a face of X, thus it belongs to K. The maximal faces Σa; a ∊ A of K correspond to the finest elements in S(K); the vertices of a face Σa give a family of partitions, which generates a sub-monoid Wa = W(Σa) of W; it is a sub-information structure (full sub-category) of S(K), having the same unit, but having its own initial element ωa. These examples arise naturally when formalizing measurements, if some obstructions or a priori decisions forbid a set of joint measurements.
Examples of this kind were considered by Han [27]; see also McGill [28].
Example 1. Ω has four elements (00), (01), (10), (11); the variable S1 (resp. S2) is the projection pr1 (resp. pr2), on E1 = E2 = {0, 1}; Σ is the set {S1, S2}. The monoid W(Σ) has four elements 1, S1, S2, S1 S2. The partition S1S2 = S2S1 corresponds to the variable Id : Ω → Ω.
Example 2. Same Ω as before, with the same names for the elements, but we take all the partitions of Ω in S. In addition to 1, S1, S2 and S = S1S2, there is S3, the last partition in two subsets of cardinality two, which can be represented by the sum of the indices: S3(00) = 0, S3(11) = 0, S3(01) = 1, S3(10) = 1; the four partitions Yω, for ω ∊ Ω, formed by a singleton {ω} and its complement; and finally the six partitions Xμν = YμYν, indexed by pairs of points in Ω satisfying μ < ν in the lexicographic order. The product of two distinct Y is an X, the product of two distinct X or two distinct Si is S, the product of one Y and an Si is an X, of one Y and an X is this X or S, of one Si and an X is this X or S. In particular the monoid W is also generated by the three Si and the four Yω; it is called the monoid of partitions of Ω, and the associative algebra Λ(S) of this monoid is called the partition algebra of Ω.
Example 3. Same Ω as before, that is Ω = ∆(4), with the notations of Example 2 for the partitions; but we choose as generating family the set ϒ of the four partitions Yμ; μ ∊ Ω; the joint product of two such partitions is either a Yμ (when they coincide) or an Xμν (when they are different). The monoid W(ϒ) has twelve elements.
Example 4. Ω has 8 elements, noted (000),…,(111), and we consider the family Σ of the three binary variables S1, S2, S3 given by the three projections. If we take all the joints, we have a monoid of eight elements. However, if we forbid the maximal face (S1, S2, S3), we have a structure S which is not a monoid; it is the set formed by 1, S1, S2, S3 and the three joint pairs (S1, S2), (S1, S3), (S2, S3).
On the side of probabilities, we choose a Boolean algebra B of sets in Ω, i.e., a subset B of the set P(Ω) of subsets of Ω that contains the empty set ∅ and the full set Ω, and is closed under union, intersection and complement. In this finite context, it is easy to prove that B is constituted by all the unions of its minimal elements (called atoms). Associated to this case, we will consider only information structures made by partitions each of whose elements belongs to B. Consequently we could replace everywhere Ω by the finite set ΩB of the atoms of B, but we will see that several Boolean sub-algebras appear naturally in the process of observation, thus we prefer to mention the choice of B at the beginning of observations. Then we consider the set Δ(ΩB), or Δ(B), of all probability laws on (Ω, B), i.e., all real functions px of the atoms x of B (the points of ΩB), satisfying px ≥ 0 and Σx px = 1. We see that this set of probabilities is also a simplex ∆([N]), where N is the cardinality of ΩB.
As on the side of partitions, we will consider more generally any simplicial sub-complex Q of Δ(B), and call it a probability complex. In the appendix, we show that this kind of example corresponds to natural forbidding rules, which can express physical constraints on the observed system.
A partition Y which is measurable with respect to B is made by elements Yi, for i = 1,…,m, belonging to B. Let P be an element of Δ(B); the conditioning of P by the element Yi is defined only if P(Yi) ≠ 0, and given by the formula P(A|Y = yi) = P(A ∩ Yi)/P(Yi) for A ∈ B. We will consider it as a probability on Ω equipped with B, not as a probability on Yi. Remark that if P belongs to a simplicial family Q, the probability P(A|Y = yi) is also contained in Q. In fact, if the smallest face of Q which contains P is the simplex σ on the vertices x1,…,xk, then the conditioning of P by Yi, being equal to 0 for the other atoms x, belongs to a face of σ, which is in Q, because Q is a complex.
For a probability family Q, i.e., a set of probabilities on Ω, and a set of partitions S, we say that Q and S are adapted to each other if the conditioning of every element of Q by every element of S belongs to Q.
By definition, the algebra BY is the set of unions of elements of the partition Y. We can consider it as a Boolean algebra on Ω contained in B, or as a Boolean algebra on the quotient set Ω/Y. The image Y∗Q of a probability Q for B by the partition Y is the probability on Ω for the sub-algebra BY that is given by Y∗Q(t) = Q(t) for t ∊ BY. It is the forgetting operation, also frequently named marginalization by Y.
By definition, the set QY is the image of Y∗. Let us prove that it is a simplicial sub-complex of Δ(BY): take a simplex σ of Q, denote its vertices by x1,…,xk, write δj for the Dirac mass of xj, and look at the partition σi = Yi ∩ σ of σ induced by Y; then for all the xj ∊ σi the images Y∗δj coincide. Let us denote this image by δ(Y, σi); it is an element of QY. For every law Q in σ, the image Y∗Q belongs to the simplex on the laws δ(Y, σi), and any point in this simplex belongs to QY. Q.E.D.
If XY is an arrow in Π ( Ω B ), the above argument shows that the map Q X Q Y is a simplicial mapping.
Conditioning by Y and marginalization by Y* are related by the barycentric law (or theorem of total probability, Kolmogorov 1933 [29]): for any measurable set A in B we have
$$P(A) = P(Y=y_1)\,P|(Y=y_1)(A) + \cdots + P(Y=y_m)\,P|(Y=y_m)(A).$$
Remark that the notions of information structures and probability complexes extend to infinite sets; this is developed in paper [7].
In this context, we have a formula for any integrable function φ on Ω with respect to P:
$$\int_\Omega \varphi(\omega)\,dP(\omega) = \int_{\Omega/Y} d(Y_*P)(\omega')\int_\Omega \varphi(\omega)\,d\bigl(P|(Y=\omega')\bigr)(\omega).$$
Consider a finite set Ω, equipped with a Boolean algebra B, a probability family Q for it and an information structure S adapted to B.
For each object X in S, the set SX made by the partitions Y that are divided by X is a closed sub-category, possessing an internal monoid law. The object X is initial in it. To any arrow X → Y is associated the inclusion SY ⊂ SX; thus we get a contra-variant functor from S to the category of monoids.
On the other side we have a natural co-variant functor from S to the category of sets, which associates to each partition X ∈ S the set QX of probability laws in the image of Q on the quotient set Ω/X, and which associates to each arrow X → Y the surjection QX → QY given by the direct image PX ↦ Y∗PX. If Q is simplicial the functor goes to the category of simplicial complexes.
Definition 1. For X ∈ S, the functional module $\mathcal{F}_X(\mathcal{Q})$ is the real vector space of measurable functions on the space QX; for each arrow of divisibility X → Y, we have an injective linear map f ↦ fY|X from $\mathcal{F}_Y(\mathcal{Q})$ to $\mathcal{F}_X(\mathcal{Q})$ given by
$$f_{Y|X}(P_X) = f(Y_*P_X).$$
In this manner, we obtain a contra-variant functor from the category S to the category of real vector spaces.
If Q and S are adapted to each other, the functor admits a canonical action of the monoid functor X ↦ SX, given by the average formula
$$(Y.f)(P) = \int d(Y_*P)(y)\,f\bigl(P|(Y=y)\bigr).$$
To verify that this is an action of the monoid, we must verify that for any Z which divides Y, and any f ∊ $\mathcal{F}_Y$, we have in $\mathcal{F}_X$ the identity
$$(Z.f)_{Y|X} = Z.(f_{Y|X});$$
that means, for any P ∈ QX:
$$\int_{E_Z} d(Z_*P)(z)\,f_{Y|X}\bigl(P|(Z=z)\bigr) = \int_{E_Z} d(Z_*P)(z)\,f\bigl((Y_*P)|(Z=z)\bigr).$$
But this results from the identity Y∗(P|(Z = z)) = (Y∗P)|(Z = z), due to Y∗P(Z = z) = P(Z = z). The arrows of direct images and the action of averaged conditioning satisfy the axiom of distributivity: if Y and Z divide X, but Z does not necessarily divide Y, we have
$$Z.(f_{Y|X})(P) = (Z.f)_{(Z,Y)}\bigl((Z,Y)_*P\bigr) = (Z.f)_{(Z,Y)|X}(P).$$
Proof. The first identity comes from the fact that (Z,Y)*(P|(Z = z)) = Y*(P|(Z = z)); the second one follows from the fact that we have an action of the monoid S X.
As the formula (12) is central in our work, we insist a bit on it and comment on its meaning, at least in this finite setting:
Let P ↦ f(P) be an element of $\mathcal{F}_X$, and Y be the target of an arrow X → Y; we have
$$Y.f(P) = \sum_j P(Y=y_j)\,f(P|Y=y_j),$$
where j runs over the indices of the partition Y.
We will see, when discussing functions of several partitions, that this formula is due to Shannon and corresponds to conditional information.
Lemma 1. For any pair (Y, Z) of variables in SX, and any F for which the integrals converge, we have (Y,Z).F = Y.(Z.F).
Proof. We denote by pi the probability that Y = yi, by πij the joint probability of (Y = yi, Z = zj), and by qij the conditional probability of Z = zj knowing that Y = yi; then
$$(Y,Z).F(P) = \sum_{i,j}\pi_{ij}\,F\bigl(P|(Y=y_i,Z=z_j)\bigr) = \sum_i p_i\Bigl(\sum_j q_{ij}\,F\bigl(P|(Y=y_i,Z=z_j)\bigr)\Bigr) = \sum_i p_i\Bigl(\sum_j q_{ij}\,F\bigl((P|(Y=y_i))|(Z=z_j)\bigr)\Bigr) = \sum_i p_i\,(Z.F)\bigl(P|(Y=y_i)\bigr) = Y.(Z.F)(P).$$
Remark 1. In the general case, where Ω is not necessarily finite and B is any sigma-algebra, Lemma 1 is a version of the Fubini theorem.
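Lemma 1 is easy to confirm numerically; the sketch below (arbitrary test data, and a simple quadratic functional standing in for F) checks (Y,Z).F = Y.(Z.F) on an eight-point Ω:

```python
# Numerical check of Lemma 1, (Y,Z).F = Y.(Z.F), for a made-up function F
# of the probability law (a sum of squares) and coordinate partitions on
# an eight-point Omega; all numbers below are arbitrary test data.
from itertools import product

Omega = list(product((0, 1), repeat=3))
ps = [0.05, 0.10, 0.15, 0.05, 0.20, 0.10, 0.25, 0.10]
P = dict(zip(Omega, ps))
F = lambda law: sum(p * p for p in law.values())

def act(var, F):
    """Averaged conditioning: (var.F)(P) = sum_v P(var=v) F(P|var=v)."""
    def G(law):
        total = 0.0
        for v in {var(w) for w in law}:
            pv = sum(p for w, p in law.items() if var(w) == v)
            if pv > 0:
                cond = {w: p / pv for w, p in law.items() if var(w) == v}
                total += pv * F(cond)
        return total
    return G

Y, Z, YZ = (lambda w: w[0]), (lambda w: w[1]), (lambda w: (w[0], w[1]))
assert abs(act(YZ, F)(P) - act(Y, act(Z, F))(P)) < 1e-12
print("(Y,Z).F = Y.(Z.F) =", act(YZ, F)(P))
```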
Let us consider the category S equipped with the discrete topology, to get a site (cf. SGA [30]). Over a discrete site every presheaf is a sheaf. The contravariant functor X ↦ SX gives a structural sheaf of monoids, and by passing to the algebras AX over ℝ which are generated by the (finite) monoids, we get a sheaf of rings; thus S becomes a ringed site. Moreover, by considering all contra-variant functors X ↦ NX from S to modules over the algebra functor A, we obtain a ringed topos, which we name the information topos associated to (Ω, B, S). This ringed topos concerns only the observables given by partitioning.
Now take into account a probability family Q which is adapted to S, for instance a simplicial family; we obtain a functor X ↦ QX translating the marginalization by the partitions, considered as observable quantities, and the conditioning by observables is translated by a special element of the information topos.
In this way it is natural to expect that topos co-homology, as introduced by Grothendieck, Verdier and their collaborators (see SGA 4 [30]), captures the invariant structure of observation, and defines in this context what information is. This is the main outcome of our work.
As a consequence of Grothendieck’s article (Tohoku, 1957 [31]), a ringed topos possesses enough injective objects, i.e., any object is a sub-object of an injective object; moreover, up to isomorphism, there is a unique minimal injective object containing a given object, called its injective envelope (cf. Gabriel, séminaire Dubreil, exp. 17 [32]). Thus each object in the category $D_S$ of modules over a ringed site S possesses a canonical injective resolution I∗(N); then the group $\mathrm{Ext}^n_{D_S}(M,N)$ can be defined as the homology of the complex $\mathrm{Hom}_{D_S}(M,I^n(N))$. Those groups are denoted by Hn(M; N).
The “comparison theorem” (cf. Bourbaki, Alg.X Th1, p. 100 [33], or MacLane 1975, p. 261 [5]) asserts that, for any projective (resp. injective) resolution of M (resp. N) there exists a natural map of complexes between the resulting complex of homomorphisms and the above canonical complex, and that this map induces an isomorphism in co-homology.
In our context, we take for M the trivial constant module R S over S, and we take for N the functional module F ( Q ).
The existence of free resolutions of R S makes things easier to handle.
Hence we propose that the natural information quantities are classes in the co-homology groups $H^*(R_S,\mathcal{F}(\mathcal{Q}))$.
This is reminiscent of Galois co-homology (see SGA [30]), where M is also taken as the constant sheaf over the category of G-objects seen as a site.
In [7] we develop further this more geometric approach, by considering several resolutions. But in this paper, in order to be concrete, we will only focus on a more elementary approach, associated to a special resolution, called the non-homogeneous bar-resolution, which also leads to the general result. This is the object of the next section.

2.2. Non-Homogeneous Information Co-Homology

For each integer m ≥ 0 and each object X ∈ S, we consider the real vector space Sm(X), freely generated by the m-tuples of elements of the monoid SX, and we define Cm(X) as the real vector space of linear functions from Sm(X) to the space $\mathcal{F}_X$ of measurable functions from QX to ℝ.
Then we define the set $\mathcal{C}^m$ of m-cochains as the set of collections FX ∈ Cm(X) satisfying the following condition, named joint locality:
For each Y divided by X, when each variable Xj is divided by Y, we must have
$$F_Y(X_1;\ldots;X_m;Y_*P) = F_X(X_1;\ldots;X_m;P).$$
Thus a co-chain F is a natural transformation from the functor Sm(X), from S to the category of real vector spaces, to the functor of measurable functions on QX. Hence F is not an ordinary numerical function of a probability law ℙ and a set (X1,…,Xm) of m random variables, but we can speak of its value FX(X1;…;Xm; ℙ) for each X in S. For given X the co-chains form a sub-vector space $\mathcal{C}^m(X)$ of Cm(X).
If we apply the condition to Y = (X1,…,Xm) we find that F(X1;…;Xm;ℙ) depends only on the direct image of ℙ by the joint variable of the Xi’s. This implies that, if F belongs to $\mathcal{C}^m(X)$, we have
$$F(X_1;\ldots;X_m;\mathbb{P}) = F(X_1;\ldots;X_m;(X_1,\ldots,X_m)_*\mathbb{P}).$$
Conversely, suppose that F satisfies the condition (18), and consider two variables X, Y such that X divides Y and Y divides each Xj, and let P be a probability in QX; then the joint variable Z = (X1,…,Xm) divides Y and X, thus we have Z∗P = Z∗(X∗P) = Z∗(Y∗P), and
$$F(X_1;\ldots;X_m;Y_*P) = F(X_1;\ldots;X_m;Z_*P) = F(X_1;\ldots;X_m;X_*P),$$
which proves that F belongs to $\mathcal{C}^m(X)$.
Let F be an element of $\mathcal{C}^m(X)$, and Y an element of SX; then we define
$$Y.F(X_1;\ldots;X_m;P) = \sum_j P(Y=y_j)\,F(X_1;\ldots;X_m;P|Y=y_j).$$
It follows from the equivalent condition (18) that Y.F also belongs to $\mathcal{C}^m(X)$.
Moreover, the proof of Lemma 1 applies and gives that, for any pair (Y, Z) of variables in SX, and any F in $\mathcal{C}^m(X)$, we have (Y, Z).F = Y.(Z.F).
Thus this defines an action of the monoid SX on the vector spaces $\mathcal{C}^m(X)$.
Remark 2. The operation of S X can be rewritten more compactly by using integrals:
$$Y.F(X_1;\ldots;X_m;\mathbb{P}) = \int_\Omega F\bigl(X_1;\ldots;X_m;\mathbb{P}|Y=Y(\omega)\bigr)\,d\mathbb{P}(\omega).$$
The differential δ for computing co-homology is given by the Eilenberg-MacLane formula (1943):
$$\delta_m F(Y_1;\ldots;Y_{m+1};\mathbb{P}) = Y_1.F(Y_2;\ldots;Y_{m+1};\mathbb{P}) + \sum_{i=1}^{m}(-1)^i F(Y_1;\ldots;(Y_i,Y_{i+1});\ldots;Y_{m+1};\mathbb{P}) + (-1)^{m+1}F(Y_1;\ldots;Y_m;\mathbb{P}).$$
Since this formula corresponds to the standard inhomogeneous bar-resolution in the case of semi-groups and algebras (cf. MacLane p. 115 [4] and Cartan-Eilenberg pp. 174–175 [34]), we name δ the Hochschild co-boundary.
Remark that a function F satisfying the joint locality condition (i.e., the hypothesis that F(Y1;…;Ym;P) depends only on (Y1,…,Ym)∗P) has a co-boundary which is also jointly local, because the variables appearing in the definition are all joint variables of the Yj. (This would not have been true for the stronger locality hypothesis asking that F depends only on the collection (Yj)∗P; j = 1,…,m.)
It is easy to verify that δmδm−1 = 0. We denote by Zm the kernel of δm and by Bm the image of δm−1. The elements of Zm are named m-cocycles, we consider them as information quantities, and the elements of Bm are m-coboundaries.
Definition 2. For m ≥ 0, the quotient
$$H^m(\mathcal{C}^*) = Z^m/B^m$$
is the m-th cohomology group of information of the information structure S on the simplicial family of probabilities Q. We denote it by $H^m(\mathcal{S};\mathcal{Q})$.
The information co-homology satisfies functoriality properties:
Consider two pairs of information structures and probability families, (S, Q) and (S′, Q′), on two sets Ω, Ω′ equipped with σ-algebras B, B′ respectively, and φ a surjective measurable map from (Ω, B) to (Ω′, B′), such that φ∗(Q) ⊂ Q′ (i.e., φ∗(Q) ∈ Q′ for every Q ∈ Q) and such that S ⊂ φ∗S′ (i.e., ∀X ∈ S, ∃X′ ∈ S′, X = X′∘φ); then we have the following construction:
Proposition 1. For each integer m ≥ 0, a natural linear map
$$\varphi^* : H^m(\mathcal{Q}';\mathcal{S}') \to H^m(\mathcal{Q};\mathcal{S}),$$
is defined by the following application at the level of local co-chains:
φ * ( F ) ( X 1 ; ; X m ; P ) = F ( X 1 ; ; X m ; φ * ( P ) ) ,
for a collection of variables X j ; j = 1 , , m satisfying X j = X j φ for each j.
Proof. First, remark that the relation Xj = X′j∘φ determines X′j uniquely, because φ is surjective. As F′ is (jointly) local, the co-chain F = φ∗(F′) is also (jointly) local. Finally, it is evident that the map F′ ↦ F commutes with the co-boundary operator. Therefore the proposition follows.
Another co-homological construction works in the reversed direction:
Consider two information structures (S, Q) and (S′, Q′) on two sets Ω, Ω′ equipped with σ-algebras B, B′ respectively, and φ a measurable map from (Ω, B) to (Ω′, B′), such that Q′ ⊂ φ∗(Q) (i.e., ∀Q′ ∈ Q′, ∃Q ∈ Q, Q′ = φ∗(Q)) and such that φ∗S′ ⊂ S (i.e., ∀X′ ∈ S′, X′∘φ ∈ S); then the following result is true:
Proposition 2. For each integer m ≥ 0, a natural linear map
$$\varphi_* : H^m(\mathcal{Q};\mathcal{S}) \to H^m(\mathcal{Q}';\mathcal{S}'),$$
is defined by the following application at the level of co-chains:
$$\varphi_*(F)(X'_1;\ldots;X'_m;P') = F(X'_1\circ\varphi;\ldots;X'_m\circ\varphi;P),$$
for a probability law P Q and its image P′ = φ* (P).
Proof. First, remark that, if Q also satisfies P′ = φ∗(Q), we have F(X′1∘φ;…;X′m∘φ;P) = F(X′1∘φ;…;X′m∘φ;Q). To establish that point, let us denote Xj = X′j∘φ, j = 1,…,m, and X′ = (X′1,…,X′m), X = (X1,…,Xm) the joint variables; the quantity F(X1;…;Xm;P) depends only on X∗P, but this law can be rewritten X′∗P′, which is also equal to X′∗(φ∗Q) = X∗Q. In particular, if F is local, then F′ = φ∗F is local.
As it is evident that the map FF′ commutes with the co-boundary operator, the proposition follows.
Remark that this way of functoriality uses the locality of co-cycles.
Corollary 1. In the case where Q′ = φ∗(Q) and S = φ∗S′, the maps φ∗ and φ∗ in information co-homology are inverses of each other.
This is our formulation of the invariance of the information co-homology for equivalent information structures.
When m = 0, co-chains are functions f of PX in QX such that f(Y∗PX) = f(PX) for any Y multiple of X (i.e., coarser than X). As we assume that 1 belongs to S, and the set Q1 has only one element, f must be a constant. And every constant is a co-cycle, because
$$\delta f(X_0;P) = X_0.f(P) - f(P) = \sum_j P(X_0=x_j)\,f(P|X_0=x_j) - f(P) = f\cdot(1-1) = 0.$$
Consequently H0 is ℝ. This corresponds to the hypothesis 1 ∈ S, meaning connectedness of the category. If m components exist, we recover them in the same way and H0 is isomorphic to ℝm.
We now consider the case m = 1. From what precedes we know that there is no non-trivial co-boundary.
Non-homogeneous 1-cocycles of information are families of functions fX(Y; PX), measurable in the variable P in Q, labelled by elements Y ∈ SX, which satisfy the locality condition, stating that each time we have Z → X → Y in S, we have
$$f_X(Y; X_*P_Z) = f_Z(Y; P_Z),$$
and the co-cycle equation, stating that for two elements Y, Y′ of S X, we have
f ( ( Y , Y ) ; P ) = f ( Y ; P ) + Y . f ( Y ; P ) .
Remark that locality implies that it is sufficient to know the values fY(Y; Y∗P) to recover fX(Y; P) for every partition X in S that divides Y.
It is in this sense that we frequently omit the index X in fX.
Remark also that for any 1-cocycle f we have f (1; P) = 0.
In fact, the co-cycle equation tells us that
$$f((1,1);P) = f(1;P) + 1.f(1;P),$$
but
$$1.f(1;P) = f(1;P|1=1) = f(1;P),$$
and (1, 1) = 1, thus f(1; P) = 0.
More generally, for any X, and any value xi of X, we have
$$f(X;P|(X=x_i)) = 0.$$
In fact a special case of Equation (30) is
$$f((X,X);P) = f(X;P) + X.f(X;P),$$
which implies X.f (X; P) = 0; however, by definition,
$$X.f(X;P) = \sum_i P(X=x_i)\,f(X;P|(X=x_i)),$$
thus for every i we must have f(X; P|(X = xi)) = 0, due to P ≥ 0. This generalizes f(1; P) = 0 for any P, because, for a probability conditioned by X = xi, the partition X appears the same as 1, that is, a certitude.
Remark also that for each pair of variables (Y, Z), a 1-cocycle must satisfy the following symmetric relation:
$$f(Y;\mathbb{P}) - Z.f(Y;\mathbb{P}) = f(Z;\mathbb{P}) - Y.f(Z;\mathbb{P}).$$

2.3. Entropy

Any multiple of the Shannon entropy is a non-homogeneous information co-cycle. Recall that the entropy H is defined for one partition X by the formula
$$H(X;\mathbb{P}) = -\sum_i p_i\log_2 p_i,$$
where the pi denote the values of ℙ on the elements of the partition X. In particular the function H depends only on X∗(ℙ), which is locality. The co-cycle equation expresses the fundamental property of an information quantity, written by Shannon:
$$H(X,Y) = H(X) + H_X(Y).$$
Thus every constant multiple f = λH of H defines a co-cycle. Remark that the corresponding “homogeneous 1-cocycle” is the entropy variation:
$$F(X;Y;\mathbb{P}) = H(X;\mathbb{P}) - H(Y;\mathbb{P}).$$
This means that it satisfies the “invariance property”:
$$F((Z,X);(Z,Y)) = H(Z,X) - H(Z,Y) = H(Z) + H_Z(X) - H(Z) - H_Z(Y) = Z.F(X;Y),$$
and the “simplicial equation”:
$$F(Y;Z) - F(X;Z) + F(X;Y) = 0.$$
Note that the entropy variation H(X; P) − H(Y; P) exists under wider conditions, i.e., when Ω is infinite, if the laws of X and Y are absolutely continuous with respect to the same probability law ℙ0: we only have to replace the finite sum by the integral of the function −φ log φ, where φ denotes the density with respect to ℙ0. Changing the reference law ℙ0 changes the quantities H(X) and H(Y) by the same constant, thus does not change the variation H(X; P) − H(Y; P).
We will now prove that, for many simplicial structures S and sufficiently large adapted probability complexes Q, any information co-homology class of degree one is a multiple of the entropy class.
In particular this is true for S = W(Σ) and Q = Δ(Ω), when Σ has more than two elements and Ω more than four elements, but it is also true in more refined situations, as we will see.
We assume that the functor of probabilities Q X contains all the laws on Ω/X, when X belongs to S. In such a case, by definition, we say that Q is complete with respect to S.
Let us consider a probability law P in Q and two partitions X, Y in the structure S, such that the joint XY belongs to S. We denote by Greek letters α, β,… the indices labelling the partition Y and by Latin letters k, l,… the indices of the partition X; the probability that X = ξk, Y = ηα is denoted pk,α; then the probability of X = ξk is equal to pk = Σα pk,α and the probability of Y = ηα is equal to qα = Σk pk,α. To simplify the notations, let us write F = f(X; ℙ), G = f((Y,X); ℙ), H = f(Y; ℙ), Fα = f(X; ℙ|(Y = ηα)), Hk = f(Y; ℙ|(X = ξk)).
The Hochschild co-cycle equation gives
$$\sum_\alpha q_\alpha F_\alpha\Bigl(\frac{p_{k_1,\alpha}}{q_\alpha},\ldots,\frac{p_{k_m,\alpha}}{q_\alpha}\Bigr) = G\bigl((p_{k,\alpha})\bigr) - H(q_{\alpha_1},\ldots,q_{\alpha_n}).$$
But we also have the relation obtained by exchanging X and Y, which gives
$$\sum_k p_k H_k\Bigl(\frac{p_{k,\alpha_1}}{p_k},\ldots,\frac{p_{k,\alpha_n}}{p_k}\Bigr) = G\bigl((p_{k,\alpha})\bigr) - F(p_{k_1},\ldots,p_{k_m}).$$
Suppose that pk,α = 0 except when α = α1 and k = k2, k3,…,km, or α = α2 and k = k1; we put pki,α1 = xi for i = 2,…,m and pk1,α2 = x1, which implies that we have x1 + x2 + … + xm = 1. Then Equation (33) implies that each term Hk in Equation (42) is zero, because only one value of the image law is non-zero; thus we can replace the term G by F(pk1,…,pkm), and we get from Equation (41):
H ( 1 x 1 , x 1 , 0 , , 0 ) = F ( x 1 , x 2 , , x m ) ( 1 x 1 ) F α 1 ( 0 , x 2 1 x 1 , , x m 1 x 1 ) .
Only the term Fα1 survives, because the other possible one, for α2, concerns a certitude.
Consequently, by imposing x2 = 1 − x1 = a, x3 =… = xm = 0, we deduce the identity H (a, 1 − a, 0,…, 0) = F(1 − a, a, 0,…, 0). This gives a recurrence equation to calculate F from the binomial case:
$$F(x_1,x_2,\ldots,x_m) = F(x_1,\,1-x_1,\,0,\ldots,0) + (1-x_1)\,F\Bigl(0,\frac{x_2}{1-x_1},\ldots,\frac{x_m}{1-x_1}\Bigr).$$
This is due to the fact that Fα1 is a special case of F, thus independent of Y and α1.
Then coming back to the co-cycle equation, we obtain in particular a functional equation for the binomial variables.
Lemma 2. With the notations of Example 1, Ω = {(00), (01), (10), (11)}, S1 (resp. S2) the projection pr1 (resp. pr2) on E1 = E2 = {0,1}, Σ = {S1, S2}; then the (measurable) information co-homology of degree one is generated by the entropy, i.e., there exists a constant C such that, for any X in W(Σ) and ℙ ∈ P, f(X; ℙ) = C H(X; ℙ).
Proof. We consider a 1-cocycle f. We have f(1; P) = 0. Let us write fi(P) = f(Si; P), and fijk(u) for the function f(Si; P|(Sj = k)), the variable u representing the probability of the first point in the fiber Sj = k in the lexicographic order. For each 2 × 2 table P = (p00, p01, p10, p11), the symmetry formula (36) gives
$$(p_{00}+p_{10})\,f_{120}\Bigl(\frac{p_{00}}{p_{00}+p_{10}}\Bigr) + (p_{01}+p_{11})\,f_{121}\Bigl(\frac{p_{01}}{p_{01}+p_{11}}\Bigr) - f_1(P) = (p_{00}+p_{01})\,f_{210}\Bigl(\frac{p_{00}}{p_{00}+p_{01}}\Bigr) + (p_{10}+p_{11})\,f_{211}\Bigl(\frac{p_{10}}{p_{10}+p_{11}}\Bigr) - f_2(P).$$
Imposing p10 = 0, p00 = u, p11 = v, p01 = 1 − u − v in this relation, we obtain the equation:
$$(1-u)\,f_1\Bigl(0,\ \frac{1-u-v}{1-u},\ 0,\ \frac{v}{1-u}\Bigr) - f_1(u,\,1-u-v,\,0,\,v) = (1-v)\,f_2\Bigl(\frac{u}{1-v},\ \frac{1-u-v}{1-v},\ 0,\ 0\Bigr) - f_2(u,\,1-u-v,\,0,\,v).$$
By hypothesis, f1, f2 depend only on the image law by S1, S2 respectively; thus, again denoting a binomial probability by the probability of the first element in the lexicographic order, we get
$$(1-u)\,f_1\Bigl(\frac{1-u-v}{1-u}\Bigr) - f_1(1-v) = (1-v)\,f_2\Bigl(\frac{u}{1-v}\Bigr) - f_2(u).$$
By equating u to 1 − v, we find that f1(u) = f2(u); then we arrive at the following functional equation for h = f1 = f2:
h ( u ) h ( v ) = ( 1 v ) h ( u 1 v ) ( 1 u ) h ( v 1 u )
This is the functional equation which was considered by Tverberg in 1958 [35]. As a result of the works of Tverberg [35], Kendall [36] and Lee (1964, [37]), (see also Kontsevich, 1995 [38]), it is known that every measurable solution of this equation is a multiple of the entropy function:
$$h(x) = C\,\bigl(x\log(x) + (1-x)\log(1-x)\bigr).$$
From here it follows that, for any m-tuple (x1,…,xm) of real numbers such that x1 + … + xm = 1,
$$F(x_1,x_2,\ldots,x_m) = C\sum_i x_i\log(x_i).$$
The same is true for H and G with the appropriate number of variables.
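As a sanity check, one can verify numerically that the binary entropy solves the Tverberg functional equation displayed above (a small sketch with random samples; the tolerance and sampling strategy are ours):

```python
# Quick numerical confirmation that the binary entropy
# h(x) = -x log2 x - (1-x) log2(1-x) solves the Tverberg equation
# h(u) - h(v) = (1-v) h(u/(1-v)) - (1-u) h(v/(1-u)) for u + v <= 1.
from math import log2
from random import random, seed

def h(x):
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

seed(0)
for _ in range(1000):
    u, v = random(), random()
    if u + v >= 1:          # the equation lives on the triangle u + v <= 1
        continue
    lhs = h(u) - h(v)
    rhs = (1 - v) * h(u / (1 - v)) - (1 - u) * h(v / (1 - u))
    assert abs(lhs - rhs) < 1e-9
print("functional equation verified on random samples")
```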
A pair of variables X, Y such that X, Y and (X,Y) belong to S is called an edge of S; we say this edge is rich if X and Y contain at least two elements and (X, Y) at least four elements which cross the elements of X and Y, in such a manner that Lemma 2 applies if Q is complete. We say that S is connected if every pair of elements X, X′ in S can be joined by a sequence of edges. We say that S is sufficiently rich if each vertex belongs to at least one rich edge. By the recurrence Equation (100), these two conditions guarantee that the constant C which appears in Lemma 2 is the same for all rich edges. Then the same recurrence Equation (100) implies that the whole co-cycle is equal to CH. If S has m connected components, we necessarily get m independent constants.
Thus we have established the following result:
Theorem 1. For every connected structure of information S which is sufficiently rich, and every probability functor Q which is complete with respect to S, the information co-homology group of degree one is one-dimensional and generated by the classical entropy.
The theorem applies to rich simplicial complexes, in particular to the full simplex S = W(Σ) generated by a family Σ of partitions S1, …, Sn, n ≥ 2, such that, for every i, at least one of the pairs (Si, Sj) is rich.
Note that most of the axiomatic characterizations of entropy have used convexity and recurrence over the dimension; see Khintchin [39] and Baez et al. [20].
In our characterization we assumed no symmetry hypothesis; symmetry came out as a consequence of co-homology. Moreover, we did not assume any stability property relating to higher dimensional simplices; this too was a consequence of the homological definition.
There exists a notion of symmetric information co-homology:
The group of permutations 𝔖(Ω, ℬ), made of the permutations of Ω that respect the algebra ℬ, acts naturally on the set of partitions Π(Ω); in fact, if X ∈ Π(Ω) is made of the subsets Ω1, …, Ωk, the partition σX is made of the subsets σ−1(Ω1), …, σ−1(Ωk), in such a manner that, if σ, τ are two permutations of Ω, we have τ(σX) = (στ)X.
We say that a classical information structure S on (Ω, ℬ) is symmetric if it is closed under the action of the group of permutations 𝔖(Ω, ℬ), i.e., if X ∈ S and σ ∈ 𝔖(Ω, ℬ), the partition σX also belongs to S.
In the same way, we say that a probability functor Q is symmetric if it is stable under local permutations, i.e., if X ∈ S, P ∈ QX, and σ ∈ 𝔖(Ω/X), then the probability law σ∗P = P ∘ σ on Ω/X also belongs to QX.
Remark that we also have τ∗(σ∗P) = (στ)∗P. Thus the actions of the symmetric groups are defined here on the right. However, we get actions on the left by taking σ∗ = (σ−1)∗. For the essential role of symmetries in information theory, see the article of Gromov in this volume.
An m-cochain F_X : S_X^m × Q_X → ℝ is said to be symmetric when, for every X ∈ S, every probability P ∈ QX and every collection of partitions Y1, …, Ym in SX, we have
\[ F_{\sigma^* X}(\sigma^* Y_1; \dots; \sigma^* Y_m; \sigma^* P) = F_X(Y_1; \dots; Y_m; P). \]
It is evident that symmetric cochains form a subcomplex of the complex of information cochains, i.e., the coboundary of a symmetric cochain is again a symmetric cochain. Consequently we get a symmetric information co-homology, which we denote H*_S(S; Q).
In particular the entropy is a symmetric 1-cocycle.
The above proof of Theorem 1 applies to symmetric cocycles as well; thus, under the convenient hypotheses of connectedness, richness and completeness for S and Q, the group H^1_S(S; Q) is again one-dimensional, generated by the entropy H.
Remark that an equivalent way to look at symmetric information cochains consists in enlarging the category S into a "symmetric category" S_𝔖, by adding an arrow from X to σX associated to each element σ ∈ 𝔖(Ω/X), and completing the category by composing the two kinds of arrows, divisions and permutations. In this case, the probability functor Q must behave naturally with respect to permutations, which implies it is symmetric. Moreover, the natural notions of functional sheaf and local cochains are those of a symmetric sheaf and symmetric cochains.

2.4. Appendix. Complex of Possible Events

In each concrete situation, physical constraints produce exclusion rules between possible events, which select a sub-complex Q of the full probability simplex 𝒫 = Δ_N on Ω. The aim of this appendix is to make this remark more precise.
Let A0, A1, …, AN be the N + 1 vertices of the large simplex Δ_N; a point of Δ_N is interpreted as a probability law ℙ on the set of these vertices; each vertex can be seen as an elementary event, and we will say that a general event A is possible for ℙ when ℙ(A) is different from zero. The event A is said to be impossible for ℙ in the other case, that is, when ℙ(A) = 0.
The star S(A) of a vertex A of Δ_N is the complement of the face opposite to A, i.e., it is the set of probabilities ℙ in Δ_N such that A is possible, i.e., has non-zero probability. The relative star S(A|K) of A in a subcomplex K is the intersection of the star of A with K.
We denote by F = (A, B, C, D, …) the face of Δ_N whose vertices are A, B, C, D, …. We denote by L(F) the set of points p in Δ_N such that at least one of the points A, B, C, D, … is impossible for p. This is also the union of the faces which are opposite to the vertices A, B, C, D, …. Hence L(F) is a simplicial complex. The complement in F of the interior of F, i.e., the boundary of F, is the union of the intersections of F with all the faces opposite to A, B, C, D, …; it is also the set of probabilities p in F such that at least one of the points A, B, C, D, … is impossible for p, thus it is equal to L(F) ∩ F. If G is a face containing F, the complex L(G) contains the complex L(F).
Let K be a simplicial complex contained in an N-simplex; then K is obtained by deleting from Δ_N a set E = EK of open faces. Let Ḟ = F \ ∂F be an element of E; then each face G of Δ_N containing F belongs to E, because K is a complex.
In this case K is contained in L(F). In fact L(F) is the largest sub-complex of Δ_N which does not contain Ḟ. This can be proved as follows: if p in K is such that every vertex of F is possible, then p belongs to a face G such that every vertex of F is a vertex of G; thus K contains G, which contains Ḟ. So, if K does not contain Ḟ, K is contained in L(F).
Let L = LK be the intersection of the L(F), where Ḟ describes the open faces in EK. From what precedes we know that K is contained in L. Moreover, every Ḟ in E is included in the complement of L(F), thus it is included in the complement of L, which is the union of the complements of the L(F). Consequently the complement of K, being the union of the Ḟ in E, is included in the complement of L; hence L is contained in K, and K = L.
This discussion establishes the following result:
Theorem 2. A subset K of the simplex ΔN is a simplicial sub-complex if and only if it is defined by a finite number of constraints of the type: “for any p in K, the fact that A, B, C, … are possible for p implies that D is impossible for p”.
In other terms, more pictorial but also more ambiguous: every sub-complex K is defined by constraints of the type "if A, B, C, … are simultaneously allowed, it is excluded that D can happen".
The statement of the theorem is just a rewriting of the discussion, using elementary propositional calculus: let K be a sub-complex of Δ_N; we have shown that K is the intersection of the L(F) where the open face Ḟ is not in K. If A, B, C, D, … denote the vertices of the face F, a point p belongs to L(F) if and only if "(A is impossible for p) or (B is impossible for p) or …", and this sentence is equivalent to "if (A is possible for p) and (B is possible for p) and …, then (D is impossible for p)". This results from the equivalence between "(P implies Q) is true" and "(not P or Q) is true". Conversely, any L(F) is a simplicial complex, hence every intersection of sets of the form L(F) is a simplicial complex too.
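As an illustration of Theorem 2, the following hedged sketch (ours; the vertex names and the single rule are hypothetical) encodes one exclusion rule of the stated kind on the simplex with vertices A, B, C, D and checks that the resulting set of faces is closed under taking subfaces, i.e., is a simplicial sub-complex.

```python
from itertools import combinations

# A face of the simplex is identified with its vertex set (the support of a
# probability law). We impose the single rule "if A and B are possible then D
# is impossible", i.e., we delete every face whose support contains {A, B, D}.
vertices = ["A", "B", "C", "D"]
forbidden = {"A", "B", "D"}

faces = [set(f) for r in range(1, 5) for f in combinations(vertices, r)]
K = [f for f in faces if not forbidden <= f]

# Check that K is a simplicial complex: every non-empty subset of a face of K
# is again a face of K.
for f in K:
    for r in range(1, len(f) + 1):
        for g in combinations(sorted(f), r):
            assert set(g) in K
print(f"K has {len(K)} faces and is closed under taking subfaces")
```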

3. Higher Mutual Informations. A Sketch

The topological co-boundary operator on cochains, denoted by δt, is defined by the same formula as δ, except that the first term Y1.F(Y2; …; Ym+1; ℙ) is replaced by the term F(Y2; …; Ym+1; ℙ) without Y1:
\[ \delta_t^m F(Y_1; \dots; Y_{m+1}; \mathbb{P}) = F(Y_2; \dots; Y_{m+1}; \mathbb{P}) + \sum_{i=1}^{m} (-1)^i F(\dots; (Y_i, Y_{i+1}); \dots; Y_{m+1}; \mathbb{P}) + (-1)^{m+1} F(Y_1; \dots; Y_m; \mathbb{P}). \]
It is the coboundary of the bar complex for the trivial module Ft, which is the same as F except that no conditioning appears, i.e., Y.F = F. Hence it computes the ordinary simplicial co-homology of the complex S with local coefficients in F.
Remark that this operator also preserves locality, because all the functions of ℙ which appear in the development depend only on (Y2, …, Ym+1)∗ℙ, (Y1, …, Ym+1)∗ℙ and (Y1, …, Ym)∗ℙ.
By definition a topological cocycle of information is a cochain F that satisfies δtF = 0, and a topological co-boundary is an element in the image of δt.
It is easy to show that δtδt = 0, which allows us to define a co-homology theory that we will name topological co-homology.
Now assume that the information structure S is a set W (Σ) = Δ(n) generated by a family Σ of partitions S1, …, Sn, when n ≥ 2.
Higher mutual information quantities were defined by Hu Kuo Ting [6] (see also Yeung [40]), generalizing the Shannon mutual information.
\[ I_N(S_1; \dots; S_N; \mathbb{P}) = \sum_{k=1}^{N} (-1)^{k-1} H_k(S_1; \dots; S_N; \mathbb{P}), \]
where
\[ H_k(S_1; \dots; S_N; \mathbb{P}) = \sum_{I \subset [N];\; \mathrm{card}(I)=k} H(S_I; \mathbb{P}), \]
SI denoting the joint partition of the Si such that i ∈ I. We also define I1 = H.
The definition of IN makes it evident that it is a symmetric function, invariant under all permutations of the partitions S1, …, SN.
For instance I2(S; T) = H(S) + H(T) − H(S, T) is the usual mutual information.
It is easily seen that I2 = δtH: indeed, δtH(S; T; ℙ) = H(T; ℙ) − H((S, T); ℙ) + H(S; ℙ) = I2(S; T; ℙ). The following formula generalizes this remark to higher mutual informations of even orders:
\[ I_{2m} = \delta_t\, \delta\, \delta_t\, \delta \cdots \delta_t\, H, \]
where the right-hand side contains 2m − 1 alternating operators.
And for mutual informations of odd order we have
\[ I_{2m+1} = \delta\, \delta_t\, \delta\, \delta_t \cdots \delta\, \delta_t\, H, \]
where the right-hand side contains 2m operators.
We deduce from here that the higher mutual informations are co-boundaries, for δ or for δt according as their order is odd or even respectively.
The result which proves the two above formulas is the following:
Lemma 3. Let N be even or odd; we have
\[ I_N((S_0, S_1); S_2; \dots; S_N; \mathbb{P}) = I_N(S_0; S_2; \dots; S_N; \mathbb{P}) + S_0.I_N(S_1; S_2; \dots; S_N; \mathbb{P}). \]
This lemma can be proved by comparing the completely developed forms of the two members. It seems to signify that, with respect to one variable, IN satisfies the equation of an information 1-cocycle, so that IN would be a kind of "partial 1-cocycle"; however this is misleading, because the locality condition is not satisfied. In fact IN is an N-cocycle, either for δ or for δt, depending on the parity of N.
For any N-cochain F we have
\[ (\delta - \delta_t)\, F(S_0; S_1; \dots; S_N; \mathbb{P}) = ((S_0 - 1).F)(S_1; \dots; S_N; \mathbb{P}), \]
where S0 − 1 denotes the sum of the two operators: mean conditioning by S0, and minus the identity.
That implies:
\[ (\delta \delta_t - \delta_t \delta)\, F(S_0; S_1; S_2; \dots; S_N; \mathbb{P}) = ((-1 + S_0 + S_1 - S_0 S_1).F)(S_2; \dots; S_N; \mathbb{P}). \]
Remark 3. Conversely, the functions IN decompose the entropy of the finest joint partition (where I = {i_1 < … < i_k}):
\[ H((S_1, S_2, \dots, S_N); \mathbb{P}) = \sum_{k=1}^{N} (-1)^{k-1} \sum_{I \subset [N];\; \mathrm{card}(I)=k} I_k(S_{i_1}; S_{i_2}; \dots; S_{i_k}; \mathbb{P}). \]
For example, we have H(S, T) = I1(S) + I1(T) − I2(S; T), and
\[ H(S, T, U) = H(S) + H(T) + H(U) - I_2(S; T) - I_2(T; U) - I_2(S; U) + I_3(S; T; U). \]
Let us also note the recurrence formula whose proof is left to the reader (cf. Cover and Thomas [41]):
\[ I_{N+1}(S_0; S_1; \dots; S_N) = I_N(S_1; \dots; S_N) - S_0.I_N(S_1; \dots; S_N). \]
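These identities are easy to test numerically. The following sketch (ours, assuming numpy; the joint law is a random, hypothetical example) computes the I_N by inclusion-exclusion for three binary variables and verifies the decomposition of H(S, T, U) together with the recurrence above, which for N = 2 is the classical identity I_3(S; T; U) = I_2(T; U) − I_2(T; U | S).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # joint law of (S, T, U)

def H(axes):
    # Shannon entropy (bits) of the marginal law on the given axes
    other = tuple(sorted(set(range(3)) - set(axes)))
    m = (p.sum(axis=other) if other else p).ravel()
    return float(-(m[m > 0] * np.log2(m[m > 0])).sum())

def I(axes):
    # I_N by inclusion-exclusion: sum_k (-1)^(k-1) sum_{|J|=k} H(S_J)
    return sum((-1) ** (k - 1) * sum(H(J) for J in combinations(axes, k))
               for k in range(1, len(axes) + 1))

# decomposition of the entropy of the finest joint partition
lhs = H((0, 1, 2))
rhs = (sum(H((i,)) for i in range(3))
       - sum(I(e) for e in combinations(range(3), 2))
       + I((0, 1, 2)))
assert np.isclose(lhs, rhs)

def I2_of(q):
    # mutual information of a 2x2 joint law q
    qa, qb = q.sum(1), q.sum(0)
    return sum(q[a, b] * np.log2(q[a, b] / (qa[a] * qb[b]))
               for a in range(2) for b in range(2) if q[a, b] > 0)

# recurrence: I_3(S;T;U) = I_2(T;U) - S.I_2(T;U)
s_dot_I2 = sum(p[s].sum() * I2_of(p[s] / p[s].sum()) for s in range(2))
assert np.isclose(I((0, 1, 2)), I((1, 2)) - s_dot_I2)
print("decomposition and recurrence verified")
```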

4. Quantum Information and Projective Geometry

4.1. Quantum Measure, Geometry of Abelian Conditioning

In finite dimensional quantum mechanics the role of the finite set Ω of atomic events is played by a complex vector space E of finite dimension.
In fact, to each set Ω, of cardinal N, is naturally associated a vector space of dimension N over ℂ, which is the space freely generated over ℂ by the elements of Ω. Then we can identify E with ℂN, the canonical basis being the points x of Ω. In this case the canonical positive hermitian metric on E corresponds to the quadratic mean: if f and g are elements of E, we have
\[ h_0(f, g) = \langle f \mid g \rangle_0 = \int \bar{f}(\omega)\, g(\omega)\, d\omega = \frac{1}{N} \sum_j \bar{f}_j\, g_j. \]
Remark that, in the infinite dimensional situation, the space which would play the role of E is the space of L2 functions for a fixed probability P0.
Probability laws ℙ, which are elements of the big simplex Δ(N), give other hermitian structures, the ones which are expressed by diagonal matrices, with positive coefficients, and trace equal to 1.
In the general quantum case, described by E, a quantum probability law is every positive non-zero hermitian product h. If a basis is chosen, h is described by an N × N-matrix ρ. In the physical literature, every such ρ is called a density of states; and it is considered as a full description of the physical states of the finite quantum system. Usually ρ is normalized by Tr(ρ) = 1.
Note that this condition on the trace has no meaning for a positive hermitian form h if no additional structure is given, for instance a non-degenerate form h0 of reference. Why is it so? Because a priori a hermitian form h on E is a map from E to Ē*, where * denotes duality and the bar denotes conjugation, the conjugate space Ē being the same set E, with the same structure of vector space over the real numbers as E, but with the structure of vector space over the complex numbers changed by reversing the sign of the action of the imaginary unit i. The complexification of the real vector space H of hermitian forms is Hom(E, Ē*) ≅ E* ⊗ Ē*. The space H is the set of fixed points of the ℂ-anti-linear map u ↦ ᵗū. A trace is defined for an endomorphism of the space E, as a linear invariant quantity on E* ⊗ E. Here we could take the trace over ℝ, because E and Ē are the same over ℝ, but the duality would be an obstacle: even over the field ℝ, the spaces E and E* cannot be identified, and there exists no linear invariant in E* ⊗ E*, even over ℝ. In fact, a non-degenerate positive h0 is one of the ways to identify E and Ē*. A basis is another way, also defining canonically a form h0. More precisely, when h0 is given, every hermitian form h diagonalizes in an orthonormal basis for h0; thus the whole spectrum of h makes sense, not only the trace.
This h0 is tacitly assumed in most presentations. However, it is better to understand the consequences of this choice. In non-relativistic quantum mechanics it is not too grave; but in relativistic quantum mechanics it is: for instance, considering the system of two states as a spinor on the Lorentz space of dimension 4, the choice of h0 is equivalent to the choice of a time coordinate. See Penrose and Rindler [42].
A much less violent way to proceed is to consider hermitian structures h up to multiplication by a strictly positive number. This has the same effect as fixing the trace equal to one, without introducing any choice. In quantum mechanics only non-zero positive h are considered, not necessarily positive definite, but non-zero. This indicates that a good space of states is not the set H+ of all positive non-zero hermitian products but a convex part ℙH+ of the real projective space of real lines in the vector space H of hermitian forms. In this space, the complex projective space ℙ(E) of dimension N − 1 over ℂ is naturally embedded; its image consists of the rank-one positive hermitian matrices of trace 1; these matrices correspond to the orthogonal projectors onto one-dimensional directions in E.
When a basis of E is chosen, particular elements of ℙ(E) are given by the generators of ℂN; they correspond to the Dirac distributions on classical states. We see here a point defended in particular by Von Neumann, that quantum states are projective objects not linear objects.
The classical random variables, i.e., the measurable functions on Ω with values in ℂ, are generalized in Quantum Mechanics by the operators on E; they are all the endomorphisms, i.e., any N × N-matrix, and they are named observables. Classical observables are recovered as diagonal matrices, their action on E corresponding to the multiplication of functions. Real valued variables are generalized by hermitian operators. Again this supposes that a special probability law h0 is given; if not, "to be hermitian" has no meaning for an operator. (What could have a meaning for an operator is to be diagonalizable over ℝ, which is something else.)
Then if h0 is chosen, the only difference between real observable and density of states is the absence of the positivity constraint.
By definition, the amplitude, or expectation, of the observable Z in the state ρ is the number given by the formula
\[ \mathbb{E}_\rho(Z) = \mathrm{Tr}(Z \rho). \]
It is important to note that h0 plays a role in this formula. Consequently the definition of the expectation requires fixing an h0, not only a ρ. This imposes a departure from the relativistic case, which should not be surprising, since considerations in relativistic statistical physics show that the entropy, for instance, depends on the choice of a time coordinate. Cf. Landau-Lifschitz, Fluid Mechanics, second edition [43].
The partitions of Ω associated to random variables are replaced in the quantum context by the spectral decompositions of the hermitian operators X. As h0 is given, this decomposition is given by a set of positive hermitian commuting projectors of sum equal to the identity. The additional data for recovering the operator X is one real eigenvalue for each projector. The underlying fact from linear algebra is that every hermitian matrix is diagonalizable in a unitary basis, which means that
\[ Z = \sum_j z_j E_j, \]
where the numbers zj are real and pairwise distinct, and where the matrices Ej are hermitian projectors, which satisfy, for any j and k ≠ j,
\[ E_j^2 = E_j; \qquad E_j^* = E_j; \qquad E_j E_k = E_k E_j = 0; \]
and
\[ \sum_j E_j = \mathrm{Id}_N. \]
When the hermitian operator Z commutes with the canonical projectors on the axis of ℂN, its spectral measure gives an ordinary partition of the canonical basis, and we recover the classical situation.
Note that the extension of the notion of partition is given by any decomposition of the vector space E in orthogonal sum, not necessarily compatible with a chosen basis. Again this assumes a given positive definite h0.
To generalize what we presented in the classical setting, quantum information theory must use only the spectral support of the decomposition, not the eigenvalues.
It would have been tempting to consider any decomposition of E in direct sum as a possible observable, however not every linear operator, or projective transformation, corresponds to such a decomposition, due to the existence of non-trivial nilpotent operators. What could be their role in quantum information? Moreover, the presence of h0 fully justifies the limitation to orthogonal decompositions.
In the general case, hermitian but not necessarily diagonal, we define the probability of the elementary event Z = zj by the following formula:
\[ \mathbb{P}_\rho(Z = z_j) = \mathrm{Tr}(E_j \rho E_j). \]
And we define the conditional probability ρ|(Z = zj) by the formula
\[ \rho|(Z = z_j) = E_j \rho E_j / \mathrm{Tr}(E_j \rho E_j). \]
One can notice that this definition can be extended to any projector, not necessarily hermitian: by definition, the conditioning of ρ by a projector Y is the matrix Y*ρY, normalized to be of trace 1. However, here, as is done in most texts on Quantum Mechanics, we will mostly restrict ourselves to the case of hermitian projectors, i.e., Y* = Y.
Remark 4. What justifies these definitions of probability and conditioning? First, they allow us to recover the classical notions when we restrict to diagonal densities and diagonal observables, i.e., when ρ is diagonal, real, positive, of trace 1, Z is diagonal, and the Ej are diagonal, in which case they give a partition of Ω. The mean of Z is its amplitude. The probability of the event Z = zj is the sum of the probabilities p(ω) = ρωω for ω in the image of Ej; this is the trace of ρEj. Moreover, the conditioning by this event is the probability obtained by projection on this image, as prescribed by the above formula.
Second, pure states are defined as rank-one hermitian matrices. In this case ρ is the orthogonal projection on a vector ψ of norm equal to 1 (the finite dimensional version of the Schrodinger wave vector); the exact relation is
\[ \rho = |\psi\rangle \langle\psi|, \]
or, in coordinates, if ψ has for coordinates the complex numbers ψ(ω), we have
\[ \rho_{\omega \omega'} = \overline{\psi(\omega)}\, \psi(\omega'). \]
Let Z be any hermitian operator; the results of quantum experiments indicate that the probability of the event Z = zj, for the state ψ, is equal to
\[ P_j = \langle \psi \mid E_j \psi \rangle. \]
But this quantity can also be written
\[ P_j = \mathrm{Tr}(\langle \psi \mid E_j \psi \rangle) = \mathrm{Tr}(|\psi\rangle \langle\psi|\, E_j) = \mathrm{Tr}(\rho E_j). \]
Starting from this formula and the fact that any ρ can be written as a classical mixture of commuting quantum pure states,
\[ \rho = \sum_a p_a\, |\psi_a\rangle \langle\psi_a|, \]
we get the general formula of a quantum probability that we recalled.
Moreover, physical experiments indicate that after the measurement of an observable Z giving the quantity zj, the system is reduced to the image of Ej, and every pure state ψ is reduced to its projection Ejψ, which is compatible with the above definition of conditioning for pure states. Here again, the general formula can be deduced from Equation (74). The division by the probability is made to normalize the trace to 1. Thus conditioning in general is given by orthogonal projection in E, and it corresponds to the operation of measurement.
However, as claimed in particular by Roger Balian [44], the fact that the decomposition in pure states is non-unique implies that pure states cannot be so pertinent for understanding quantum information.
Definition 3. The density of states associated to a given variable Z and a given density ρ is given by the sum:
\[ \rho_Z = \sum_j \mathbb{P}_\rho(Z = z_j)\; \rho|(Z = z_j) = \sum_j E_j \rho E_j, \]
where (Ej)j∈J designates the spectral decomposition of Z, also named the spectral measure of Z. Thus ρZ is usually seen as representing the density of states after the measurement of the variable Z. This formula is usually interpreted by saying that the statistical analysis of the repeated measurements of the observable Z transforms the density ρ into the density ρZ.
Remark that ρZ is better understood as being a collection of conditional probabilities ρ|(Z = zj), indexed by j.
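The following minimal numerical sketch (ours, assuming numpy; the observable and the dimension are hypothetical) implements the formulas of this subsection: it builds the spectral measure (Ej) of a degenerate hermitian Z, and checks the probabilities ℙρ(Z = zj), the conditioned densities ρ|(Z = zj) and the density ρZ = Σj EjρEj.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = a @ a.conj().T
rho /= np.trace(rho).real                       # density: positive, trace 1

Z = np.diag([1.0, 1.0, 2.0, 3.0]).astype(complex)  # hermitian, degenerate spectrum
w, V = np.linalg.eigh(Z)
E = {}                                          # spectral projectors, one per eigenvalue
for z, v in zip(np.round(w, 10), V.T):
    E[z] = E.get(z, 0) + np.outer(v, v.conj())

assert np.allclose(sum(E.values()), np.eye(n))  # sum_j E_j = Id
for Ej in E.values():                           # E_j^2 = E_j = E_j^*
    assert np.allclose(Ej @ Ej, Ej) and np.allclose(Ej, Ej.conj().T)
assert np.allclose(sum(z * Ej for z, Ej in E.items()), Z)   # Z = sum_j z_j E_j

probs = {z: np.trace(Ej @ rho @ Ej).real for z, Ej in E.items()}
assert np.isclose(sum(probs.values()), 1.0)     # the P_rho(Z = z_j) sum to 1
conditioned = {z: Ej @ rho @ Ej / probs[z] for z, Ej in E.items()}
rho_Z = sum(Ej @ rho @ Ej for Ej in E.values())
assert np.allclose(rho_Z, sum(probs[z] * conditioned[z] for z in probs))
print("spectral measure, probabilities and conditioning are consistent")
```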
In quantum physics as in classical physics, the symmetries, discrete and continuous, have always played a fundamental role. For example, in quantum mechanics a fundamental principle is the unitarity of the evolution in time, which claims that the states evolve as ρt = Ut∗ρ and that the observables evolve as Zt = UtZUt−1, with Ut respecting the fundamental scalar product h0. In fact, as we already mentioned, a deeper principle associates the choice of a time coordinate t to the choice of h0, which gives birth to a unitary group U(E; h0), isomorphic to UN(ℂ). For stationary systems the family (Ut)t∈ℝ forms a one-parameter group, i.e., Ut+s = UtUs = UsUt, and there exists a hermitian generator H of Ut in the sense that Ut = exp(2πitH/h); by definition, this particular observable H is the energy, the most important observable. Even if we have a privileged basis, like Ω in the relation with classical probability, the consideration of another basis which makes the energy H diagonal is of great importance. In the stationary case, a symmetry of the dynamical system is defined as any unitary operator which commutes with the energy H. The set of symmetries forms a Lie group G, a closed subgroup of UN. The infinitesimal generators are considered as hermitian observables (obtained by multiplying the elements of the Lie algebra L(G) by i); in general they do not commute between themselves.
All these axioms extend to the infinite dimensional situation, when E has the structure of a Hilbert space, but the spectral analysis of unbounded operators is more delicate and diverse than the analysis in finite dimension. Three kinds of spectrum appear: discrete, absolutely continuous and singular continuous. The symmetries need not form a Lie group in general, and so on.
In our simple case of elementary quantum probability, without fixed dynamics, the classical symmetries of the set of probabilities are given by the permutations of Ω, the vertices of Δ(N). They correspond to the unitary matrices which have one and only one non-zero element in each line and each column. They do not diagonalize in the same basis because they do not commute, but they form a group 𝔖N. Another subgroup of UN is natural for semi-classical study: the diagonal torus 𝕋N, whose elements are the diagonal matrices with entries of modulus 1; they correspond to sets of angles. The group 𝔖N normalizes the torus 𝕋N, i.e., for each permutation σ and each diagonal element Z, the matrix σZσ−1 is also diagonal; its entries are the same as the entries of Z but in a different order. The subgroup generated by 𝔖N and 𝕋N is the full normalizer of 𝕋N.
One of the strengths of the quantum theory, with respect to the classical theory, is that it gives a similar status to the states, the observables and the symmetries. States are hermitian forms, generalizing points in the sphere (or in the projective space) which are pure states, observables are hermitian operators, or better spectral decompositions, and symmetries are unitary operators, infinitesimal symmetries being anti-hermitian matrices.
All classical groups should appear in this framework. First, by choosing a special structure on E we restrict the linear group GLN(ℂ) to an algebraic subgroup G. For instance, by choosing a symmetric invertible bilinear form on E we obtain ON(ℂ), or, when N is even, by choosing an antisymmetric invertible bilinear form on E we obtain SpN(ℂ). In each of these cases there exists a special maximal torus (formed by the complexification of a maximal abelian subgroup T of unitary operators in G), and a Weyl group, which is the quotient of the normalizer N(T) by the torus T itself. This Weyl group generalizes the permutation group when more algebraic structures are given in addition to the linear structure. The compact group of symmetries is the intersection Gc of G with UN. In fact, given any compact Lie group Gc and any faithful representation rc of Gc in ℂN, we can restrict real observables to generators of elements in Gc, and general observables to complex combinations of these generators, which integrate into a reductive linear group G. The spectral decompositions correspond to the restriction to parabolic subgroups of G. The densities of states are restricted to the Satake compactification of the symmetric space G/Gc [45].

4.2. Quantum Information Structures and Density Functors

To define information quantities in the quantum setting, we have a priori to consider families of operators (Y1, Y2, …, Ym) as joint variables. However, the efforts made in Physics and Mathematics have not sufficed to attribute a clear probability to the joint events (Y1 = y1, Y2 = y2, …, Ym = ym) when Y1, …, Ym do not commute; we even suspect that this difficulty reveals a principle: that information requires a form of commutativity. Thus, in our study, we will adopt the convention that every time we consider joint observables, they do commute. Hence we will consider only collections of commuting hermitian observables; their natural amplitudes in a given state are vectors in ℝm. However, we do not exclude from our theory the consideration of sequences (Y1; …; Ym) such that the Yi do not commute.
A joint observable (Y1, Y2, …, Ym) defines a linear decomposition of the total space E into a direct orthogonal sum
\[ E = \bigoplus_{\alpha \in A} E_\alpha, \]
where (Eα; α ∈ A) is the collection of joint eigenspaces of the operators Yj. Note that any orthogonal decomposition can be defined by a single operator.
Another manner to handle the joint variables is to consider linear families of commuting operators
\[ Y(\lambda_1, \dots, \lambda_m) = \lambda_1 Y_1 + \dots + \lambda_m Y_m, \]
or, in equivalent terms, linear maps from ℝm to End(E). Then assigning probabilities and performing conditioning can be seen as functorial operations.
In what follows we denote indifferently by Eα the subspace of E or the orthogonal projection on this subspace.
From the point of view of information, two sets of observables are equivalent if they give the same linear decomposition of E. We say that a decomposition (Eα; α ∈ A) refines a decomposition (E′β; β ∈ B) when each E′β is a sum of the spaces Eα for α in a subset Aβ of A. In such a case, we say that (Eα; α ∈ A) divides (E′β; β ∈ B).
For instance, for commuting decompositions Y, Z it is possible to define the joint variable (Y, Z) as the coarsest decomposition which is finer than both Y and Z.
We insist that only decompositions play a role in information theory at this moment. We will see that the observation trees of the last section impose the consideration of a supplementary structure, which consists in an ordering of the factors of the decomposition.
An information structure on E is a set S of decompositions X of E in direct sum, such that when Y and Z are elements of S which refine XS, then Y, Z commute and the finer decomposition (Y, Z) they generate belongs to S. In this text, we will only consider orthogonal decompositions.
Remark: in fact, the necessity of this condition in the quantum context was the original motivation to introduce the definition of classical information structure, as exposed in the first section. This can be seen as a comfortable flexibility in the classical context, or as a step from classical to quantum information theory.
As in the classical case, an information structure gives a category, denoted by the letter S, whose objects are the elements of S, and whose arrows XY are given by the divisions X|Y between the decompositions in S.
In what follows we always assume that 1, which corresponds to the trivial partition E, belongs to S, and is a final object. If not we will not get a topos.
Note that we are not the first to use categories and topos to formulate quantum or classical probability. In particular, Doring and Isham propose a reformulation of the whole of quantum and classical physics by using topos theory; see [46] and the references therein. This theory followed remarkable works of Isham, Butterfield and Hamilton, made between 1998 and 2002, and was further developed by Flori, Heunen, Landsman, Spitters, especially in the direction of a quantum logic. A common point between these works and ours is the consideration of sheaves over the category made by the partial ordering of commutative subalgebras. However, Doring et al. consider only the set of maximal algebras, and do not look at decompositions, i.e., they also consider the spectral values. In [46], Doring and Isham defined topos associated to quantum and classical probabilities. However, they focused on the definition of truth values in this context. For instance, in the classical setting, the topos they define is the topos of ordinary topological sheaves over the space (0, 1)L which has for open sets the intervals ]0, r[ for 0 ≤ r ≤ 1, and particular points of their topos are given by arbitrary probabilized spaces, which is far from the objects we consider, because our classical topos are attached to sigma-algebras over a given set. In fact, our aim is rather to develop a kind of geometry in this context, by using homological algebra, in the spirit of Artin, Grothendieck and Verdier when they developed topos for studying the geometry of schemes.
Example 5. The most interesting structures S seem to be provided by the quantum generalization of the simplicial information structures of classical finite probability. A finite family of commuting decompositions Σ = {S1, …, Sn} is given; they diagonalize in a common orthogonal basis, but it can happen that not all diagonal decompositions associated to the maximal torus belong to the set of joints W(Σ). In such a case a subgroup GΣ appears, which corresponds to the stabilizer of the finest decomposition S[n] = (S1 … Sn). This group is in general larger than a maximal torus of UN; it is a product of unitary groups (corresponding to common eigenvalues of observables in W(Σ)), and it is named a Levi subgroup of the unitary group. In addition we consider a closed subgroup G of the group U(E; h0) (which could be identified with UN), and all the conjugates gYg−1 of elements of W(Σ) by elements of G; this gives a manifold of commutative observable families Σg; g ∈ G. More generally we could consider several families Σγ; γ ∈ Γ of commuting observables, where Γ is any set. It can happen that an element of Σγ is also an element of Σλ for λ ≠ γ. The family Γ ∗ Σ of the Σγ, when γ describes the set Γ, forms a quantum information structure. The elements of this structure are (perhaps ambiguously) parameterized by the product of an abstract simplex Δ(n) with the set Γ (in particular Γ = G for conjugated families).
A simplicial information structure is a subset of Γ ∗ Σ which corresponds to a family Kγ of simplicial sub-complexes of Δ(n). In the invariant case, when Γ = G, several restrictions could be useful, for instance using the structure of the manifold of the conjugation classes of GΣ under G. The simplest case is given by taking the same complex K for all conjugates gΣg−1. By definition this latter case is a simplicial invariant family of quantum observables.
An event associated to S is a subspace EA, which is an element of one of the decompositions XS. For instance, if Y = (Y1, …, Ym), the joint event A = (Y1 = y1, Y2 = y2, …, Ym = ym) gives the space EA which is the maximal vector subspace of E where A happens, i.e.,
\[ (f \in E_A) \;\Leftrightarrow\; (Y_1(f) = y_1 f,\; Y_2(f) = y_2 f,\; \dots,\; Y_m(f) = y_m f). \]
We say that A is measurable for a decomposition Y whenever it is obtained by unions of elements of Y.
The role of the Boolean algebra B introduced in the first section could have been played here by a given decomposition B of E such that any decomposition in S is divided by B.
However, this choice of B is too rigid; in particular it forbids invariance under the unitary group U(h0). Thus we decided that a better analog of the Boolean algebra B is the set UB of all decompositions that are deduced from a given B by unitary transformations.
On the side of densities of states, i.e., quantum probabilities, we can consider a subspace Q1 of the space P = ℙH+ of hermitian positive matrices modulo multiplication by a constant. Concretely, we identify the elements of Q1 with positive hermitian operators ρ such that Tr(ρ) = 1. The space P is naturally stratified by the rank of the form; the largest cell ℙH++ corresponds to the non-degenerate forms; the smallest cells correspond to the rank-one forms, which are called pure states in Quantum Mechanics.
We will only consider subsets Q1 of P which are adapted to S, i.e., which satisfy that if ρ belongs to Q1, the conditioning of ρ by elements of S also belongs to Q1. This means that Q1 is closed by orthogonal projections on all the elements EA of the orthogonal decompositions X belonging to S. Note that a subset of P which is closed by all orthogonal projections is automatically adapted to any information category S.
Recall that, if ρ is a density of states and EA is an elementary event (i.e., a subspace of E), we define the conditioning of ρ by A by the hermitian matrix
\[ \rho|A = E_A^* \rho E_A / \mathrm{Tr}(E_A^* \rho E_A). \]
And we define the probability of the event EA for ρ as the trace:
\[ \mathbb{P}_\rho(A) = \mathrm{Tr}(E_A^* \rho E_A). \]
In the same manner we define the density of a joint observable by
\[ \rho_Y = \sum_A \mathbb{P}(A)\; \rho|A = \sum_A E_A^* \rho E_A. \]
A nice reference studying important examples is Paul-Andre Meyer, Quantum probability for probabilists [47].
If X is an orthogonal decomposition of E, we can associate to it a subset QX of Q1, which contains at least all the forms ρX where ρ belongs to Q1. The natural axiom that we assume for the function X ↦ QX is that, for each arrow of division X → Y, the set QY contains the set QX; we then denote by Y∗ the injection from QX to QY. The fact that QX is stable under conditioning by every element of a decomposition Y which is less fine than X is automatic; it follows from the fact that Q1 is adapted to S. We will use conditioning in this way.
In what follows we denote by the letter Q such a functor X ↦ QX from the category S to the category of quantum probabilities, with the arrows given by direct images. The set Q1 is the value of the functor Q at the certitude 1. We must keep in mind that many choices of the functor are possible when Q1 is given, the two extremes being the functor Qmax, where QX = Q1 for every X, and the functor Qmin, where QX is restricted to the set of forms ρX where ρ describes Q1; in this last case the elements of QX are positive hermitian forms on E which are decomposed in blocks according to X.
From the physical point of view, Qmin appears to make more sense than Qmax, but we prefer to consider both of them.
A special probability functor, which will be noted Qcan(S), is canonically associated to a quantum information structure S:
Definition 4. The canonical density functor Q^can_X(S) is made of all positive hermitian forms matched to X, i.e., all the forms ρX when ρ describes ℙH+.
It is equal to the functor Qmin associated to the full set Q1 = ℙH+. When the context is clear, we will simply write Qcan.
An important difference appears between the quantum and the classical frameworks: if X divides Y, there exist more (quantum) probability laws in QY than in QX, but there exist fewer classical laws at the place Y than at the place X, because classical laws are defined on smaller sigma-algebras.
In particular, the trivial partition has only one classical state, given by Tr(ρ) = 1, but it has the richest structure in terms of quantum laws: any hermitian positive form.
Let us consider the classical probabilities, i.e., the maps that associate the number ℙρ(A) to an event A; then, for an event which is measurable for Y, the law Y∗ρX gives the same result as the law ρX.
Remark: This points to a generalized notion of direct image, which is a correspondence qXY between QX and QY, not a map: we say that the pair (ρX, ρY) in QX × QY belongs to qXY if, for any event A which is measurable for Y, we have the equality of probabilities
\[ \mathbb{P}_{\rho_X}(A) = \mathbb{P}_{\rho_Y}(A). \]
Let us look at the relation of quantification, between a classical information structure and a quantum one:
Consider a maximal family of commuting observables S in the quantum information structure S, i.e., the full subcategory associated to an initial object X0. This family is a classical information structure. Conversely, if we start with a classical information structure S, made of partitions of a finite set Ω, we can always consider it as a quantum structure associated to the vector space E = ℂΩ freely generated over ℂ by the elements of Ω. Note that E comes with a canonical positive definite form h0, and, to be interesting from the quantum point of view, it is better to extend S by applying to it all the unitary transformations of E, generating a quantum structure S = U·S.
Remark 5. Suppose that S is unitary invariant; we can define a larger category SU by taking as arrows the isomorphisms of ordered decompositions, and closing under all compositions of arrows of S with them. Such an invariant extended category SU is not far from being equivalent, from the point of view of category theory, to the category S_𝔖 made by adding arrows for permutations of the sets Ω/X (cf. the above section): let us work for a moment, as we will do in the last part of this paper, with ordered partitions of Ω, Ω being itself equipped with an order, and with ordered orthogonal decompositions of E. In this case we can associate to any ordered decomposition X = (E1, …, Em) of E the unique ordered partition of Ω compatible with the sequence of dimensions and the order of Ω. This gives a functor τ from the quantum category to the classical one such that τ ∘ ι = Id, where ι denotes the inclusion of the classical category in the quantum one. These two functors extend, preserving this property, to the categories SU and S_𝔖. In fact, the functor ι sends a permutation to the unitary map which acts by this permutation on the canonical basis, and the functor τ sends a unitary transformation g between X ∈ SU and gXg∗ to the permutation it induces on the orthogonal decompositions. Moreover, consider the map f which associates to any X ∈ SU the unique morphism from the decomposition ιτ(X) to X; it is a natural transformation from the functor ιτ to the functor Id_{SU}, which is invertible; hence it defines an equivalence of categories between S_𝔖 and SU. However, a big difference begins with probability functors.
Let Q be a quantum density functor adapted to S, and denote by ι∗Q the composite functor on the classical category; we can consider the functor Q̄ which associates to each classical X the set of classical probabilities ℙρ for ρ ∈ QX. If X divides Y, the fact that the direct image Y∗ℙρ of the law of ρ ∈ QX coincides with the law ℙ of Y∗(ρ) gives the following result:
Lemma 4. ρ ↦ ℙρ is a natural transformation from the functor ι∗Q to the functor Q̄.
Definition 5. This natural transformation is called the trace, and we denote by TrX its value at X, i.e., TrX(ρ) = ℙρ, seen as a map from QX to Q̄X.
In general there is no natural transformation in the other direction, from Q̄X to QX.
Remark that the trace sends a unitary invariant functor to a symmetric functor.

4.3. Quantum Information Homology

As in the classical case, we can consider the ringed site given by the category S, equipped with the sheaf of monoids {SX; X ∈ S}. In the ringed topos of sheaves of S-modules, the choice of a probability functor Q generates remarkable elements of this topos, formed by the functional space F of measurable functions on Q with values in ℝ. The action of the monoid (or of the generated ring) is given by averaged conditioning, and the arrows are given by transposition of direct images. Then, the quantum information co-homology is the topos co-homology:
\[ H^m(\mathbf{S}, Q) = \mathrm{Ext}^m_{\mathbf{S}}(\mathbb{R}; \mathcal{F}). \]
However, as in the classical case, we can define directly the co-homology with a bar resolution of the constant sheaf, as follows:
A collection of functions FX of m observables Y1, …, Ym divided by X and of one density ρ, indexed by X ∈ S, is said to be local when, for any decomposition Y divided by X and dividing Y1, …, Ym, we have, for each ρ in QX,
\[ F_X(Y_1; \dots; Y_m; \rho) = F_Y(Y_1; \dots; Y_m; Y_*(\rho)). \]
For m = 0 this equation expresses that the family FX is an element of the topos.
For every m, a collection FX, X ∈ S, is a natural transformation F from a free functor Sm to the functor F.
Be careful: in the quantum context it is not true in general that locality is equivalent to the condition that the value FX(Y1; …; Ym; ρ) depends only on the family of conditioned densities E_{A_i}∗ρE_{A_i}, i = 1, …, m, where Ai is one of the possible events defined by Yi.
In fact it depends on the choice of Q; for instance it is false for a Qmax, but it is true for a Qmin.
The counter-example in the case of Qmax is given by a function F(ρ) which is independent of X: it is local (in the sense of topos that we adopt), but it is non-local in the apparently more natural sense, since it does not depend only on ρX. It is important to keep this quantum particularity in mind for understanding the following discussion.
As in the classical case, the action of observables on local functions is given by the average of conditioning, in the manner of Shannon, but using the Von Neumann conditioning:
\[ Y.F(Y_0; \dots; Y_m; \rho) = \sum_A \mathrm{Tr}(E_A^* \rho E_A)\; F(Y_0; \dots; Y_m; \rho|A), \]
where the EA are the spectral projectors of the decomposition Y. In this definition there is no need to assume that Y commutes with the Yj.
Recall that, when EAρEA is non-zero, ρ|A is equal to EAρEA/Tr(EAρEA), and satisfies the normalization condition that its trace equals one. When EAρEA is zero, the factor Tr(EAρEA) is zero, and by convention the corresponding term F is absent.
The proof of the Lemma 1 applies without significant change to prove that the above formula defines an action of the monoid functor SX.
Then, the definition of co-homology is given exactly as we have done for the classical case, by introducing the Hochschild operator:
\[ \hat{\delta}^m F(Y_1; \dots; Y_{m+1}; \rho) = Y_1.F(Y_2; \dots; Y_{m+1}; \rho) + \sum_{i=1}^{m} (-1)^i F(\dots; (Y_i, Y_{i+1}); \dots; Y_{m+1}; \rho) + (-1)^{m+1} F(Y_1; \dots; Y_m; \rho). \]
The Von-Neumann entropy is defined by the following formula
\[ S(\rho) = -\mathbb{E}_\rho(\log_2(\rho)) = -\mathrm{Tr}(\rho \log_2(\rho)). \]
For any density functor Q which is adapted to S, the Von Neumann entropy defines a local 0-cochain, denoted SX, which is simply the restriction of S to the set QX. If ρ belongs to QX and if X divides Y, the law Y∗ρ, which is the same hermitian form as ρ, belongs to QY by functoriality; thus S(Y∗ρ) = S(ρ), which translates into SX(ρ) = SY(Y∗ρ). This 0-cochain will simply be named the Von Neumann entropy.
In the case of Qmax, SX gives the same value at all places X. In the case of Qmin it coincides with S(ρX), where ρX denotes the restriction to the decomposition X.
Be careful: ρ ↦ S(ρX) is not a local 0-cochain for Qmax. In fact, in the case of Qmax we have the same set Q = QX for every place X; thus, if we take for X a strict divisor of Y and a density ρ such that the spectra of the restrictions ρY and ρX are different, then, in general, we do not have SX(ρ) = SY(Y∗ρ), even if, as is the case in the quantum context, Y∗ρ = ρ.
Remark that in the case of Qmax, where every function of ρ independent of X is a cochain of degree zero, the particular functions which depend only on the spectrum of ρ are invariant under the action of the unitary group, and they are the only 0-cochains which are invariant under this group.
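As a small illustration (ours, assuming numpy; the decomposition X is a hypothetical coordinate splitting), the following sketch computes S(ρ) from the spectrum and compares it with S(ρX) for the pinching ρX = Σα EαρEα; since the pinching is an average of unitary conjugates of ρ, concavity gives S(ρX) ≥ S(ρ).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = a @ a.conj().T
rho /= np.trace(rho).real

def S(r):
    # von Neumann entropy in bits, computed from the spectrum
    lam = np.linalg.eigvalsh(r)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

def pinch(r, blocks):
    # rho_X = sum_alpha E_alpha r E_alpha for diagonal coordinate projectors
    out = np.zeros_like(r)
    for b in blocks:
        E = np.zeros((n, n)); E[b, b] = 1.0
        out += E @ r @ E
    return out

rho_X = pinch(rho, [[0, 1], [2, 3]])            # X splits the basis as {0,1} | {2,3}
assert S(rho_X) >= S(rho) - 1e-9
print(f"S(rho) = {S(rho):.4f}  <=  S(rho_X) = {S(rho_X):.4f}")
```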
Definition 6. Suppose that S and Q are invariant under the unitary group, as is UB; we say that an m-cochain F is invariant if, for every X in S dividing Y1, …, Ym in S, every ρ in QX and every g in the group U(h0), we have
\[ F_{g.X}(g.Y_1; \dots; g.Y_m; g.\rho) = F_X(Y_1; \dots; Y_m; \rho), \]
where g.X = gXg∗, g.Yi = gYig∗ for i = 1, …, m, and g.ρ = gρg∗.
This is compatible with the naturality assumption (functoriality by direct images), because direct image is a covariant operation.
Note that conditioning is also covariant if we change all variables and laws coherently. Thus the action of the monoids SX on cochains respects the invariance.
Then the coboundary δ̂ preserves invariance. Thus the co-homology of the invariant cochains is well defined; we call it the invariant information co-homology, and we will denote it by H*_U(S; Q), U for unitary.
Invariant cochains form a subcomplex of ordinary cochains, hence we have a well defined map from H*_U(S; Q) to H*(S; Q).
The invariant 0-co-chains depend only on the spectrum of ρ in the sets QX.
The invariant co-homology is probably a more natural object from the point of view of Physics. It is also on this co-homology that we were able to obtain constructive results.
The classical entropy of the decomposition {Ej} and the quantum law ρ is
\[ H(X; \rho) = -\sum_j \mathrm{Tr}(E_j^* \rho E_j)\, \log_2\!\big(\mathrm{Tr}(E_j^* \rho E_j)\big). \]
In general it is not true that H(X; ρ) = H(Y ; Yρ) when X divides Y . Thus the Shannon (or Gibbs) entropy is not a local 0-cochain, but it is a local 1-cochain, i.e., if X → Y → Z we have
\[ H_X(Z; \rho_X) = H_Y(Z; Y_* \rho_X). \]
Moreover it is a spectral 1-cochain for any Qmin.
The following result is well known, cf. Nielsen and Chuang [13].
Lemma 5. Let X, Y be two commuting families of observables; we have
\[ S_{(X,Y)}(\rho) = H(Y; \rho) + Y.S_X(\rho). \]
Proof. We denote by α, β, … the indices of the different values of X, by k, l, … the indices of the different values of Y, and by i, j, … the indices of a basis I_{k,α} of eigenvectors of the conditioned density ρ_{k,α} = E∗_{k,α}ρE_{k,α} constrained by the projectors E_{k,α} of the pair (Y, X). The probability p_k = ℙρ(Y = y_k) is equal to the sum over i, α of the eigenvalues λ_{i,k,α} of ρ_{k,α}. We have
\[ Y.S_X(\rho) = \sum_k p_k \sum_{i,\alpha} -\frac{\lambda_{i,k,\alpha}}{p_k} \log_2\!\Big(\frac{\lambda_{i,k,\alpha}}{p_k}\Big) = -\sum_{i,k,\alpha} \lambda_{i,k,\alpha} \log_2(\lambda_{i,k,\alpha}) + \sum_{i,k,\alpha} \lambda_{i,k,\alpha} \log_2(p_k) = -\sum_{i,k,\alpha} \lambda_{i,k,\alpha} \log_2(\lambda_{i,k,\alpha}) + \sum_k p_k \log_2(p_k). \]
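A hedged numerical check of Lemma 5 (ours, assuming numpy; the two commuting decompositions are hypothetical coordinate splittings of ℂ⁴, and SX is taken in the Qmin sense SX(ρ) = S(ρX)):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = a @ a.conj().T
rho /= np.trace(rho).real

def S(r):
    lam = np.linalg.eigvalsh(r)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

def proj(idx):
    E = np.zeros((n, n)); E[idx, idx] = 1.0
    return E

def pinch(r, blocks):
    return sum(proj(b) @ r @ proj(b) for b in blocks)

Y_blocks = [[0, 1], [2, 3]]                     # decomposition Y
X_blocks = [[0, 2], [1, 3]]                     # commuting decomposition X
XY_blocks = [[0], [1], [2], [3]]                # the joint (X, Y)

S_XY = S(pinch(rho, XY_blocks))                 # S_(X,Y)(rho)
p = [np.trace(proj(b) @ rho @ proj(b)).real for b in Y_blocks]
H_Y = -sum(pk * np.log2(pk) for pk in p)        # H(Y; rho)
Y_dot_SX = sum(pk * S(pinch(proj(b) @ rho @ proj(b) / pk, X_blocks))
               for pk, b in zip(p, Y_blocks))   # Y.S_X(rho)
assert np.isclose(S_XY, H_Y + Y_dot_SX)
print("Lemma 5 verified: S_(X,Y) = H(Y) + Y.S_X")
```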
Remark 6. Taking X = 1, or any scalar matrix, the preceding Lemma 5 expresses the fact that the classical entropy is a derived quantity, measuring the default of equivariance of the quantum entropy:
\[ H(Y; \rho) = S_Y(\rho) - (Y.S_Y)(\rho). \]
Lemma 6. For any X ∈ S dividing Y ∈ S, and any ρ ∈ QX,
\[ \hat{\delta}(S_X)(Y; \rho) = -H_X(Y; \rho). \]
Proof. This is exactly what Lemma 5 says in this particular case, because here (X, Y) = X, and, by definition, we have δ̂(SX)(Y; ρ) = Y.SX(ρ) − SX(ρ).
To insist, we give a direct proof with fewer indices for this case:
\[ Y.S_X(\rho) = \sum_i p_i \sum_k -\frac{\lambda_{ik}}{p_i} \log_2 \frac{\lambda_{ik}}{p_i} = -\sum_{ik} \lambda_{ik} \log_2 \lambda_{ik} + \sum_{ik} \lambda_{ik} \log_2 p_i = S_X(\rho) + \sum_i (\log_2 p_i) \sum_k \lambda_{ik} = S_X(\rho) + \sum_i p_i \log_2 p_i = S_X(\rho) - H_X(Y; \rho). \]
Lemma 6 says that, up to sign, the Shannon entropy is the co-boundary of the Von Neumann entropy. This implies that the Shannon entropy is a 1-cocycle, as in the classical case, but now it gives zero in co-homology.
Note that the result is true for any Q, thus for Qmin and for Qmax as well.
Consider a maximal observable X0 in S, i.e., a maximal set of commuting observables in S; the elements of this maximal partition form a finite set Ω0. If S is invariant under the group U(E; h0), all the maximal observables are deduced from X0 by applying a unitary change of basis. Suppose that the functor Q is invariant as well; then we automatically get a symmetric classical structure of information on Ω0, given by the elements of S divided by X0, and it is equipped with a symmetric classical probability functor, given by the probability laws associated to the elements of Q.
Recall that we defined the trace from quantum probabilities to classical probabilities by taking the classical law ℙρ for each ρ, and we noticed that the trace is compatible with invariance and with symmetry by permutations.
Definition 7. To each classical cochain F0 we can associate a quantum cochain F = tr∗F0 by putting
\[ \mathrm{tr}^*(F^0)_X(Y_1; \dots; Y_m; \rho) = F^0_X(Y_1; \dots; Y_m; \mathrm{Tr}_X(\rho)). \]
The following result is straightforward:
Proposition 3. (i) The trace of cochains defines a map from the classical information Hochschild complex to the quantum one, which commutes with the co-boundaries; (ii) this map sends symmetric cochains to invariant cochains; it induces a natural map from the symmetric classical information co-homology H*_S(S, Q) to the invariant quantum information co-homology H*_U(S; Q).
The Lemma 6 says that the entropy class goes to zero.
Remark 7. In a preliminary version of these notes, we considered the expression s(X; ρ) = S(ρX) − S(ρ) and showed that it satisfies formally the 1-cocycle equation. We suppress this consideration now, because s is not local, and thus it plays no interesting role in homology. For instance, in Qmin, S(ρX) is local but S(ρ) is not, and in Qmax, S(ρ) is local but S(ρX) is not.
Definition 8. In an information structure S we call edge a pair of decompositions (X, Y) such that X, Y and (X, Y) belong to S; we say that an edge is rich when both X and Y have at least two elements and (X, Y) cuts those two into four distinct subspaces of E. The structure S is connected if every two points are joined by a sequence of edges, and it is sufficiently rich when every point belongs to a rich edge. We assume that a maximal set of subspaces UB is given in the Grassmannian of E, in such a way that the maximal elements X0 of S (i.e., the initial ones in the category) are made of pieces in UB. The density functor Q is said to be complete with respect to S (or UB) if, for every X, the set QX contains the positive hermitian forms on the blocks of X that give scalar blocks ραβ for two elements Eα, Eβ of a maximal decomposition. (All this is simplified when we choose a basis and take maximal commutative subalgebras of operators, but we want to remain free to consider simplicial complexes.)
Theorem 3. (i) For any unitary invariant quantum information structure S, which is connected and sufficiently rich, and for the canonical invariant density functor Qcan(S) (i.e., the density functor which is minimal and complete with respect to S), the invariant information co-homology of degree one H^1_U(S; Q) is zero. (ii) Under the same hypotheses, the invariant co-homology of degree zero has dimension one and is generated by the constants. Then, up to an additive constant, the only invariant 0-cochain which has the Shannon entropy as co-boundary is (minus) the Von Neumann entropy.
Proof. (I) Let X, Y be two orthogonal decompositions of E belonging to S such that (X, Y) belongs to S, and let ρ be an element of Q. We name A_{k_i}, i = 1, …, m, the summands of X, and B_{α_j}, j = 1, …, l, the summands of Y; the projections E_{k_i}ρE_{k_i}, i = 1, …, m (resp. E_{α_j}ρE_{α_j}, j = 1, …, l) of ρ on the summands of X (resp. Y) are denoted by ρ_{k_i} (resp. ρ_{α_j}). The projections by the commutative products E_{k_i}E_{α_j} are denoted by ρ_{k_i,α_j}, i = 1, …, m, j = 1, …, l.
Let f be a 1-cocycle; we write f(X; ρ) = F(ρ), f(Y; ρ) = H(ρ) and G(ρ) = f((X, Y); ρ). Note that in Qmin, F is a function of the ρ_{k_i}, H a function of the ρ_{α_j} and G a function of the ρ_{k_i,α_j}, but there is no necessity to assume this property; we can always consider these functions restricted to diagonal blocks, which are arbitrary thanks to the completeness hypothesis.
For any positive hermitian ρ′, we write ρ′|α_j (resp. ρ′|k_i) for the form conditioned by the event B_{α_j} (resp. A_{k_i}). The co-cycle equation gives the two following equations, which are exchanged by permuting X and Y:
\[ \sum_j \mathrm{Tr}(\rho_{\alpha_j})\, F\big((\rho_{k_i}|\alpha_j);\; i = 1, \dots, m\big) = G\big((\rho_{k_i,\alpha_j});\; i, j\big) - H\big((\rho_{\alpha_j});\; j\big), \]
\[ \sum_i \mathrm{Tr}(\rho_{k_i})\, H\big((\rho_{\alpha_j}|k_i);\; j = 1, \dots, l\big) = G\big((\rho_{k_i,\alpha_j});\; i, j\big) - F\big((\rho_{k_i});\; i\big). \]
Now we consider a particular case, where the small blocks ρ_{k,α} are zero except for (k_1, α_2) and (k_j, α_1) for j = 2, …, m. We denote by h_1 the form ρ_{k_1,α_2} and by h_i the form ρ_{k_i,α_1}, for i = 2, …, m. Remark that Tr(h_1 + h_2 + … + h_m) = 1.
(II) As in the classical case, it is a general fact that, for a 1-cocycle f and any variable Z, the value f(Z; ρ) is zero if ρ is zero outside one of the orthogonal summands C_a of Z; indeed the equation f_X((Z, Z); ρ) = f_X(Z; ρ) + Z.f_X(Z; ρ) implies Z.f_X(Z; ρ) = 0, and if ρ has only one non-zero factor ρ_a, we have
\[ Z.f(Z; \rho) = \sum_b \mathrm{Tr}(\rho_b)\, f(Z; \rho_b / \mathrm{Tr}(\rho_b)) = \mathrm{Tr}(\rho_a)\, f(Z; \rho_a / \mathrm{Tr}(\rho_a)) = f(Z; \rho), \]
whence f(Z; ρ) = Z.f(Z; ρ) = 0.
Therefore, in the particular case that we consider, we get for any i that H((ρ_{α_j}|k_i); j) = 0. Consequently Equation (96) equates the term in G with the term in F, and we can substitute this equality in the first equation. Denoting 1 − x_1 = Tr(ρ_{α_1}), this gives
\[ H\big((\rho_{\alpha_j});\; j = 1, 2\big) = F\big((\rho_{k_i});\; i = 1, \dots, m\big) - (1 - x_1)\, F\!\Big(0, \frac{h_2}{1 - x_1}, \dots, \frac{h_m}{1 - x_1}\Big). \]
Now, if we add the condition h_3 = … = h_m = 0, we have F(0, h_2/(1 − x_1), 0, …, 0) = 0, for the reason which eliminated the H((ρ_{α_j}|k_i); j); thus we obtain
\[ H\big((\rho_{\alpha_j});\; j = 1, 2\big) = F\big((\rho_{k_i});\; i = 1, 2\big). \]
This is a sufficiently strong constraint to imply that both terms are functions of h_1, h_2 only, and that of course they coincide as functions of these small blocks.
First this gives a recurrence equation, which, as in the classical case is able to reconstruct F ( ( ρ k i ) ; i = 1 , , m ) from the case of two blocs:
\[ F\big(X; (\rho_{k_i});\; i = 1, \dots, m\big) = F\big(X; (\rho_{k_1}, \rho_{k_2}, 0, \dots, 0)\big) + (1 - x_1)\, F\!\Big(X; \Big(0, \frac{h_2}{1 - x_1}, \dots, \frac{h_m}{1 - x_1}\Big)\Big). \]
(III) We are left with the study of two binary variables Y, Z, forming a rich edge.
The blocks of ρ adapted to the joint (Y, Z) are denoted by ρ_{00}, ρ_{01}, ρ_{10}, ρ_{11}, where the first index refers to Y and the second index refers to Z; but the blocks that are allowed for Y and Z are more numerous than four: there exist off-diagonal blocks, and their role will be important in our analysis. For Y we have the matrices ρ^0_0 and ρ^0_1, and for Z we have the matrices ρ^1_0 and ρ^1_1:
\[ \rho^0_0 = \begin{pmatrix} \rho_{00} & \rho^0_{001} \\ \rho^0_{010} & \rho_{01} \end{pmatrix}, \qquad \rho^0_1 = \begin{pmatrix} \rho_{10} & \rho^0_{101} \\ \rho^0_{111} & \rho_{11} \end{pmatrix}, \]
\[ \rho^1_0 = \begin{pmatrix} \rho_{00} & \rho^1_{001} \\ \rho^1_{010} & \rho_{01} \end{pmatrix}, \qquad \rho^1_1 = \begin{pmatrix} \rho_{10} & \rho^1_{101} \\ \rho^1_{111} & \rho_{11} \end{pmatrix}. \]
These blocks are disposed among sixteen blocks of ρ, but certain of them, marked with stars, cannot be seen from ρ_Y or ρ_Z:
\[ \rho = \begin{pmatrix} \rho_{00} & \rho^0_{001} & \rho^1_{001} & \rho^*_{001} \\ \rho^0_{010} & \rho_{01} & \rho^*_{101} & \rho^1_{001} \\ \rho^1_{010} & \rho^*_{010} & \rho_{10} & \rho^0_{101} \\ \rho^*_{111} & \rho^1_{111} & \rho^0_{111} & \rho_{11} \end{pmatrix}. \]
Now the co-cycle equations are
\[ F((Y, Z); \rho) = Y.F(Z; \rho) + F(Y; \rho) = Z.F(Y; \rho) + F(Z; \rho), \]
giving the symmetrical relation:
\[ Y.F(Z; \rho) - F(Z; \rho) = Z.F(Y; \rho) - F(Y; \rho). \]
The conditioning makes many blocks disappear. Then, denoting by latin letters the corresponding traces, and taking explicitly into account the blocks that must count, the symmetrical identity gives, for any ρ, the following developed equation:
\begin{align*}
& (p_{00}+p_{01})\, F_Z\!\Big(\frac{\rho_{00}}{p_{00}+p_{01}}, \frac{\rho_{01}}{p_{00}+p_{01}}, 0, 0\Big) + (p_{10}+p_{11})\, F_Z\!\Big(0, 0, \frac{\rho_{10}}{p_{10}+p_{11}}, \frac{\rho_{11}}{p_{10}+p_{11}}\Big) - F_Z\big(\rho_{00}, \rho^1_{001}, \rho^1_{010}, \rho_{01}, \rho_{10}, \rho^1_{101}, \rho^1_{111}, \rho_{11}\big) \\
={}& (p_{00}+p_{10})\, F_Y\!\Big(\frac{\rho_{00}}{p_{00}+p_{10}}, 0, \frac{\rho_{10}}{p_{00}+p_{10}}, 0\Big) + (p_{01}+p_{11})\, F_Y\!\Big(0, \frac{\rho_{01}}{p_{01}+p_{11}}, 0, \frac{\rho_{11}}{p_{01}+p_{11}}\Big) - F_Y\big(\rho_{00}, \rho^0_{001}, \rho^0_{010}, \rho_{01}, \rho_{10}, \rho^0_{101}, \rho^0_{111}, \rho_{11}\big).
\end{align*}
(IV) Now we appeal to the invariance hypothesis: let us apply a unitary transformation g which respects the two summands of Y but does not necessarily respect the summands of Z; we replace Z by gZg∗ and ρ by gρg∗, and the value of F_Y(ρ^0_0, ρ^0_1) does not change. Our claim is that the only functions F_Y compatible with Equation (106) for every ρ are the functions of the traces of the blocks.
For the proof, we assume that all the blocs are zero except the eight blocs concerning Y . In this case, we see that the last function −FY of the right member, involves the eight blocs, but all the other functions involve only the four diagonal blocs. Thus our claim follows from the following result:
Lemma 7. A measurable function f on the set H of hermitian matrices which is invariant under conjugation by the unitary group Un and invariant by the change of the coefficient a1n, the farthest from the diagonal, is a function of the trace.
Proof. An invariant function for the adjoint representation is a function of the traces of the exterior powers Λk(ρ), but these traces are coefficients in the basis e i 1 e i 1 e i k, and the elements divisible by e1en cannot be neglected, as soon as k ≥ 2.
Therefore the co-cycle FY, FZ comes from the image of tr* in proposition 3. Then the recurrence relation (100) implies that the same is true for the whole co-cycle F.
(V) For concluding the proof of (i), we appeal to the Theorem 1, that the only non-zero cocycles in this context, connected and sufficiently rich, are multiples of the classical entropy. However, the Lemma 5 says that the entropy is a co-boundary.
(VI) To prove (ii), we have to show that every 0-cocycle XfX(ρ), which depends only on the spectrum of ρ, is a constant. We know that a spectral function is a measurable function φ(σ1; σ2; …) of the elementary symmetric functions σ 1 = i λ i , σ 2 = i < j λ i λ j , .
To be a 0-cocycle, f must verify, for every division X → Y, the equation
$$f_X(\rho) = \sum_i P_\rho(Y = i)\, f_X\big(\rho\,|\,(Y = i)\big).$$
Explicitly, if $f_X(\rho) = \varphi_X(\sigma_1, \sigma_2, \dots)$,
$$\varphi_X(\sigma_1, \sigma_2, \dots) = \sum_i \sigma_1(\lambda_{k,i})\, \varphi_X\big(\sigma_1(\lambda_{k,i}), \dots\big),$$
where each block ρ|i has the spectrum $\{\lambda_{k,i};\, k \in J_i\}$. For a sufficiently rich edge X = (Y, Z), we have, with the four eigenvalues repeated as they must be to fill the dimensions:
$$f\big(\lambda_{00}^{(n_{00})}, \lambda_{01}^{(n_{01})}, \lambda_{10}^{(n_{10})}, \lambda_{11}^{(n_{11})}\big) = (n_{00}\lambda_{00} + n_{01}\lambda_{01})\, f\Big(\frac{\lambda_{00}^{(n_{00})}}{n_{00}\lambda_{00} + n_{01}\lambda_{01}}, \frac{\lambda_{01}^{(n_{01})}}{n_{00}\lambda_{00} + n_{01}\lambda_{01}}\Big) + (n_{10}\lambda_{10} + n_{11}\lambda_{11})\, f\Big(\frac{\lambda_{10}^{(n_{10})}}{n_{10}\lambda_{10} + n_{11}\lambda_{11}}, \frac{\lambda_{11}^{(n_{11})}}{n_{10}\lambda_{10} + n_{11}\lambda_{11}}\Big),$$
and
$$f\big(\lambda_{00}^{(n_{00})}, \lambda_{01}^{(n_{01})}, \lambda_{10}^{(n_{10})}, \lambda_{11}^{(n_{11})}\big) = (n_{00}\lambda_{00} + n_{10}\lambda_{10})\, f\Big(\frac{\lambda_{00}^{(n_{00})}}{n_{00}\lambda_{00} + n_{10}\lambda_{10}}, \frac{\lambda_{10}^{(n_{10})}}{n_{00}\lambda_{00} + n_{10}\lambda_{10}}\Big) + (n_{01}\lambda_{01} + n_{11}\lambda_{11})\, f\Big(\frac{\lambda_{01}^{(n_{01})}}{n_{01}\lambda_{01} + n_{11}\lambda_{11}}, \frac{\lambda_{11}^{(n_{11})}}{n_{01}\lambda_{01} + n_{11}\lambda_{11}}\Big).$$
By equating the two right-hand sides, taking λ_01 = λ_00 = 0, and varying λ_10, λ_11, we find that f(x, y) is the sum of a constant and a linear function.
In the end, f_X must be the sum of a constant and a linear function for every X. However, a linear symmetric function is a multiple of σ_1. As ρ is normalized by the condition Tr(ρ) = 1, only the constant survives.
Remark 8. In his book "Structure des Systemes Dynamiques", J.-M. Souriau [48] showed that the mass of a mechanical system is a degree one co-homology class of the relativity group with values in its adjoint representation; this class is non-trivial for classical mechanics, with the Galileo group, and becomes trivial for Einstein's relativistic mechanics, with the Lorentz–Poincare group. Even if we are conscious of the big difference with our construction, the above result shows that the same thing happens for the entropy, when going from classical statistics to quantum statistics.
From the philosophical point of view, it is important to mention that the main difference between classical and quantum information co-homology in degree less than one is the fact that the certitude, 1, becomes highly non-trivial in the quantum context. This point is discussed in particular by Gabriel Catren [49]. In geometric quantization, the first ingredient, discovered by Kirillov, Kostant and Souriau in the sixties, is a circular bundle over the phase space that allows a non-trivial representation of the constants. The second ingredient, also discovered by the same authors, is the necessity of choosing a polarization, which corresponds to the choice of a maximal commutative Poisson sub-algebra of observable quantities. This second ingredient appears in our framework through the limitation of information categories to collections of commutative Boolean algebras, coming from the impossibility of defining manageable joints for arbitrary pairs of observables.

5. Product Structures, Kullback–Leibler Divergence, Quantum Version

In this short section, we use both the homogeneous bar-complex and the non-homogeneous complex. A natural extension of the information co-cycles is to look at the measurable functions
$$F(X_0; X_1; \dots; X_m; P_0; P_1, P_2, \dots, P_n; X),$$
of several probability laws P_j (resp. densities of states) on Ω (resp. E), belonging to the space $Q_X$ and absolutely continuous with respect to P_0, and of several decompositions X_i less fine than X. To be homogeneous co-chains, these functions have to behave naturally under direct images Y_*(P_i), and to satisfy the equivariance relation:
$$F\big((Y, X_0); (Y, X_1); \dots; (Y, X_m); P_0; P_1, P_2, \dots, P_n; X\big) = Y.F\big(X_0; X_1; \dots; X_m; P_0; P_1, P_2, \dots, P_n; X\big),$$
for any $Y \in S_X$ (resp. $\mathcal{S}_X$), where
$$Y.F(X_0; X_1; \dots; X_m; P_0; P_1, P_2, \dots, P_n; X) = \int_{E_Y} d\,Y_*P_0(y)\; F\big(X_0; X_1; \dots; X_m; P_0|_{Y=y}; P_1|_{Y=y}, \dots, P_n|_{Y=y}; X\big).$$
Note that a special role is played by the law P_0, which justifies the comma notation.
The proof of Lemma 1 in Section 2.1 extends without modification to show that this defines a semi-group action.
Then we define the homogeneous co-boundary operator by
$$\delta F(X_0; X_1; \dots; X_m; X_{m+1}; P_0; P_1, P_2, \dots, P_n; X) = \sum_i (-1)^i\, F(X_0; \dots; \widehat{X_i}; \dots; X_{m+1}; P_0; P_1, P_2, \dots, P_n; X).$$
The co-cycles are the elements of the kernel of δ and the co-boundaries are the elements of the image of δ (with a shift of degree). The co-homology groups are the quotients of the spaces of co-cycles by the spaces of co-boundaries.
This co-homology is the topos co-homology $H_S^*(\,\cdot\,, \mathcal{F}_n)$ of the module functor $\mathcal{F}_n$ of measurable functions of (n+1)-tuples of probabilities, in the ringed topos S (resp. $\mathcal{S}$ in the quantum case).
There is also the non-homogeneous version: an m-cocycle is a family of functions F_X(X_1; …; X_m; P_0; P_1, P_2, …, P_n) which behaves naturally under direct images, without the equivariance condition.
The co-boundary operator is copied from the Hochschild operator: we define the non-homogeneous co-boundary operator by
$$\widehat{\delta} F_X(X_0; X_1; \dots; X_m; P_0; P_1, P_2, \dots, P_n) = (X_0.F_X)(X_1; \dots; X_m; P_0; P_1, P_2, \dots, P_n) + \sum_i (-1)^{i+1}\, F_X(X_0; \dots; \widehat{X_i}; \dots; X_m; P_0; P_1, P_2, \dots, P_n).$$
Let us recall the definition of the Kullback–Leibler divergence (or relative entropy) between two classical probability laws P, Q on the same space Ω, in the finite case:
$$H(P; Q) = -\sum_i p_i \log \frac{q_i}{p_i}.$$
Over an infinite set, it is required that Q be absolutely continuous with respect to P, with an L^1-density dQ/dP, and the definition is
$$H(P; Q) = -\int_\Omega dP(\omega)\, \log \frac{dQ(\omega)}{dP(\omega)}.$$
When dQ(ω)/dP(ω) = 0, the logarithm is −∞ and, due to the minus sign, we get a contribution +∞ in H; thus, if this happens with non-zero probability for P, the divergence is positively infinite. To get a finite number, we must also suppose that P is absolutely continuous with respect to Q, i.e., that P and Q are equivalent.
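For the finite case, the following minimal Python sketch (our illustration; the function name is ours) computes H(P; Q) with these conventions, returning +∞ as soon as P is not absolutely continuous with respect to Q:

```python
import math

def kl_divergence(p, q):
    """H(P; Q) = -sum_i p_i log(q_i / p_i), finite case."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # a null p_i contributes nothing
        if qi == 0.0:
            return math.inf   # P is not absolutely continuous w.r.t. Q
        total -= pi * math.log(qi / pi)
    return total

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # positive
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # inf
```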
The analogous formula defines the quantum Kullback–Leibler divergence (or quantum relative entropy), cf. Nielsen–Chuang [13], between two densities of states ρ, σ on the same Hilbert space E, in the finite dimensional case:
$$S(\rho; \sigma) = -\mathrm{Tr}\big(\rho\, (\log \sigma - \log \rho)\big).$$
In the case of an infinite dimensional Hilbert space, it is required that the trace is well defined.
These quantities are positive or zero, and they are zero only in the case of equality of the laws (resp. of the densities of states). This is the reason why they are frequently used as measures of distance between two laws.
Proposition 4. The map which associates, to X in S, Y dividing X, and two laws P, Q, the quantity H(Y_*P; Y_*Q) defines a non-homogeneous 1-cocycle, denoted H_X(Y; P; Q).
Proof. As we already know that the classical Shannon entropy is a non-homogeneous 1-cocycle, it is sufficient to prove the Hochschild relation for the new function
$$H_m(Y; P; Q) = -\sum_i p_i \log q_i.$$
Let us denote by p_{ij} (resp. q_{ij}) the probability for P (resp. Q) of the event {Y = x_i, Z = y_j}, and by p_j (resp. q_j) the probability for P (resp. Q) of the event {Z = y_j}; then the probability $p_j^i$ (resp. $q_j^i$) of Y = x_i knowing that Z = y_j, for P (resp. for Q), is equal to p_{ij}/p_j (resp. q_{ij}/q_j), and we have
$$H_m((Z, Y); P; Q) = -\sum_{ij} p_{ij} \log q_{ij}$$
$$= -\sum_j p_j \sum_i p_j^i \log\big(q_j\, q_j^i\big)$$
$$= -\sum_j p_j \log q_j \Big(\sum_i p_j^i\Big) - \sum_j p_j \sum_i p_j^i \log q_j^i$$
$$= -\sum_j p_j \log q_j - \sum_j p_j \sum_i p_j^i \log q_j^i;$$
the first term on the right is Hm(Z; P ; Q) and the second is (Z.Hm)(Y ; P ; Q), Q.E.D.
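The Hochschild relation just proved can also be checked numerically. The following Python sketch (our illustration; all the names are ours) draws two random joint laws on a 3 × 4 grid and verifies H_m((Z, Y); P; Q) = H_m(Z; P; Q) + (Z.H_m)(Y; P; Q) up to rounding:

```python
import math, random

def mixed_entropy(p, q):
    """The mixed entropy H_m(P; Q) = -sum_i p_i log q_i (finite case)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def random_joint(rows, cols, rng):
    w = [[rng.random() for _ in range(cols)] for _ in range(rows)]
    s = sum(map(sum, w))
    return [[x / s for x in row] for row in w]

rng = random.Random(0)
P = random_joint(3, 4, rng)                 # p_ij, i indexes Y, j indexes Z
Q = random_joint(3, 4, rng)                 # q_ij

flat = lambda M: [x for row in M for x in row]
pz = [sum(P[i][j] for i in range(3)) for j in range(4)]    # law of Z under P
qz = [sum(Q[i][j] for i in range(3)) for j in range(4)]    # law of Z under Q

# (Z.H_m)(Y; P; Q): average over Z of the conditioned mixed entropies
avg = sum(pz[j] * mixed_entropy([P[i][j] / pz[j] for i in range(3)],
                                [Q[i][j] / qz[j] for i in range(3)])
          for j in range(4))

lhs = mixed_entropy(flat(P), flat(Q))       # H_m((Z, Y); P; Q)
rhs = mixed_entropy(pz, qz) + avg           # H_m(Z; P; Q) + (Z.H_m)(Y; P; Q)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```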
This defines a homogeneous co-cycle for pairs of probability laws, H_X(Y; Z; P; Q) = H_X(Y; P; Q) − H_X(Z; P; Q), named the Kullback divergence variation.
In the quantum case, for two densities of states ρ, σ, we define in the same manner a classical Kullback–Leibler divergence H_X(Y; ρ; σ) by the formula
$$H_X(Y; \rho; \sigma) = \sum_k \mathrm{Tr}(\rho_k)\, \big(\log(\mathrm{Tr}(\rho_k)) - \log(\mathrm{Tr}(\sigma_k))\big);$$
where the index k parameterizes the orthogonal decomposition (E_k) associated to Y, and where ρ_k (resp. σ_k) denotes the matrix E_k ρ E_k (resp. E_k σ E_k). It is the Kullback–Leibler divergence of the classical laws associated to the direct images of ρ and σ respectively.
But in the case of quantum information theory, we can also define a quantum divergence, for any pair of densities of states (ρ, σ) in $Q_X$:
$$S_X(\rho; \sigma) = -\mathrm{Tr}(\rho \log \sigma).$$
Lemma 8. For any pair (X, Y) of commuting hermitian operators, such that Y divides X, the function SX satisfies the relation
$$S_{(X,Y)}(\rho; \sigma) = H_Y(X; \rho; \sigma) + X.S_Y(\rho; \sigma);$$
where H_Y of two variables denotes the mixed entropy, defined by Equation (119).
Proof. As in the proof of Lemma 4, we denote by α, β, … (resp. k, l, …) the indices of the orthogonal decomposition Y (resp. X), and by i, j, … the indices of a basis φ_{i,k,α} of the space E_{k,α}, made of eigenvectors of the matrix $G_{k,\alpha} = E_{k,\alpha}^*\, \rho\, E_{k,\alpha}$, belonging to the joint operator (X, Y). In a general manner, if M is an endomorphism of E_{k,α}, we denote by M_{i,k,α} the diagonal coefficient of index (i, k, α). The probability p_k (resp. q_k) for ρ (resp. σ) of the event X = ξ_k is equal to the sum over i, α of the eigenvalues λ_{i,k,α} of ρ_{k,α} (resp. μ_{i,k,α} of σ_{k,α}). And the restricted density ρ_{Y_k} (resp. σ_{Y_k}), conditioned by X = ξ_k, is the sum over α of ρ_{k,α} (resp. of σ_{k,α}) divided by p_k (resp. q_k). We have
$$X.S_Y(\rho; \sigma) = -\sum_k p_k\, \mathrm{Tr}\big(\rho_{Y_k} \log \sigma_{Y_k}\big)$$
$$= -\sum_k p_k \sum_{i,\alpha} \frac{\lambda_{i,k,\alpha}}{p_k}\, \Big(\log \frac{\sigma_k}{q_k}\Big)_{i,k,\alpha}$$
$$= \sum_{i,k,\alpha} \lambda_{i,k,\alpha} \log q_k - \sum_{i,k,\alpha} \lambda_{i,k,\alpha}\, (\log \sigma_k)_{i,k,\alpha}$$
$$= \sum_k p_k \log q_k - \sum_{k,\alpha} \mathrm{Tr}\big(\rho_{k,\alpha} \log \sigma_{k,\alpha}\big)$$
$$= -H_Y(X; \rho; \sigma) + S_{(X,Y)}(\rho; \sigma).$$
As a corollary, with the argument proving Lemma 5 from Lemma 4, we obtain that the classical Kullback divergence is minus the co-boundary of the 0-cochain defined by the quantum divergence.
This shows that the generating function of all the co-cycles we have considered so far is the quantum 0-cochain for pairs, $S(\rho; \sigma) = -\mathrm{Tr}(\rho \log \sigma)$.
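As a numeric sanity check of Lemma 8 (our illustration, in the special case where Y is the trivial decomposition, so that the joint (X, Y) reduces to X), the following Python sketch builds ρ and σ block-diagonal for a two-block decomposition X and verifies S_X(ρ; σ) = H(X; ρ; σ) + X.S(ρ; σ); the helper names are ours, and we assume numpy is available:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_state(d, trace):
    """A random Hermitian positive definite matrix with prescribed trace."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = a @ a.conj().T + d * np.eye(d)
    return trace * m / np.trace(m).real

def logm_pd(m):
    """Matrix logarithm of a Hermitian positive definite matrix, via eigh."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.conj().T

def block_diag(a, b):
    d1, d2 = a.shape[0], b.shape[0]
    m = np.zeros((d1 + d2, d1 + d2), dtype=complex)
    m[:d1, :d1], m[d1:, d1:] = a, b
    return m

# rho and sigma block-diagonal for the decomposition X : E = E_1 (+) E_2
rho_b = [rand_state(2, 0.3), rand_state(3, 0.7)]
sig_b = [rand_state(2, 0.6), rand_state(3, 0.4)]
rho, sig = block_diag(*rho_b), block_diag(*sig_b)

S = lambda r, s: -np.trace(r @ logm_pd(s)).real   # S(rho; sigma) = -Tr(rho log sigma)

p = [np.trace(b).real for b in rho_b]             # p_k = Tr(rho_k)
q = [np.trace(b).real for b in sig_b]             # q_k = Tr(sigma_k)

mixed = -sum(pk * np.log(qk) for pk, qk in zip(p, q))      # mixed entropy H(X; rho; sigma)
avg = sum(pk * S(rb / pk, sb / qk)                         # X.S(rho; sigma)
          for pk, qk, rb, sb in zip(p, q, rho_b, sig_b))

assert abs(S(rho, sig) - (mixed + avg)) < 1e-10
print(S(rho, sig), mixed + avg)
```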

6. Structure of Observation of a Finite System

Up to now, the structures considered and the interventions of entropy can be seen as forming a kind of statics of information theory. The aim of this section is to indicate the elements of dynamics which could correspond to it. This more dynamical study could be better adapted to the known intervention of entropy in the theory of dynamical systems, as defined by Kolmogorov and Sinai.

6.1. Problems of Discrimination

The problem of optimal discrimination consists in separating the various states of a system by using, in the most economical manner, a family of observable quantities. One can also want only to detect a state satisfying a certain chosen property. A possible measure of the cost of discrimination is the number of steps before ending the process.
First, let us define more precisely what we mean by a system, a state, an observable quantity and a strategy for using observations. As before, for simplicity, the setting is finite sets.
The symbol [n] denotes the set {1, …, n}. We have n finite sets M_i of respective cardinalities m_i, and we consider the set M of sequences (x_1, …, x_n) where x_i belongs to M_i; by definition a system is a subset X of M, and a state of the system is an element of X. The set of (classical) observable quantities is a (finite) subset A of the functions from X to ℝ.
A use of observables, named an observation strategy, is an oriented tree Γ, starting at its root, which is the smallest vertex, such that each vertex is labelled by an element of A, and each arrow (naturally oriented edge) is labelled by a possible value of the observable at the initial vertex of the arrow.
For instance, if F_0 marks the root s_0, it means that we aim to measure F_0(x) for the states; then the branches issued from s_0 are indexed by the values v of F_0, and to each branch F_0 = v corresponds a subset X_v of states, giving a partition of X. If F_{1,v} is the observable at the final vertex α_v of the branch F_0 = v, the next step in the program is to evaluate F_{1,v}(x) for x ∈ X_v; then the branches issued from α_v correspond to the values w of F_{1,v} restricted to X_v, and so on.
For each vertex s in Γ, we denote by ν(s) the number of edges that are necessary to join s to the root s_0. The function ν, with values in ℕ, is called the level in the tree.
It can happen that a set X_v consists of one element only; in this case we decide to extend the tree to the next levels by a branch without bifurcation, for instance by labelling it with the same observable and the same value, but it could be any labelling, with its value on X_v. In such a way, each level k gives a well-defined partition π_k of X.
The level k also defines a sub-tree Γ_k of Γ, such that its final branches bear π_k. This gives a sequence π_0, π_1, …, π_l of finer and finer partitions of X, i.e., a growing sequence of partitions (if the ordering on partitions is the opposite of the sense of arrows in the information category Π(X)). The tree is said to be fully discriminant if the last partition π_l, which is the finest, is made of singletons.
The minimal number of steps that are necessary for separating the elements of X, or more modestly for detecting a certain part of the states, can be seen as a measure of complexity of the system with respect to the observations A. A refined measure could take into account the cost of use of a given observable, for instance the difficulty of computing its values.
Standard examples are furnished by weighting problems: in this case the states are mass repartitions in n objects, and the allowed observables are weightings, which are functions of the form
$$F_{I,J}(x) = \sum_{i \in I} x_i - \sum_{j \in J} x_j,$$
where I and J are disjoint subsets of [n].
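For instance, a weighting observable can be encoded as follows (our illustration, with 0-indexed objects and states x ∈ {−1, 0, +1}^n recording the mass deviations):

```python
def weighting(I, J):
    """The observable F_{I,J}(x) = sum_{i in I} x_i - sum_{j in J} x_j."""
    return lambda x: sum(x[i] for i in I) - sum(x[j] for j in J)

# state of 4 objects where object 2 (0-indexed) is heavier
x = (0, 0, 1, 0)
print(weighting({0, 2}, {1, 3})(x))   # +1: the pan {0, 2} is heavier
```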
We underline that such a function, which requires the choice of two disjoint subsets of [n], makes use of the definition of M as a set of sequences, not as an abstract finite set.
The kind of problems we can pose in this framework were studied, for instance, in "Problemes plaisants et delectables qui se font par les nombres" by Bachet de Meziriac (1612, 1624) [50].
The starting point of our research in this direction was a particular classical problem signaled to us by Guillaume Marrelec: given n objects ξ_1, …, ξ_n, if we know that m of them have the same mass and the n − m others have another common mass, how many measurements must be performed to separate the two groups and to decide which is the heavier?
Even for m = 1 the solution is interesting, and follows a principle of choice by maximum entropy. In the present text we only want to describe the general structures related to this kind of problem, without developing a specific study; in particular we want to show that the co-homological nature of entropy extends to a more dynamical context of discrimination in time.
Remark 9. The discrimination problem is connected with the coding problem. In fact, a finite system X (as we defined it just before) is nothing else than a particular set of words of length n, where the letter appearing at place i belongs to an alphabet M_i. Distinguishing between different words with a set A of variables f is nothing else than rewriting the words x of X with symbols v_f (labelling the image f(X)). To determine the most economical manner to do that consists in finding the smallest maximal length l of words in the alphabet {(f, v_f); f ∈ A, v_f ∈ f(X)} translating all the words x in X. This translation, when it is possible, can be read on the branches of a fully discriminating rooted tree, associated to an optimal strategy, of minimal level l. The word that translates x is the sequence (F_0, v_0), (F_1, v_1), …, (F_k, v_k), k ≤ l, of the variables put on the vertices along the branch going from the root to x, with the values of these variables put along the edges of this branch.

6.2. Observation Trees. Galois Groups and Probability Knowledge

More generally, we consider as in the first part (resp. in the second part) a finite set Ω equipped with a Boolean algebra B (resp. a finite dimensional complex vector space E equipped with a positive definite hermitian form h_0 and a family of direct decompositions in linear spaces U_B). In each situation we have a natural notion of observable quantity: in the case of Ω it is a partition Y compatible with B (i.e., less fine than B), with a numbering of the pieces by the integers 1, …, k if Y has k elements; in the case of E it is a decomposition Y compatible with U_B (i.e., each summand is a direct sum of elements of one of the decompositions u_B, for u ∈ U(h_0)), with a numbering of the summands by the integers 1, …, k if Y has k elements. We also have a notion of probability: in the case of (Ω, Y) it is a classical probability law P_Y on the quotient set Ω/Y; in the case of (E, Y) it is a collection of non-negative hermitian forms h_{Y,i} on each summand of Y.
We will consider information structures, denoted by the symbol S, in both cases (which could be distinguished by the typography, if necessary): they are categories whose objects are observables and whose arrows are divisions, satisfying the condition that if X ∈ S divides Y and Z in S, then the joint (Y, Z) belongs to S.
We will also consider probability families adapted to these information structures; they form a covariant functor X ↦ Q_X of direct images. When S is a classical subcategory of the quantum structure, we suppose that we have a trace transformation from the quantum probability functor to the classical one, and if the quantum structure and its probability functor are unitary invariant, we recall that, thanks to the orderings, we have an equivalence of categories between the invariant quantum structure and the classical one, and a compatible morphism between the functional modules.
Except for the new ingredient of orderings, these are familiar objects for our reader. The letter X will denote both cases Ω and E; then the letters S, B, Q will denote respectively the information structure, the Boolean algebra B or the family U_B, and the probability functor, in the classical or in the quantum case. Be careful that now all observable quantities are ordered, either partitions or direct decompositions. We will always assume the compatibility condition between Q and S, meaning that every conditioning of P ∈ Q by an event associated to an element of S belongs to Q.
In addition we choose a subset A of observables in S, which plays the role of the allowed elementary observations.
We say that a bijection σ from Ω to itself, measurable for B, respects a set of observables A if for any Y ∈ A there exists Z ∈ A such that Y∘σ = Z. It means that σ establishes an ordered bijection between the pieces Y(i) and the pieces Z(i), i.e., x ∈ Z(i) if and only if σ(x) ∈ Y(i). In other words, the permutation σ respects A when the map σ*, which associates the partition Y∘σ to any partition Y, sends A into A.
In the same way, we say that σ respects a family of probabilities Q if the associated map σ* sends an element of Q to an element of Q.
In the quantum case, with E, h0 and UB, we do the same by asking in addition that σ is a linear unitary automorphism of E.
Definition 9. If X, S, Q, B and A are given, the Galois group G0 is the set of permutations of X (resp. linear maps) that respect S, Q, B and A.
Example 6. Consider the system X associated to the simple classical weighting problem: the states are parameterized by the points with coordinates 0, +1 or −1 in the sphere S^{n−1} of radius 1 in ℝ^n, according to the weights, either normal, heavier or lighter. Thus in this case Ω = X possesses 2n points. The set A of elementary observables is given by the weighting operations F_{I,J} of Equation (132). For S we take the set S(A) of all the ordered partitions π_k obtained by applications of discrimination trees labelled by A. And we consider only the uniform probability P_0 on X; in Q this gives the images of this law by the elements of S, and the conditionings by all the events associated to S.
Then the Galois group G_0 is the subgroup $\mathfrak{S}_n \times C_2$ of $\mathfrak{S}_{2n}$, made by the product of the permutation group of n symbols by the group changing the signs of all the x_i for i in [n].
Proof: the elements of $\mathfrak{S}_n$ respect A and the uniform law. Moreover, if σ changes the signs of all the x_i, one can compensate the effect of σ on F_{I,J} by taking G_{I,J} = F_{J,I}, i.e., by exchanging the two sides of the balance.
To finish, we have to show that the permutations of X outside $\mathfrak{S}_n \times C_2$ do not respect A. First, consider a permutation σ that does not respect the indices i. In this case there exists an index i ∈ [n] such that σ(i+) and σ(i−) are states associated to different coins, for instance σ(i+) = j+ and σ(i−) = k+, with j ≠ k, or σ(i+) = j+ and σ(i−) = k−, with j ≠ k. Two cases are possible: these states have the same mass, or they have opposite masses. In both cases, let us consider a weighting F_{j,h}(x) = x_j − x_h, where h ≠ k; by applying σ*F_{j,h} to x = i+ we find +1 (or −1), and by applying σ*F_{j,h} to x = i− we find 0. However, this cannot happen for a weighting, because for a weighting, either the change of i+ into i− has no effect, or it exchanges the results +1 and −1. Finally, consider a permutation σ that respects the indices but exchanges the signs of a subset I = {i_1, …, i_k}, with 0 < k < n. In this case, let us consider a weighting F_{i,j}(x) = x_i − x_j with i ∈ I and j ∈ [n]∖I; the function F_{i,j}∘σ takes the value +1 for the states i−, j−, the value −1 for i+, j+, and the value 0 for the other states, which cannot happen for any weighting, because such a weighting must involve both i and j, but it cannot be F_{j,i}(x) = x_j − x_i, which takes the value −1 for j−, and it cannot be F_{i,j}, which takes the value +1 for i+.
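The statement of Example 6 can be verified by brute force for small n. The sketch below (our illustration; all the names are ours) encodes the 2n states, builds the value tables of the weightings, and counts the permutations respecting A; we restrict here to balanced weightings, with I and J non-empty of the same cardinality, which is our reading of the two pans of a balance:

```python
from itertools import permutations, combinations

n = 3
# states i+ / i-: coin i heavier (+1) or lighter (-1), the others normal
states = [tuple((s if k == i else 0) for k in range(n))
          for i in range(n) for s in (1, -1)]

def table(I, J):
    """The value table of F_{I,J} over all the states."""
    return tuple(sum(x[i] for i in I) - sum(x[j] for j in J) for x in states)

# balanced weightings: I, J disjoint, non-empty, of the same size
A = {table(I, J)
     for r in range(1, n // 2 + 1)
     for I in combinations(range(n), r)
     for J in combinations(range(n), r)
     if not set(I) & set(J)}

respecting = [g for g in permutations(range(len(states)))
              # g respects A iff every F o g is again the table of an element of A
              if all(tuple(F[g[k]] for k in range(len(states))) in A for F in A)]

print(len(respecting))        # 12 = |S_3 x C_2|
```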
The probability laws we are considering express the beliefs in the initial knowledge of the system; in this case it is legitimate to consider that they constrain the initial Galois group G_0. This corresponds to the Jaynes principle [51,52].
We define in this framework the notion of an observation tree adapted to a given subset A of S: it is a finite oriented rooted tree Γ where each vertex s is labelled by an observable F_s belonging to A, and each arrow α beginning at s is labelled by an element F_s(i) of F_s. A priori we introduce as many branches as there are elements in F_s. The disposition of the arrows in the trigonometric circular order makes the tree Γ embedded in the Euclidean plane up to homotopy.
A branch γ in the tree Γ is a sequence α_1, …, α_k of oriented edges such that, for each i, the initial extremity of α_{i+1} is the terminal extremity of α_i. Then α_{i+1} starts with the label F_i and ends with the label F_{i+1}. We will say that γ starts at the root if the initial extremity of α_1 is the root s_0, with label F_0.
For any edge α in Γ, there exists a unique branch γ(α) starting from the root and ending with α. Along this branch, the vertices are decorated with the variables F_i; i = 0, …, k, and the edges are decorated with the values v_i of these functions; we write
$$S(\alpha) = (F_0, v_0; F_1, v_1; \dots; F_{k-1}, v_{k-1}; F_k).$$
By definition, the set X(α) of states which are compatible with α is the subset of elements x of X such that F_0(x) = v_0, …, F_{k−1}(x) = v_{k−1}.
At any level k, the sets X(α) form a partition π_k of X.
Definition 10. We say that an observation tree Γ labelled by A is allowed by S if all the joint observables along each branch belong to S.
We say simply allowed if there is no risk of confusion.
In what follows this restriction is imposed on all the trees considered. Of course, if we start with the algebra of all ordered partitions this gives no restriction, but this would exclude the quantum case, where the best we can do is to take maximal commutative families.
Definition 11. Let α be an edge of Γ; we denote by Q(α) the set of probability laws on X(α) which are obtained by conditioning by the values v_0, v_1, …, v_{k−1} of the observables F_0, F_1, …, F_{k−1} along the branch γ(α) starting at the root and ending with α.
Definition 12. The Galois group G(α) is the set of permutations of the elements of X(α) that belong to G_0, preserve all the equations F_i(x) = v_i (resp. all the summands of the orthogonal decompositions F_i labelling the edges), and preserve the set of probabilities Q(α) (resp. quantum probabilities).
We consider G(α) as embedded in G0 by fixing point by point all the elements of X outside X(α).
Remark 10. Let P be a probability law (either classical or quantum) on X, Φ = (F_i; i ∈ I) a collection of observables, and φ = (v_i; i ∈ I) a vector of possible values of Φ; the law P|(Φ = φ), obtained by conditioning P by the equations Φ(x) = φ, is defined only if the set X_φ of all the solutions of the system of equations Φ(x) = φ has a non-zero probability p_φ = P(X_φ). It can be viewed either as a law on X_φ, or as a law on the whole of X, by taking the image under the inclusion of X_φ in X.
Definition 13. The edge α is said to be Galoisian if the sets of equations and probabilities that are invariant by G(α) coincide respectively with X(α) and Q(α).
A tree Γ is said to be Galoisian when all its edges are Galoisian.
At each level k, we define the group G_k, which is the product of the groups G(α) over the free edges at level k; it is a subgroup of G_0 preserving each piece of the partition π_k.
Along the path γ, the partitions (or decompositions) π_l, l ≤ k, of X are increasing (finer and finer) and the sequence of groups G_l, l ≤ k, is decreasing.
Along a branch, the sets X(α) are decreasing and the sequence of groups G_0, G(α_1), …, G(α_k) is decreasing. We propose that the quotient G(α_i)/G(α_{i+1}) gives a measure of the Galoisian information gained by applying F_i and obtaining the value v_i.
On each set X(α), the images of the elements of the probability family Q form the sets Q(α) of probabilities on X(α).
This is why it is also imposed on the group G(α) to preserve the set Q(α).
Remark 11. In terms of coding, introducing probabilities on the X(α) permits formulating the principle that it is more efficient to choose, after the edge α, the observation having the largest conditional entropy in Q(α). In what circumstances this gives the optimal discrimination tree is a difficult problem, even if the folklore admits it as a theorem. It is the problem of optimal coding.
By virtue of a theorem of Shannon, the minimal length is bounded below by the entropy of the law on X, if this law is unique. We found that it works in a simple example of weighting (cf. paper 3 [22]).
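The maximum-entropy principle of Remark 11 is easy to prototype. The Python sketch below (ours; a greedy heuristic, not a proof of optimality) repeatedly chooses the weighting whose outcome partition has the largest entropy under the uniform law, for the problem of one odd coin among n, heavier or lighter, and compares the resulting worst-case depth with the entropy lower bound:

```python
import math
from itertools import combinations

n = 4
states = [(i, s) for i in range(n) for s in (1, -1)]   # coin i heavier / lighter

def value(state, I, J):
    i, s = state
    return (s if i in I else 0) - (s if i in J else 0)

weightings = [(I, J)
              for r in range(1, n // 2 + 1)
              for I in combinations(range(n), r)
              for J in combinations(range(n), r)
              if not set(I) & set(J)]

def entropy(sizes):
    total = sum(sizes)
    return -sum(c / total * math.log2(c / total) for c in sizes if c)

def split(S, w):
    groups = {}
    for x in S:
        groups.setdefault(value(x, *w), []).append(x)
    return list(groups.values())

def greedy_depth(S):
    """Worst-case number of weightings used by the max-entropy greedy strategy."""
    if len(S) <= 1:
        return 0
    best = max(weightings, key=lambda w: entropy([len(g) for g in split(S, w)]))
    parts = split(S, best)
    if len(parts) == 1:        # nothing separates the remaining states
        return math.inf
    return 1 + max(greedy_depth(g) for g in parts)

print("greedy worst case  :", greedy_depth(states))                  # 3 for n = 4
print("entropy lower bound:", math.ceil(math.log(len(states), 3)))   # 2
```

The gap between the two printed numbers for n = 4 illustrates the caveat of Remark 11: the greedy maximum-entropy tree is not automatically optimal.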
Note, however, important differences between our approach and the traditional one for coding: for us A is given and Q is given; they correspond respectively to an a priori limitation of the possible codes for use (like a natural language), and to a set of possible a priori knowledge, for instance taking into account the Galois ambiguity in the system (Jaynes principle). All that is Bayesian in spirit.
Definition 14. We say that an observation tree Γ labelled by A is allowed by S and by X ∈ S if it is allowed by S_X, which means that every joint observable along each branch is divided by X.
Definition 15. S(A) is the set of (ordered) observables π_k which can be obtained by allowed observation trees. For X ∈ S, we denote by S_X(A) the set of (ordered) observables π_k which can be obtained by observation trees that are allowed by S and X.
Lemma 9. The joint product defines a structure of monoid on the set SX(A).
Proof. Let Γ, Γ′ be two observation trees allowed by A, S and X ∈ S, of respective lengths k, k′, giving final decompositions S, S′. To establish the lemma we must show that the joint (S, S′) is obtained by a tree associated with A, allowed by S and X.
For that, we just graft one copy of Γ′ on each free edge of Γ. This new tree Γ∘Γ′ is associated with A, and its final partition is clearly finer than S. It is also finer than S′, because at the end of any branch of Γ∘Γ′ we have an X(β) which is contained in the corresponding element of the final partition π_{k′}(Γ′). To finish the proof, we have to show that each element of π_{k+k′}(Γ∘Γ′) is the intersection of an element of π_k(Γ) with an element of π_{k′}(Γ′), because we know these observables are in S_X, which is a monoid by the definition of an information structure. But a complete branch γ.γ′ in Γ∘Γ′, going from the root to a terminal edge at level k + k′, corresponds to a word (F_0, v_0, F_1, v_1, …, F_{k−1}, v_{k−1}, F′_0, v′_0, …, F′_{k′−1}, v′_{k′−1}); thus the final set of the branch γ.γ′ is defined by the equations F_i = v_i; i = 0, …, k−1 and F′_j = v′_j; j = 0, …, k′−1, and is the intersection of the sets respectively defined by the first and the second group of equations, which belong respectively to π_k(Γ) and π_{k′}(Γ′).
Then S(A) forms an information structure. In particular there is a unique maximal partition, an initial element for each subcategory S_X(A) in the information structure S(A).
But on S(A) the operation of grafting, which we will now describe, is much richer than what we used in the above Lemma 9: we can graft an allowed tree on each free edge of an allowed tree, and this introduces a theory of operads and monads for information theory.

6.3. Co-Homology of Observation Strategies

Remember that the elements of the partitions or decompositions Y we are considering are now numbered by the ordered set {1, …, L(Y)}, where L(Y) is the number of elements of the partition or the decomposition, also called its length. In particular, we consider as different two partitions which are labelled differently by the integers. This was already taken into account in the definition of the Galois groups.
We define the multi-products μ(m; n_1, …, n_m) on the set of ordered partitions:
They are defined between a partition equipped with an ordering (π, ω), with m pieces, and m ordered partitions (π_1, ω_1), …, (π_m, ω_m) of respective lengths n_1, …, n_m; the result is the ordered partition obtained by cutting each piece X_i of π by the corresponding decomposition π_i and renumbering the non-empty pieces by integers in the unique way compatible with the orderings ω, ω_1, …, ω_m. Observe the important fact that the result has in general fewer than n = n_1 + … + n_m pieces. This introduces a strong departure from the usual multi-products (cf. P. May [17,53], Loday–Vallette [10]). We do not have an operad: when introducing the vector spaces V(m) generated by decompositions of length m, we get filtered but not graded structures. However, a form of associativity and a neutral element are preserved; hence we propose to name this structure a filtered operad.
There exists an evident unit to the right which is the unique decomposition of length 1.
The action of the symmetric group $\mathfrak{S}_m$ on the products is evident, and it does not respect the length of the result. We will designate by μ_m the collection of the products for the same length m.
The numbers m_i between 1 and n_i that count the pieces of the decomposition of the element X_i of π are functions m_i(π, ω, π_i, ω_i). There exists a growing injection η_i : [m_i] → [n_i], which depends only on (π, ω, π_i, ω_i), telling which indices of (π_i, ω_i) survive in the product. These injections are integral parts of the structure of filtered operad. In particular, if we apply a permutation σ_i to [n_i], i.e., if we replace ω_i by ω_i∘σ_i, the number m_i can change.
The axioms of operadic unity and associativity, conveniently modified, are easy to verify (cf. [22]). The reference we follow here is Fresse, "Basic concepts of operads" [16]. For unity, nothing has to be modified. For associativity (Figure 1.3 in Fresse [16]), we modify by saying that if the (π_i, ω_i), of lengths n_i, for i between 1 and k, are composed from μ(n_i; n_i^1, …, n_i^{n_i}) with the n_i-tuples (…, (π_i^j, ω_i^j), …), whose respective lengths are n_i^j, and if the result μ_i for each i has length m_i^1 + … + m_i^{n_i}, where m_i^j is a function of (π_i, ω_i) and (π_i^j, ω_i^j), then the product of (π, ω) of length k with the μ_i is the same as the one we would have obtained by composing μ(k; n_1, …, n_k)((π, ω); (π_1, ω_1), …) with the m = m_1 + … + m_k ordered decompositions (π_i^j, ω_i^j) for j belonging to the image of η_i : [m_i] → [n_i]. This result is more complicated to write than to prove, because it only expresses the associativity of the ordinary joint of three partitions, from which the ordering follows.
Moreover, the first axiom concerning permutations (Figure 1.1 in Fresse [16]) can be modified by considering only the permutations of n_i letters which preserve the images of the maps η_i.
The second axiom, which concerns a permutation σ of the k elements of π and the inverse permutation of the partitions π_i, can be reformulated by saying that the effect of σ on the multiple product μ is the same as the effect of σ on the indices of the (π_i, ω_i). In other terms, the effect of σ on ω is compensated by the action of σ^{-1} on the indices of the (π_i, ω_i). One has to be careful, because the result of μ applied to (π, ω∘σ) has in general not the same length as μ applied to (π, ω). However, the compensation implies that μ_k is well defined on the quotient of the set of sequences ((π, ω), (π_1, ω_1), …) by the diagonal action of $\mathfrak{S}_k$, which permutes the k pieces of π and permutes the indices i of the n_i in the other factors.
Geometrically, if the partition (π, ω) in S(A) is generated by an observation tree Γ with m ending edges, and the partitions (π_i, ω_i); i = 1, …, m are generated by a collection of observation trees Γ_i, then the result of the application of μ(m; n_1, …, n_m) to (π, ω) and (π_i, ω_i); i = 1, …, m is generated by the observation tree obtained by grafting each Γ_i on the vertex number i. Drawing the planar trees associated to three successive sets of decompositions, for two successive grafting operations, helps to understand the associativity property.
The fact that in general this does not give a tree with n_1 + … + n_m free edges, where n_i denotes the number of free edges of Γ_i, comes from the possibility of finding an empty set X(β) at some moment along a branch of the grafted tree; this we call a dead branch. It expresses the fact that the empty set is excluded from the elements of a partition in the classical context, and the zero space is excluded from the orthogonal decompositions in the quantum context. When computing conditioned probabilities, we encounter the same problem if a set X(β) at some place in a branch has measure zero.
The dead branches and the lack of graduation cause many difficulties for studying the operations μ_m algebraically; thus we introduce more flexible objects, which are the ordered partitions with empty parts of Ω, resp. the ordered orthogonal decompositions with zero summands of E: such a partition π* (resp. decomposition) is a family (E_1, …, E_m) of disjoint subsets of Ω (resp. orthogonal subspaces of E), such that their union (resp. sum) is Ω (resp. E). The only difference with respect to ordered partitions, resp. decompositions, is that we accept repeating ∅ (resp. 0) an arbitrarily high number of times. For brevity, we will name these new objects generalized decompositions. The number m is named the degree of π*. These objects are the natural results of applying rooted observation trees embedded in an oriented half-plane.
The notions of adaptation to A, S and X ∈ S concerning the trees apply to the generated generalized decompositions. The corresponding sets of generalized objects are written S*(A) and S*_X(A).
The multi-product μ(m; n_1, …, n_m) extends naturally to generalized decompositions, and in this case the degrees are respected, i.e., the result of the operation is a generalized decomposition of degree n_1 + n_2 + … + n_m.
Remark that we could write μ*(m; n_1, …, n_m) for the multi-products extended to generalized decompositions; however, we prefer to keep the same notation μ(m; n_1, …, n_m). This is justified by the following observation: to a generalized decomposition π* is associated a unique ordered decomposition (π, ω), by forgetting the empty sets (resp. the zero spaces) in the family, and the multi-product is compatible with this forgetful map. The gain of the extension is the easy construction of a monad, which we expose now.
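In the classical case, the multi-product μ and the forgetful map are easy to make explicit. The following Python sketch (our illustration; the representation of ordered partitions as lists of sets is ours) shows that on generalized decompositions the degrees add up, and exhibits a dead branch as an empty part:

```python
def mu(pi, parts):
    """Multi-product on generalized decompositions: cut the piece i of pi by parts[i].

    pi    : ordered partition (possibly with empty parts), as a list of sets
    parts : one generalized decomposition per piece of pi
    The degrees add: len(mu(pi, parts)) == sum(len(p) for p in parts).
    """
    return [pi[i] & block for i in range(len(pi)) for block in parts[i]]

def forget(pi_star):
    """The forgetful map: drop the empty parts."""
    return [p for p in pi_star if p]

pi = [{0, 1, 2}, {3, 4, 5}]                       # ordered partition of {0, ..., 5}
out = mu(pi, [[{0, 1, 2, 3}, {4, 5}],             # cuts {0,1,2}: one piece is empty
              [{0, 1, 2, 3}, {4}, {5}]])          # cuts {3,4,5}
print(out)            # [{0, 1, 2}, set(), {3}, {4}, {5}] -- degree 2 + 3 = 5
print(forget(out))    # [{0, 1, 2}, {3}, {4}, {5}]        -- the underlying ordered partition
```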
The notion of operad was introduced by P. May [17] as the right tool for studying the homology of infinite loop spaces; then it was recognized as a fundamental tool for algebraic topology and many other topics; see Loday and Vallette [10], and Fresse [16].
We will encounter only “symmetric” operads.
The multiple products μ_m on generalized decompositions can be assembled into a structure of monad by using the standard Schur construction (cf. Loday and Vallette [10], or Fresse, "on partitions" [16]): for each X ∈ S, we introduce the real vector space V_X = V_X(A) freely generated by the set S*_X(A) of generalized decompositions obtained by observation trees that are allowed by A, S and X; the degree m defines a graduation V_X(m) of V_X. We put V_X(0) = 0.
The maps μ_m generate multi-linear applications from products of these spaces to themselves which respect the graduation; these applications, also denoted by μ_m, are parameterized by the sets S*_X(m), whose elements are the generalized decompositions of degree m which are divided by X:
$$\mu_m : V_X(m) \otimes_{\mathfrak{S}_m} V_X^{\otimes m} \to V_X.$$
The linear Schur functor from the category of real vector spaces to itself is defined by the direct sum of symmetric co-invariants:
$$V_X(W) = \bigoplus_{m \geq 0} V_X(m) \otimes_{\mathfrak{S}_m} W^{\otimes m}.$$
The composition of Schur functors is defined by
$$V_X \circ V_X = \bigoplus_{m \geq 0} V_X(m) \otimes_{\mathfrak{S}_m} V_X^{\otimes m},$$
i.e., for each real vector space W:
$$V_X \circ V_X(W) = \bigoplus_{m \geq 0} \bigoplus_{l \geq m} \bigoplus_{n_1, \dots, n_m;\ \sum_i n_i = l} V_X(m) \otimes_{\mathfrak{S}_m} \bigotimes_i \big( V_X(n_i) \otimes_{\mathfrak{S}_{n_i}} W^{\otimes n_i} \big)$$
$$= \bigoplus_{l \geq 0} \bigoplus_{m \geq 0} \bigoplus_{n_1, \dots, n_m;\ \sum_i n_i = l} V_X(m) \otimes_{\mathfrak{S}_m} \bigotimes_i V_X(n_i) \otimes_{\mathfrak{S}_{n_1, \dots, n_m}} W^{\otimes l};$$
where $\mathfrak{S}_{n_1, \dots, n_m}$ denotes the group of permutations by blocks.
Proposition 5. For each X in S, the collection of operations μ_m defines a linear natural transformation of functors μ_X : V_X ∘ V_X → V_X, and the trivial partition defines a linear natural transformation of functors η_X : ℝ → V_X, which satisfy the axioms of a monad (cf. MacLane, "Categories for the Working Mathematician", 2nd ed. [4], and Alain Proute, Introduction a la Logique Categorique, 2013, Prepublications [54]):
$$\mu_X \circ (\mu_X \circ V_X) = \mu_X \circ (V_X \circ \mu_X), \qquad \mu_X \circ (V_X \circ \eta_X) = \mathrm{Id} = \mu_X \circ (\eta_X \circ V_X).$$
Proof. The argument is the same as the argument given in Fresse (partitions …). The fact that the natural transformation μ_X is well defined on the quotient by the diagonal action of the symmetric group $\mathfrak{S}_m$ on $V_X(m) \otimes \bigotimes_i V_X(n_i) \otimes_{\mathfrak{S}_{n_1, \dots, n_m}} W^{\otimes l}$ comes from the verification of the symmetry axiom, and the properties of associativity and neutral element come from the verification of the corresponding axioms.
Moreover, all these operations are natural for the functor of inclusion from the category S_Y to the category S_X of observables divided by Y and X respectively, when X divides Y; therefore we have the following result:
Proposition 6. To each arrow X → Y in the category S is associated a natural transformation of functors ρ_{X,Y} : V_Y → V_X, making a morphism of monads; this defines a contravariant functor V from the category S to the category of monads, which we name the arborescent structural sheaf of S and A.
Considering the discrete topology on S, we introduce the topos of sheaves of modules over the functor in monads V, which we call the arborescent information topos associated to S and A.
As explained in Proute, loc. cit. [54], a monad in a category C becomes a monoid in the category of endo-functors of C; thus the topos we introduce is equivalent to an ordinary ringed topos.
The monad V_X, and the contravariant monadic functor V on S, are better understood by considering trees, cf. Getzler–Jones [55], Ginzburg–Kapranov [56] and Fresse [16]; in our context, we consider all the observation trees labelled by elements of S*_X(A): if Γ is an oriented rooted tree of level k, each vertex v of Γ gives birth to m_v edges; we define
$$V_X(\Gamma)(W) = \bigotimes_{v \in \Gamma} V_X(m_v) \otimes_{\mathfrak{S}_{m_v}} W^{\otimes m_v}.$$
The space V_X(Γ)(W) is the direct sum of the spaces V_X(Γ_Y)(W) associated to trees which are decorated by a subset Y in S*_X(A), with one element Y_v of S_X(m_v) for each vertex v which gives birth to m_v edges.
Then the iterated functors V^{∘k} = V ∘ … ∘ V, for k ≥ 1, are the direct sums of the functors V(Γ) of level k. Remark that we could have worked directly with observation trees labelled by elements of A instead of working with generalized partitions; this would have given a strictly larger monad but equivalent results.
Associated to the probability families, we now define a right V_X-module (in the terms of Fresse, Partitions, the term V_X-algebra being reserved for a structure of left module over a constant functor).
For that we introduce the notion of divided probability.
Definition 16. A divided probability law of degree m is a sequence of triplets (p, P, U) = (p_1, P_1, U_1; …; p_m, P_m, U_m), where the p_i; i = 1, …, m are non-negative numbers of sum one, i.e., p_1 + … + p_m = 1; where each P_i; i = 1, …, m is a classical (resp. quantum) probability law when the corresponding p_i is strictly positive, and a probability law or the empty set when the corresponding p_i is equal to 0; and where each U_i; i = 1, …, m is the support in X of P_i; moreover, the U_i are assumed to be orthogonal (resp. disjoint in the classical case). The letter P will designate the probability p_1P_1 + … + p_mP_m, with the convention 0.∅ = 0 when it occurs.
The symbol D(m) designates the set of divided probabilities of degree m on X, and D_X(m) denotes the subset made of probability laws in Q_X adapted to a variable X.
The vector space generated by D_X(m) will be written $\widetilde{M}_X(m)$. We put $\widetilde{M}_X(0) = 0$.
We also introduce the subspace $K_X(m)$ of $\widetilde{M}_X(m)$ which is generated by two families of vectors in $\widetilde{M}_X(m)$:
First the vectors
$$L(\lambda, p', p'', P, U) = \lambda\, (p', P, U) + (1-\lambda)\, (p'', P, U) - (\lambda p' + (1-\lambda) p'', P, U),$$
where λ is any real number between 0 and 1, and (p′, P, U), (p″, P, U) are two divided probabilities associated to the same sequence of probability laws (P_1, …, P_m) and the same supports (U_1, …, U_m);
Second the vectors
$$D(p, P, U, Q, V) = (p, P, U) - (p, Q, V),$$
where for each index i between 1 and m such that p_i > 0, we have P_i = Q_i, and consequently U_i = V_i.
Then we define the space of classes of divided probabilities as the quotient real vector space $M_X(m) = \widetilde{M}_X(m)/K_X(m)$. In particular, M_X(0) = 0, and M_X(1) is freely generated over ℝ by the elements of Q_X.
Lemma 10. The space M_X(m) is freely generated over ℝ by the vectors (∅, …, ∅, P_i, ∅, …, ∅) of length m, where, at the rank i, P_i is an element of Q_X.
Proof. Let D = ((p_1, P_1, U_1), …, (p_m, P_m, U_m)) be a divided probability; we consider, for each i between 1 and m, the divided probability
$$D_i = \big((0, P_1, U_1), \dots, (0, P_{i-1}, U_{i-1}), (1, P_i, U_i), (0, P_{i+1}, U_{i+1}), \dots, (0, P_m, U_m)\big);$$
then the vector $D - \sum_i p_i D_i$ is a sum of vectors of type L in $K_X(m)$. However, for each i, the vector D_i − (∅, …, ∅, P_i, ∅, …, ∅) is of type D; thus the particular vectors of the Lemma 10 generate M_X(m).
Now, we prove that, if a linear combination of r of these vectors belongs to $K_X(m)$, the coefficients of this combination must all be equal to 0. We proceed by induction on r, the result being evident for r = 1. We can also suppose that at least two of the involved vectors have a non-empty element at the same place, which we can suppose to be i = 1. All the vectors with p_1 = 0 can be replaced by vectors where P_1 = ∅, using elements of type D in $K_X(m)$; then we can assume that at least one of the vectors has a p_1 strictly positive, i.e., equal to 1. Let us consider all these vectors D_1, …, D_s, for 2 ≤ s ≤ r; their other numbers p_i, for i > 1, are zero. The other vectors D_j, for j > s, have the coordinate p_1 equal to zero. Let Σ_j λ_j D_j be the linear combination of length r belonging to $K_X(m)$; this vector is a linear combination of vectors of type L and D. We can suppose that every λ_j is non-zero. Let us consider an element Q of Q_X which appears in at least one of the D_j, j ≤ s; this Q cannot appear in only one D_j, because the sum of the coefficients λ, multiplied by the first coordinate p_1, in front of any given Q in a vector L or D, is zero. Thus we have at least two D_j with the same P_1. We can replace the sum of those with λ_j positive (resp. negative) by only one special vector of the Lemma 10, using a sum of multiples of vectors of type L. Then we are left with the case of two vectors D_1, D_2 having P_1 = Q such that λ_1 + λ_2 = 0, which means that λ_1D_1 + λ_2D_2 is a multiple of a vector of type D. Subtracting it, we can apply the induction hypothesis and conclude that the considered linear relation is trivial.
As a corollary, an equivalent definition of the spaces M_X(m) would be the real vector space freely generated by the pairs (P, i), where P ∈ Q_X and i ∈ [m]. Such a vector, identified with (∅, …, P, …, ∅) in $\widetilde{M}_X(m)$, where only the place i is non-empty, will be named a simple vector of degree m.
Let S = (S_1, …, S_m) be a sequence of generalized decompositions in S*_X(A), of respective degrees n_1, …, n_m, with n = n_1 + … + n_m, and let (p, P, U) be an element of D_X(m); we define θ((p, P, U), S) as the following divided probability of degree n: if, for i = 1, …, m, the decomposition S_i is made of pieces $E_i^{j_i}$, where j_i varies between 1 and n_i, we take for $p_i^{j_i}$ the classical probability $p_i\, P_i(E_i^{j_i} \cap U_i)$; we take for $P_i^{j_i}$ the law P_i conditioned by the event S_i = j_i, which corresponds to $E_i^{j_i}$; and we take for $U_i^{j_i}$ the support of $P_i^{j_i}$. Then we order the obtained family of triples $(p_i^{j_i}, P_i^{j_i}, U_i^{j_i});\ i = 1, \dots, m;\ j_i = 1, \dots, n_i$ by the lexicographic ordering. It is easy to verify that the resulting sequence is a divided probability.
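Here is a minimal classical sketch of the operation θ (our illustration; the representation of laws as dictionaries and of decompositions as lists of sets is ours), showing the division of a degree-one divided probability and the appearance of a dead branch:

```python
def theta(divided, decomps):
    """Divide a divided probability (p, P, U) by one decomposition per component.

    divided : list of pairs (p_i, P_i), with P_i a dict {point: mass}
              (its support U_i is the set of keys), or (0.0, None)
    decomps : decomps[i] is a list of sets, the pieces E_i^{j_i}
    The result has degree sum_i len(decomps[i]); the components are
    concatenated in the lexicographic order of the indices (i, j_i).
    """
    out = []
    for (p, P), pieces in zip(divided, decomps):
        for E in pieces:
            mass = sum(m for x, m in P.items() if x in E) if P else 0.0
            if p * mass > 0:
                out.append((p * mass, {x: m / mass for x, m in P.items() if x in E}))
            else:
                out.append((0.0, None))   # a dead branch: empty law
    return out

P = {0: 0.25, 1: 0.25, 2: 0.5}
print(theta([(1.0, P)], [[{0, 1}, {2}, {3, 4}]]))
# [(0.5, {0: 0.5, 1: 0.5}), (0.5, {2: 1.0}), (0.0, None)]
```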
Extending by linearity we get a linear map,
$$\lambda_m : \widetilde{M}_X(m) \otimes V_X(n_1) \otimes \dots \otimes V_X(n_m) \to \widetilde{M}_X(n_1 + \dots + n_m).$$
By linearity, a vector of type L in $\widetilde{M}_X(m)$, tensorized with S_1 ⊗ … ⊗ S_m, goes to a linear combination of vectors of type L in $\widetilde{M}_X(n)$. Moreover, if p_i = 0 for an index i in [m], all the $p_i^{j_i}$ are zero; thus a vector of type D goes to a vector of type D. Then the map λ_m sends the subspace $K_X(m) \otimes V_X(n_1) \otimes \dots \otimes V_X(n_m)$ into the subspace $K_X(n_1 + \dots + n_m)$; thus it defines a linear map
$$\theta_m : M_X(m) \otimes V_X(n_1) \otimes \dots \otimes V_X(n_m) \to M_X(n_1 + \dots + n_m).$$
On a simple vector (P, i), the operation θ_m is independent of the S_j for j ≠ i.
Now we introduce the Schur functor M_X of symmetric co-invariant spaces $M_X(W) = \bigoplus_m M_X(m) \otimes_{\mathfrak{S}_m} W^{\otimes m}$, from the category of real vector spaces to itself, associated to the $\mathfrak{S}$-module (cf. Loday and Vallette [10], Fresse [16]) formed by the graded family $M_X(m);\ m \in \mathbb{N}$.
Then the maps θm define a natural transformation of functors:
$$\theta_X : M_X \circ V_X \to M_X.$$
In addition, this set of transformations behaves naturally with respect to X in the information category S. Note that it defines a co-variant functor, not a presheaf.
For simplicity, we will in general write θ, μ, M, V, … and not θ_X, μ_X, M_X, V_X, …, but we keep in mind that this is an abuse of language.
Then the composite functor $M_X \circ V_X(W)$ is given by
$$M_X \circ V_X(W) = \bigoplus_{m \geq 0} M_X(m) \otimes_{\mathfrak{S}_m} \bigotimes_i \big( V_X(n_i) \otimes_{\mathfrak{S}_{n_i}} W^{\otimes n_i} \big) = \bigoplus_{n \geq 0} \bigoplus_{m \geq 0} \bigoplus_{n_1, \dots, n_m;\ \sum_i n_i = n} M_X(m) \otimes_{\mathfrak{S}_m} \bigotimes_i V_X(n_i) \otimes_{\mathfrak{S}_{n_1, \dots, n_m}} W^{\otimes n};$$
where $\mathfrak{S}_{n_1, \dots, n_m}$ denotes the group of permutations by blocks.
Proposition 7. The natural transformation θ defines a right action in the sense of monads, i.e., we have
$$\theta \circ (M \circ \mu) = \theta \circ (\theta \circ V); \qquad \theta \circ (M \circ \eta) = \mathrm{Id}.$$
Proof. The proof is the same as for Proposition 5, using the associativity of conditioning and the Bayes identity P(A ∩ B) = P(A|B)P(B).
Ginzburg and Kapranov [56] gave a construction of the (co)bar complex of an operad based on decorated trees. It is a graded complex of operads, with a differential operator of degree 1. The dual construction can be found in Getzler and Jones [55]; it gives a graded complex of co-operads with a differential operator of degree +1. The link with quasi-free co-operads and operads (Quillen's construction) is developed by Fresse (in "partitions" [16]); in this article, Fresse also shows that these constructions correspond to the simplicial bar construction for monads (MacLane) and to the natural notions of derived functors in this context.
In our case, with two right modules, the easiest way is to use the bar construction of Beck (1967) [19], further explicited by Fresse with decorated trees in the case of monads coming from operads.
A morphism from a right module M over V to a right module R over V is a natural transformation f from the first functor to the second such that $f \circ \theta_M = \theta_R \circ (f \circ V)$.
In what follows, we will use the module R which comes from the functor of symmetric powers:
$$R(W) = \bigoplus_m S^m(W);$$
it is the Schur functor associated to the trivial $\mathfrak{S}$-module, R(m) = ℝ, i.e., the action of $\mathfrak{S}_m$ on R(m) is trivial. We put R(0) = ℝ.
The right action of V X is given by the map
$$\rho_m : R(m) \otimes V_X(n_1) \otimes \dots \otimes V_X(n_m) \to R(n_1 + \dots + n_m),$$
which sends each generator (1, S_1, …, S_m) to 1 in R(n) = ℝ.
The axioms of a right module are easy to verify.
This V-module will play the dual role of the trivial module in the case of information structure co-homology.
Following Beck ("Triples, Algebras, Cohomology", 1967, 2002 [19]), we consider the simplicial bar complex $M_X \circ V_X^{\circ *}$, extending the right module M_X over V_X by the sequence of modules $\dots \to M_X \circ V_X^{\circ (k+1)} \to M_X \circ V_X^{\circ k} \to \dots$. Then we introduce the increasing complex $C^*(M_X)$ of measurable morphisms from $M_X \circ V_X^{\circ *}$ to the symmetric right module R.
For a given k ≥ 0, a morphism F from $M_X \circ V_X^{\circ k}$ to R is defined by a family of maps $F(N) : M_X \circ V_X^{\circ k}(N) \to R(N) = \mathbb{R}$, for N ∈ ℕ.
This gives a family of measurable numerical functions of a divided probability law (p, P, U) of degree m ≤ N, indexed by the forests having m component trees of height k and a total number of ending branches equal to N.
We denote such a family of functions by the symbol F_X(S_1; S_2; …; S_k; (p, P, U)), indexed by X in S, where S_1; …; S_k here designate the sets of decompositions present in the trees at each level from 1 to k.
First, we remark that the compatibility with the action of V_X on the right imposes that, for any allowed set of variables S_{k+1}, we must have
$$F_X(S_1; S_2; \dots; \mu(S_k, S_{k+1}); (p, P, U)) = F_X(S_1; S_2; \dots; S_k; (p, P, U)).$$
By taking for S_k the collection (π_0, …, π_0), we deduce that F_X is independent of the last variable.
This has the effect of decreasing the degree in k by one, in order to respect the preceding conventions on information cochains; i.e., we pose $C^k(M_X) = \mathrm{Hom}(M_X \circ V^{\circ (k+1)}, R)$.
Secondly, as we are working with the quotient of the space generated by the divided probabilities (p, P, U) by the space generated by the linearity relations on the external law p, we have, for (p, P, U) of degree m,
$$F_X(S_1; S_2; \dots; S_k; (p, P, U)) = \sum_{i=1}^m p_i\, F_X(S_1; S_2; \dots; S_k; (P_i; i, m));$$
where (Q; i, m) designates the divided probability of degree m where all the laws in the sequence are empty, except the one at the place i, which is equal to Q.
Moreover, from the definition of θ and the rule of composition of functors, for any m ≥ 1 and i ∈ [m], and any simple vector (Q; i, m), the value of F on any forest depends only on the tree component of index i; this we can summarize by the following identity:
$$F_X(S_1; S_2; \dots; S_k; (Q; i, m)) = F_X(T(S_1^i; S_2^i; \dots; S_k^i); (Q; i, m));$$
where $T(S_1^i; S_2^i; \dots; S_k^i)$ designates the tree numbered by i, prolonged in any manner at all the places j ≠ i.
Definition 17. An element of $C^k(M_X)$ is said to be regular when, for each degree m and each index i between 1 and m, we have, for each ordered forest S_1; S_2; …; S_k of m trees and each probability Q,
$$F_X(S_1; S_2; \dots; S_k; (Q; i, m)) = F_X(S_1^i; S_2^i; \dots; S_k^i; Q);$$
where $S_1^i; S_2^i; \dots; S_k^i$ designates the tree number i.
Due to Equation (150), this makes that the regular elements are determined by their values on trees and on ordinary, not divided, probabilities.
The adjective regular can be better interpreted as “local in the sense of observation trees”.
The vector space $C_X^k(N)$ is generated by the families of functions of divided probabilities F_X(S_1; S_2; …; S_k; (p, P, U)), indexed by X in S and by forests S_1; …; S_k of level k. These families are supposed to be local with respect to X, which means that they are compatible with the direct images of probabilities under the observables in S.
Remark 12. As we showed in the static case, in the classical context, locality is equivalent to the fact that the values of the functions depend on ℙ through the direct images of ℙ by the joint of all the ordered observables which decorate the tree (the joint of the joints along the branches); but this is not necessarily true in the quantum context, where it depends on Q. However, it is true for Q_min, in particular for Q_can, which is the most natural choice.
The spaces $C^k(M_X)$ form a natural degree one complex:
The faces $\delta_i^{(k)}$; 1 ≤ i ≤ k are given by applying μ on V ∘ V at the places (i, i+1); the last face $\delta_{k+1}^{(k)}$ consists in forgetting the last functor, the operation denoted by ε; and the zero face is given by the action θ. Then the boundary δ^{(k)} is the alternating sum of the operators $\delta_i^{(k)}$; 0 ≤ i ≤ k+1: if F is a measurable morphism from $M \circ V^{\circ k}$ to R, then
$$\delta F = F \circ (\theta \circ V^{\circ k}) + \sum_{i=0}^{k-1} (-1)^{i+1}\, F \circ (V^{\circ i} \circ \mu \circ V^{\circ (k-i-1)}) + (-1)^{k+1}\, F \circ (V^{\circ k} \circ \epsilon).$$
The zero face in the complex $C_X^*$ corresponds to the right action of the monad V_X on divided probabilities; on regular cochains it is expressed by a generalization of the formula (20): if (P; i, m) is a simple vector of degree m and S_0; S_1; …; S_k a forest of level k + 1, with m component trees, then
$$F_{S_0}(S_1; \dots; S_k; (P; i, m)) = F(S_1; \dots; S_k; \theta((P; i, m) \otimes S_0)) = \sum_{j_i = 1, \dots, n_i} \mathbb{P}(S_0^i = j_i)\; F\big(S_1^{j_i}; S_2^{j_i}; \dots; S_k^{j_i}; (P\,|\,S_0^i = j_i)\big),$$
where $S_1^{j_i}; S_2^{j_i}; \dots; S_k^{j_i}$ designates the tree number j_i, grafted on the branch j_i of the variable $S_0^i$, at the place i in the collection S_0.
The formula (154) is compatible with the existence of dead branches.
Note that the natural integers come into play under two different aspects: m is the internal monadic degree and counts the number of components, or the length of the partitions; k is the height of the trees in the forest. The number k gives the degree in co-homology.
The coboundary δ of C* is of degree +1 with respect to k and of degree 0 with respect to m. For any m ∈ ℕ, the operator δ is given by the formula of the coboundary associated to the simplicial structure defined by θ and μ:
$$\delta F(S_0; S_1; \dots; S_k; (p, P, U)) = F_{S_0}(S_1; \dots; S_k; (p, P, U)) + \sum_{i=1}^{k} (-1)^i\, F(S_0; \dots; \mu(S_{i-1}, S_i); S_{i+1}; \dots; S_k; (p, P, U)) + (-1)^{k+1}\, F(S_0; \dots; S_{k-1}; (p, P, U)).$$
We see that locality is preserved by δ.
Lemma 11. If the transformation F is regular, then δF is regular; in other terms, the regular elements form a sub-complex $C_r^k(M_X)$.
Proof. Let (P, i, m) be a simple vector and S0; …; Sk a forest with m components; let us denote by S 0 j the variable number j having degree nj, and n = n1 + … + nm; we have
δ F ( S 0 ; ; S k ; ( P , i , m ) ) = F ( S 1 ; ; S k ; θ ( ( P , i , m ) S 0 i ) ) F ( μ ( S 0 , S 1 ) ; ; S k ; ( P , i , m ) ) + ( 1 ) k F ( S 0 ; ; μ ( S k 1 , S k ) ; ( P , i , m ) ) + ( 1 ) k + 1 F ( S 0 ; ; S k 1 ; ( P , i , m ) ) .
The first term on the right is a combination of the image of F for the n simple vectors P . S 0 i , j i of degree n = n1 + … + nm which result from the division of (P, i, m) by S 0 i. If F is regular, this combination is the same as the combination of the simple vectors of degree ni constituting the division of (P, i, m) by S 0 i, which gives the same result as the first term on the right in the formula
δ F ( S 0 i ; ; S k i ; ( P , 1 , 1 ) ) = F ( S 1 i ; ; S k i ; θ ( P , S 0 i ) ) F ( μ ( S 0 i , S 1 i ) ; ; S k i ; P ) + ( 1 ) k F ( S 0 i ; ; μ ( S k 1 i , S k i ) ; P ) + ( 1 ) k + 1 F ( S 0 i ; ; S k 1 i ; P ) .
If F is regular the term number l > 1 on the right of the equation (156) coincides with the corresponding term on the right of the Equation (157).
Therefore the terms on the left in Equation (156) coincides with the left term in (157); which establishes the lemma.
We define $C_r^*(X)$ as the sub-complex of regular vectors in $C(X)$. Its elements are named tree information cochains, or arborescent information cochains.
By definition, the tree information co-homology is the homology of this regular complex, considered as a sheaf of complexes over the category S(A), i.e., a contravariant functor. This corresponds to the topos information co-homology in the monadic context.
To recover the case of the ordinary algebra of partitions, and the formulas of the bar construction of the first sections of this article, we have to take the special case where all the decompositions of the same level coincide, at every level of the forests. In this case, we can replace the quotient $X$ by the modules of conditioning, through a redefinition of the action on the functions of $X$. However, the notion of divided probabilities for observation trees and the definition of the co-homology in the monadic context can be seen as the natural basis of information co-homology.
When $k = 0$, in the classical case, a cochain is a function $f(\mathbb{P})$; the locality condition tells us that it is a constant, and in this case it is a cocycle, because the fact that the probabilities sum to one implies $f_S(\mathbb{P}) = f(\mathbb{P})$. Then $H_\tau^0$ has dimension one.
When $k = 0$, in the quantum case, the spectral functions of ρ in $Q_X$ give invariant information cochains. Among them the von Neumann entropy is especially relevant, because its coboundary gives the classical entropy. However, only the constant function is an invariant zero-degree cocycle. Thus again $H_U^0$ has dimension one.
For $k = 1$, a cochain is given by a function $F_X(S;P)$ such that, each time we have $X \to Y \to S$, i.e., $Y$ refines $S$, we have $F_X(S;P) = F_Y(S;Y_*P)$. It is a cocycle when, for every collection $S_1, \dots, S_m$ of $m$ observables, where $m$ is the length of $S$, we have
$$F(\mu_m(S,(S_1,\dots,S_m)); P) = F(S; P) + \sum_i \mathbb{P}(S=i)\, F(S_i; P \,|\, S=i).$$
Note that the partition $\mu_m(S,(S_1,\dots,S_m))$ is not the joint of $S$ and the $S_i$ for $i \geq 1$, except when all the $S_i$ coincide. Thus it is remarkable that the ordinary entropy also satisfies this functional equation, which is finer than Shannon's identity:
Proposition 8. The usual entropy $H(S_*\mathbb{P}) = H(S;\mathbb{P})$ is an arborescent cocycle.
Proof. By linearity on the module of divided probabilities $X$, we can decompose the probability ℙ into the conditional probabilities ℙ|(S = s); thus we can restrict the proof of the proposition to the case where $S = \pi_0$ is the trivial partition, i.e., $m = 1$.
Let $X_i$, $i = 1,\dots,m$, denote the elements of the partition associated to $S_0$, and $X_i^j$, $j = 1,\dots,n_i$, the pieces of the intersection of $X_i$ with the elements of the partition associated to $S_i$; denote by $p_i$ the probability of the event $X_i$ and by $p_i^j$ the probability of the event $X_i^j$; we have
$$H(\mu_m(S_0;(S_1,\dots,S_m)); \mathbb{P}) = -\sum_{i=1}^{m} \sum_{j=1}^{n_i} p_i^j \log_2 p_i^j,$$
and
$$H_{S_0}(S_1;\dots;S_m;\mathbb{P}) = -\sum_{i=1}^{m} p_i \sum_{j=1}^{n_i} \frac{p_i^j}{p_i}\,\log_2 \frac{p_i^j}{p_i}$$
$$= -\sum_{i=1}^{m} \sum_{j=1}^{n_i} p_i^j \big(\log_2 p_i^j - \log_2 p_i\big)$$
$$= -\sum_{i=1}^{m} \sum_{j=1}^{n_i} p_i^j \log_2 p_i^j + \sum_{i=1}^{m} \log_2 p_i \sum_{j=1}^{n_i} p_i^j$$
$$= -\sum_{i=1}^{m} \sum_{j=1}^{n_i} p_i^j \log_2 p_i^j + \sum_{i=1}^{m} p_i \log_2 p_i,$$
then
$$H(\mu_m(S_0;(S_1,\dots,S_m));\mathbb{P}) - H_{S_0}(S_1;\dots;S_m;\mathbb{P}) = H(S_0;\mathbb{P}).$$
Q.E.D.
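Proposition 8 can also be verified numerically. The following sketch, in the same illustrative encoding as above (all names are ours), checks the grafting identity $H(\mu_m(S_0;(S_1,\dots,S_m));\mathbb{P}) = H(S_0;\mathbb{P}) + \sum_i \mathbb{P}(S_0=i)\,H(S_i;\mathbb{P}|S_0=i)$ on a random probability, each block of $S_0$ carrying its own refining partition:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(8))                    # probability on 8 atoms

S0 = [[0, 1, 2], [3, 4], [5, 6, 7]]              # root partition, m = 3
S = [[[0], [1, 2]], [[3], [4]], [[5, 6], [7]]]   # S_i refines block i of S0

# Left-hand side: entropy of the grafted partition mu_m(S0; (S1, S2, S3)),
# whose blocks are all the pieces X_i^j.
lhs = entropy([P[b].sum() for Si in S for b in Si])

# Right-hand side: H(S0; P) + sum_i P(S0 = i) * H(S_i; P | S0 = i).
p0 = np.array([P[b].sum() for b in S0])
cond = [entropy([P[b].sum() / p0[i] for b in S[i]]) for i in range(len(S0))]
rhs = entropy(p0) + float(np.dot(p0, cond))

assert np.isclose(lhs, rhs)                      # the identity of Proposition 8
```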
This identity was discovered by Faddeev, and by Baez, Fritz and Leinster; see [20]. However, we propose that information homology explains its significance.
When the category of quantum information $S$, the set $A$ and the probability functor $Q$ are invariant under the unitary group, and if we choose a classical full subcategory $S'$, there is a trace map from $Q$ to $Q'$, which induces a morphism from the classical arborescent co-homology of $S'$, $A$ and $Q'$ to the invariant quantum arborescent co-homology of $S$, $A$ and $Q$.
As a corollary of Lemma 10 and Theorems 1 and 3, we obtain the following result:
Theorem 4. (i) Both in the classical and the invariant quantum context, if $S(A)$ is connected and sufficiently rich, and if $Q$ is canonical, every 1-cocycle is cohomologous to the entropy of Shannon; (ii) in the classical case, $H^1(S,A,Q)$ is the vector space of dimension one generated by the entropy; (iii) in the quantum case, $H_U^1(S,A,Q) = 0$, and the only invariant 0-cochain which has for coboundary the Shannon entropy is (minus) the von Neumann entropy.

6.4. Arborescent Mutual Information

For $k = 2$, a cochain is given by a local function of a probability and a rooted decorated tree of level two. It is a cocycle when the following functional equation is satisfied:
i ( S = i ) F ( T i ; U i ; P | S = i ) F ( S ; T ; P ) = F ( μ m ( S T ) ; U ; P ) F ( S ; ( μ n i ( T i U i ) ; i [ m ] ) ; P ) ,
where $S$ denotes a variable of length $m$; $T$ a collection of $m$ variables $T_1,\dots,T_m$ of respective lengths $n_1,\dots,n_m$; and $U$ a collection of variables $U_{i,j}^k$ of respective lengths $n_{i,j}$, with $i$ going from 1 to $m$, $j$ going from 1 to $n_i$ and $k$ going from 1 to $n_{i,j}$; the notation $U_i$ denotes the collection of variables $U_{i,j}^k$ of index $i$.
Our aim is to extend to the monadic context the topological action of the ordinary information structure on functions of probability, which was used in the discussion of mutual information.
For that, we define another structure of right $V_X$-module on the functor $X$ associated to probabilities, by defining the following map $\theta_t(m)$ from $X(m) \otimes V_X(n_1) \otimes \dots \otimes V_X(n_m)$ to $X(n)$, for $n = n_1 + \dots + n_m$:
$$\theta_t\big((P,i,m) \otimes S_1 \otimes \dots \otimes S_m\big) = \sum_{j=1,\dots,n_i} (P,(i,j),n).$$
Remark that the generalized decompositions Sj are used only through the orders on their elements.
As for θ, it is easy to verify that the collection of maps $\theta_t(m)$ defines a right action of the monad $V_X$ on the Schur functor $X$.
Then we consider, as before, the graded vector space $C(X)$ of homomorphisms of $V$-modules from the functors $V^{\otimes k} \otimes X$, $k \geq 0$, to ℝ which are measurable in the probabilities $P$. As before, on $C(X)$, we shift the degree by one, because of the independence with respect to the last stage of the forest, which follows from the trivial action on ℝ.
The topological coboundary operator $\delta_t$ is defined in every degree by the formula of the simplicial bar construction, as in Equation (153) for δ, but with $\theta_t$ replacing θ. It corresponds to the usual simplicial complex of the family $V^{\otimes k}$. A cochain is represented by a family of functions of probability laws $F_X(S_1; \dots; S_k; (P,i,m))$, where $S_1; \dots; S_k$ denotes a forest with $m$ trees of level $k$. The operator $\delta_t$ is given by
$$\delta_t F(S_0; \dots; S_k; (P,i,m)) = F(S_1; \dots; S_k; \theta_t((P,i,m) \otimes S_0)) - F(\mu(S_0,S_1); \dots; S_k; (P,i,m)) + \cdots + (-1)^{k}\, F(S_0; \dots; \mu(S_{k-1},S_k); (P,i,m)) + (-1)^{k+1}\, F(S_0; \dots; S_{k-1}; (P,i,m)), \qquad (167)$$
where $n = n_1 + \dots + n_m$ is the sum of the numbers of branches of the generalized decompositions $S_0^i$ for $i = 1,\dots,m$.
As for δ, a value $F(S_1;\dots;S_k;(P,j,n))$ depends only on the tree $S_1^j; \dots; S_k^j$ rooted at the place numbered by $j$ in the forest $S_1;\dots;S_k$.
Lemma 12. The coboundary δt sends a regular cochain to a regular cochain.
Proof. Consider a simple vector $(P,i,m)$ in $X(m)$ and a forest $S_0;\dots;S_k$ with $m$ components; we denote by $S_0^j$ the variable number $j$, having degree $n_j$, with $n = n_1 + \dots + n_m$, and we consider the formula (167).
If $F$ is regular, the first term on the right is the sum of the images by $F$ for $P$ and the trees $S_1^{i,j}; \dots; S_k^{i,j}$ which result from forgetting the first stage $S_0^i$, and the other terms on the right are equal to the values of $F$ for $P$ and the tree rooted at $i$ in $S_0$. On the other hand, for the tree $S_0^i; \dots; S_k^i$, if $F$ is regular, we have
$$\delta_t F(S_0^i; \dots; S_k^i; (P,1,1)) = \sum_j F(S_1^{i,j}; \dots; S_k^{i,j}; (P,1,1)) - F(\mu(S_0^i,S_1^i); \dots; S_k^i; (P,1,1)) + \cdots + (-1)^{k}\, F(S_0^i; \dots; \mu(S_{k-1}^i,S_k^i); (P,1,1)) + (-1)^{k+1}\, F(S_0^i; \dots; S_{k-1}^i; (P,1,1)).$$
Thus $\delta_t F$ is topologically regular.
Consequently we can restrict $\delta_t$ to the subcomplex $C_r^*(N_X)$, and we name its homology the arborescent, or tree, topological information co-homology, written $H_{\tau,t}(S, A, Q)$.
Now we suggest extending the notion of mutual information $I(X;Y;\mathbb{P})$ in such a way that it becomes a cocycle for this co-homology, as was the case for the Shannon mutual information in the ordinary topological information complex. We adopt the formulas using δ and $\delta_t$, as in the standard case:
Definition 18. Let $H(T;(P,i,m))$ denote the regular extension to forests of the usual entropy; then the mutual arborescent information between a partition $S$ of length $m$ and a collection $T$ of $m$ partitions $T_1,\dots,T_m$ is defined by
$$I_\alpha(S;T;\mathbb{P}) = \delta_t H(S;T;\mathbb{P}).$$
The identity δH = 0 implies
$$I_\alpha(S;T;\mathbb{P}) = \sum_{i=1}^{m} \Big( H(T_i;\mathbb{P}) - \mathbb{P}(S=i)\, H(T_i;\mathbb{P} \,|\, S=i) \Big).$$
In the particular case where all the $T_i$ are equal to a variable $T$, this gives
$$I_\alpha(S;T;\mathbb{P}) = \sum_{i=1}^{m} \mathbb{P}(S=i)\big(H(T;\mathbb{P}) - H(T;\mathbb{P}\,|\,S=i)\big) + (m-1)\,H(T;\mathbb{P})$$
$$= H(T;\mathbb{P}) - \sum_{i=1}^{m} \mathbb{P}(S=i)\,H(T;\mathbb{P}\,|\,S=i) + (m-1)\,H(T;\mathbb{P})$$
$$= H(T;\mathbb{P}) - H_S(T;\mathbb{P}) + (m-1)\,H(T;\mathbb{P}),$$
then
$$I_\alpha(S;T;\mathbb{P}) = I(S;T;\mathbb{P}) + (m-1)\,H(T;\mathbb{P}).$$
For $S \in S(A)$, the function $I_\alpha$ is an arborescent topological 2-cocycle.
It satisfies Equation (165), where ℙ replaces the conditional probabilities ℙ|(S = i) and where the factors ℙ(S = i) disappear. Remark that, in this manner, maximization of $I_\alpha(S;T;\mathbb{P})$ entails maximization of both the usual mutual information $I(S;T;\mathbb{P})$ and the unconditioned entropies $H(T_i;\mathbb{P})$.
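This relation is easy to confirm numerically. The following sketch, again in our illustrative block-list encoding, checks $I_\alpha(S;T;\mathbb{P}) = I(S;T;\mathbb{P}) + (m-1)H(T;\mathbb{P})$ when all the $T_i$ equal a single partition $T$, computing $I_\alpha$ from the expression derived above from $\delta_t H$:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(12))

S = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]   # partition S, m = 3
T = [[0, 4, 8], [1, 5, 9], [2, 3, 6, 7, 10, 11]]   # partition T

m = len(S)
pS = np.array([P[b].sum() for b in S])
H_T = entropy([P[b].sum() for b in T])

# Conditional entropies H(T; P | S = i), one per block of S.
H_T_cond = [entropy([sum(P[x] for x in b if x in set(Si)) / pS[i] for b in T])
            for i, Si in enumerate(S)]

# I_alpha = sum_i (H(T; P) - P(S = i) H(T; P | S = i)), with all T_i = T.
I_alpha = sum(H_T - pS[i] * H_T_cond[i] for i in range(m))

# Usual mutual information I(S; T; P) = H(T; P) - H_S(T; P).
I_ST = H_T - float(np.dot(pS, H_T_cond))

assert np.isclose(I_alpha, I_ST + (m - 1) * H_T)
```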
Pursuing the homological interpretation of higher mutual information quantities given by the Formulas (55) and (56), we suggest the following definition:
Definition 19. The mutual arborescent informations of higher orders are given by $I_{\alpha,N} = (\delta\,\delta_t)^M H$ for $N = 2M+1$ odd, and by $I_{\alpha,N} = \delta_t\,(\delta\,\delta_t)^M H$ for $N = 2M+2$ even.
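Unpacking the definition for the first values of $N$, for orientation:
$$I_{\alpha,2} = \delta_t H, \qquad I_{\alpha,3} = \delta\,\delta_t H = \delta\, I_{\alpha,2}, \qquad I_{\alpha,4} = \delta_t\,\delta\,\delta_t H = \delta_t\, I_{\alpha,3};$$
the quantities of successive orders are thus obtained by applying δ and $\delta_t$ alternately to the entropy, the case $N = 2$ recovering the mutual arborescent information of Definition 18.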

Acknowledgments

We thank MaxEnt14 for the opportunity to present this research to the information science community. We thank Guillaume Marrelec for discussions and notably for his participation in the research of the last part on optimal discrimination. We thank Frédéric Barbaresco, Alain Chenciner, Alain Prouté and Juan-Pablo Vigneaux for discussions and comments on the manuscript. We thank the Institut des Systèmes Complexes (ISC-PIF), Région Île-de-France, and the Max Planck Institute for Mathematics in the Sciences for financial support and for hosting P. Baudot.

Author Contributions

Both authors contributed equally to the research; the second author wrote the manuscript. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
2. Kolmogorov, A. Combinatorial foundations of information theory and the calculus of probabilities. Russ. Math. Surv. 1983, 38.
3. Thom, R. Stabilité structurelle et morphogénèse, 2nd ed.; Dunod: Paris, France, 1977; in French.
4. Mac Lane, S. Categories for the Working Mathematician; Springer: Berlin/Heidelberg, Germany, 1998.
5. Mac Lane, S. Homology; Springer: Berlin/Heidelberg, Germany, 1975.
6. Hu, K.T. On the Amount of Information. Theory Probab. Appl. 1962, 7, 439–447.
7. Baudot, P.; Bennequin, D. Information Topology I, in preparation.
8. Elbaz-Vincent, P.; Gangl, H. On poly(ana)logs I. Compos. Math. 2002, 130, 161–214.
9. Cathelineau, J. Sur l'homologie de sl2 à coefficients dans l'action adjointe. Math. Scand. 1988, 63, 51–86.
10. Loday, J.-L.; Vallette, B. Algebraic Operads; Springer: Berlin/Heidelberg, Germany, 2012.
11. Matsuda, H. Information theoretic characterization of frustrated systems. Physica A 2001, 294, 180–190.
12. Brenner, N.; Strong, S.; Koberle, R.; Bialek, W. Synergy in a Neural Code. Neural Comput. 2000, 12, 1531–1552.
13. Nielsen, M.; Chuang, I. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2000.
14. Baudot, P.; Bennequin, D. Topological forms of information. AIP Conf. Proc. 2015, 1641, 213–221.
15. Baudot, P.; Bennequin, D. Information Topology II, in preparation.
16. Fresse, B. Koszul duality of operads and homology of partition posets. Contemp. Math. Am. Math. Soc. 2004, 346, 115–215.
17. May, J.P. The Geometry of Iterated Loop Spaces; Springer: Berlin/Heidelberg, Germany, 1972.
18. May, J.P. E∞ Ring Spaces and E∞ Ring Spectra; Springer: Berlin/Heidelberg, Germany, 1977.
19. Beck, J. Triples, Algebras and Cohomology. Ph.D. Thesis, Columbia University, New York, NY, USA, 1967.
20. Baez, J.; Fritz, T.; Leinster, T. A Characterization of Entropy in Terms of Information Loss. Entropy 2011, 13, 1945–1957.
21. Marcolli, M.; Thorngren, R. Thermodynamic Semirings. arXiv 2011.
22. Baudot, P.; Bennequin, D. Information Topology III, in preparation.
23. Gromov, M. In a Search for a Structure, Part 1: On Entropy. 2013. Available online: http://www.ihes.fr/gromov/PDF/structre-serch-entropy-july5-2012.pdf (accessed on 6 May 2015).
24. Watkinson, J.; Liang, K.; Wang, X.; Zheng, T.; Anastassiou, D. Inference of Regulatory Gene Interactions from Expression Data Using Three-Way Mutual Information. Ann. N.Y. Acad. Sci. 2009, 1158, 302–313.
25. Kim, H.; Watkinson, J.; Varadan, V.; Anastassiou, D. Multi-cancer computational analysis reveals invasion-associated variant of desmoplastic reaction involving INHBA, THBS2 and COL11A1. BMC Med. Genomics 2010, 3.
26. Uda, S.; Saito, T.H.; Kudo, T.; Kokaji, T.; Tsuchiya, T.; Kubota, H.; Komori, Y.; Ozaki, Y.-i.; Kuroda, S. Robustness and Compensation of Information Transmission of Signaling Pathways. Science 2013, 341, 558–561.
27. Han, T.S. Linear dependence structure of the entropy space. Inf. Control 1975, 29, 337–368.
28. McGill, W.J. Multivariate information transmission. Psychometrika 1954, 19, 97–116.
29. Kolmogorov, A.N. Grundbegriffe der Wahrscheinlichkeitsrechnung; Springer: Berlin/Heidelberg, Germany, 1933; in German.
30. Artin, M.; Grothendieck, A.; Verdier, J. Théorie des topos et cohomologie étale des schémas (SGA 4), Tomes I–III; Springer: Berlin/Heidelberg, Germany; in French.
31. Grothendieck, A. Sur quelques points d'algèbre homologique, I. Tohoku Math. J. 1957, 9, 119–221.
32. Gabriel, P. Objets injectifs dans les catégories abéliennes. Séminaire Dubreil. Algèbre et théorie des nombres 12, 1–32.
33. Bourbaki, N. Algèbre, Chapitre 10, Algèbre homologique; Masson: Paris, France, 1980; in French.
34. Cartan, H.; Eilenberg, S. Homological Algebra; Princeton University Press: Princeton, NJ, USA, 1956.
35. Tverberg, H. A new derivation of the information function. Math. Scand. 1958, 6, 297–298.
36. Kendall, D. Functional Equations in Information Theory. Z. Wahrscheinlichkeitstheorie 1964, 2, 225–229.
37. Lee, P. On the Axioms of Information Theory. Ann. Math. Stat. 1964, 35, 415–418.
38. Kontsevitch, M. The 1+1/2 logarithm. Unpublished note, 1995; reproduced in Elbaz-Vincent, P.; Gangl, H. On poly(ana)logs I. Compos. Math. 2002; e-print math.KT/0008089.
39. Khinchin, A. Mathematical Foundations of Information Theory; Silverman, R.A., Friedman, M.D., Translators; from two Russian articles in Uspekhi Matematicheskikh Nauk; Dover: New York, NY, USA, 1957; pp. 17–75.
40. Yeung, R. Information Theory and Network Coding; Springer: Berlin/Heidelberg, Germany, 2007.
41. Cover, T.M.; Thomas, J. Elements of Information Theory; Wiley: Weinheim, Germany, 1991.
42. Penrose, R.; Rindler, W. Spinors and Space-Time, 2nd ed.; Cambridge University Press: Cambridge, UK, 1986.
43. Landau, L.D.; Lifshitz, E.M. Fluid Mechanics, 2nd ed.; Volume 6 of A Course of Theoretical Physics; Pergamon Press, 1959.
44. Balian, R. Emergences in Quantum Measurement Processes. KronoScope 2013, 13, 85–95.
45. Borel, A.; Ji, L. Compactifications of Symmetric and Locally Symmetric Spaces. In Unitary Representations and Compactifications of Symmetric Spaces; Springer: Berlin/Heidelberg, Germany, 2006.
46. Doering, A.; Isham, C. Classical and quantum probabilities as truth values. J. Math. Phys. 2012, 53.
47. Meyer, P. Quantum Probability for Probabilists; Springer: Berlin, Germany, 1993.
48. Souriau, J. Structure des Systèmes Dynamiques; Jacques Gabay: Paris, France, 1970; in French.
49. Catren, G. Towards a Group-Theoretical Interpretation of Mechanics. Philos. Sci. Arch. 2013. Available online: http://philsci-archive.pitt.edu/10116/.
50. Bachet, C.-G. Problèmes plaisans et délectables, qui se font par les nombres; A. Blanchard: Paris, France, 1993; reprint of the 1612 edition; in French.
51. Jaynes, E.T. Information Theory and Statistical Mechanics. In Statistical Physics; Ford, K., Ed.; Benjamin: New York, NY, USA, 1963; p. 181.
52. Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241.
53. Cohen, F.; Lada, T.; May, J. The Homology of Iterated Loop Spaces; Springer: Berlin, Germany, 1976.
54. Prouté, A. Introduction à la Logique Catégorique. 2013. Available online: www.logique.jussieu.fr/~alp/ (accessed on 6 May 2015).
55. Getzler, E.; Jones, J.D.S. Operads, homotopy algebra and iterated integrals for double loop spaces. arXiv 1994, hep-th/9403055v1.
56. Ginzburg, V.; Kapranov, M.M. Koszul duality for operads. Duke Math. J. 1994, 76, 203–272.
