Arrow Contraction and Expansion in Tropical Diagrams

Rostislav Matveev; Jacobus W. Portegies

doi:10.3390/e25121637

Abstract

Arrow contraction applied to a tropical diagram of probability spaces is a modification of the diagram, replacing one of the morphisms with an isomorphism while preserving other parts of the diagram. It is related to the rate regions introduced by Ahlswede and Körner. In a companion article, we use arrow contraction to derive information about the shape of the entropic cone. Arrow expansion is the inverse operation to the arrow contraction.

Keywords:

tropical probability; entropic cone

1. Introduction

In [1], we have initiated the theory of tropical probability spaces for the systematic study of information optimization problems in information theory and artificial intelligence, such as those arising in robotics [2], neuroscience [3], artificial intelligence [4], variational autoencoders [5], information decomposition [6], and causal inference [7]. In [8], we applied the techniques to derive a dimension-reduction result for the entropic cone of four random variables.

Two of the main tools used for the latter are what we call arrow contraction and arrow expansion. They are formulated for tropical commutative diagrams of probability spaces. Tropical diagrams are points in the asymptotic cone of the metric space of commutative diagrams of probability spaces endowed with the asymptotic entropy distance. Arrows in diagrams of probability spaces are (equivalence classes of) measure-preserving maps.

Arrow contraction and expansion take a commutative diagram of probability spaces as input, modify it, but preserve important properties of the diagram. The precise results are formulated as Theorems 3 and 4 in the main text. Their formulation requires language, notation, and definitions that we review in Section 2.

However, to give an idea of the results in this paper, we now present two examples. Readers unfamiliar with the basic terminology and notation used in these examples are referred either to Section 2 of the present article or to the introductory material in [9].

1.1. Two Examples

1.1.1. Arrow Contraction and Expansion in a Two-Fan

Suppose we are given a fan

Z = (X \overset{}{\leftarrow} Z \overset{}{\to} Y)

, and we would like to complete it to a diamond

(1)

such that the entropy of V, denoted by

[V]

, equals the mutual information

[X : Y]

between X and Y, i.e., we would like to realize the mutual information between X and Y by a pair of reductions

X \overset{}{\to} V

and

Y \overset{}{\to} V

. This is not always possible, not even approximately. The Gács–Körner Theorem [10] describes when such an exact realization of mutual information is possible.
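The gap between mutual information and what a common reduction can capture is easy to see numerically. The following is a minimal sketch, not part of the original argument: it assumes a joint distribution given as a Python dictionary and uses the standard characterization of the Gács–Körner common information as the entropy of the connected-component label of the bipartite support graph. All function names are ours and purely illustrative.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X:Y) for a joint distribution given as {(x, y): probability}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())


def common_information(joint):
    """Entropy of the connected-component label of the support graph."""
    parent = {}

    def find(a):
        # Union-find lookup with path halving; unseen nodes become roots.
        while parent.setdefault(a, a) != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Join x and y whenever the pair (x, y) has positive weight.
    for (x, y), p in joint.items():
        if p > 0:
            parent[find(("x", x))] = find(("y", y))

    # Total weight of each connected component.
    mass = defaultdict(float)
    for (x, y), p in joint.items():
        if p > 0:
            mass[find(("x", x))] += p
    return entropy(mass.values())


# A noisy binary pair: correlated, but with fully connected support.
noisy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# A perfectly correlated pair: the support splits into two components.
copy = {(0, 0): 0.5, (1, 1): 0.5}
```

For `noisy` the mutual information is strictly positive while the common information vanishes, so no space V that is a reduction of both X and Y can have entropy equal to the mutual information. For `copy` the two quantities agree.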

Arrow contraction instead produces another fan

Z^{'} = (X \overset{}{\leftarrow} Z^{'} \overset{}{\to} V)

, such that the reduction

Z^{'} \overset{}{\to} X

is an isomorphism and the conditional entropy

[X | V]

of X given V equals

[X | Y]

. By collapsing this reduction, we obtain as a diagram just the reduction

X \overset{}{\to} V

. If necessary, we can keep the original spaces Z and Y in the modified diagram obtaining the “broken diamond” diagram

such that

[V] = [X : Y]

. Of course, no special technique is necessary to achieve this result since it is easy to find a reduction from a tropical space

[X]

to another tropical probability space with the prespecified entropy, as long as the Shannon inequalities are not violated.

However, a similar operation becomes non-trivial and in fact impossible without passing to the tropical limit, if instead of a single space X, there is a more complex sub-diagram as in the example in the next subsection.

To explain how arrow expansion works, we start with the chain of reductions

Z \overset{}{\to} X \overset{}{\to} V

. Can we extend it to a diamond, as in (1), so that

[X : Y | V] = 0

? This is again not possible, in general. However, if we pass to tropical diagrams, then such an extension always exists.

1.1.2. One More Example of Arrow Expansion and Contraction

Consider a diagram presented in Figure 1. Such a diagram is called a

Λ_{3}

-diagram. We would like to find a reduction

X \overset{}{\to} V

so that

[X | U] = [X | V]

. It is not possible to achieve this within the realm of diagrams of classical probability spaces. But once we pass to the tropical limit, the reduction

[X] \overset{}{\to} [V]

can be found by contracting and then collapsing the arrow

[Z] \overset{}{\to} [X]

, as shown in Figure 1.

Figure 1. Arrow contraction and expansion in a

Λ_{3}

-diagram. The fan

([X] \overset{}{\leftarrow} [Z] \overset{}{\to} [U])

(shown in red in the Figure) is admissible. Spaces

[Z_{1}]

,

[Z_{2}]

and

[Z]

belong to the co-ideal

⌊U⌋

. After the operation the part of the diagram shown in blue in the Figure is left unmodified.

Arrow contraction is closely related to the Shannon channel coding theorem. This is perhaps most obvious from the proof. Furthermore, arrow contraction has connections with rate regions, as introduced by Ahlswede and Körner, see [11,12]. These results by Ahlswede and Körner were applied in [13], resulting in a new non-Shannon information inequality. Moreover, in [13], a new proof of the results was given; this new proof is similar to the proof of the arrow contraction result in the present paper.

The main contribution of our work lies in the fact that we prove a much stronger preservation of properties of the diagram under arrow contraction.

2. Preliminaries

2.1. Probability Spaces and Their Diagrams

Our main objects of study will be commutative diagrams of probability spaces. A finite probability space X is a set with a probability measure on it, supported on a finite set. We denote by

| X |

the cardinality of the support of the measure. The statement

x \in X

means that point x is an atom with positive weight in X. For details see [1,9,14].

Examples of commutative diagrams of probability spaces are shown in Figure 2. The objects in such diagrams are finite probability spaces and morphisms are equivalence classes of measure-preserving maps. Two such maps are considered to be equivalent if they coincide on a set of full measure. To record the combinatorial structure of a commutative diagram, i.e., the arrangement of spaces and morphisms, we use indexing categories, which are finite poset categories satisfying an additional property, which we describe below.

Figure 2. Examples of diagrams of probability spaces.

2.1.1. Indexing Categories

A poset category is a finite category such that there is at most one morphism between any two objects either way.

For a pair of objects

k, l

in a poset category

G = \{i; γ_{i j}\}

, such that there is a morphism

γ_{k l}

in

G

, we call k an ancestor of l and l a descendant of k. The set of all ancestors of an object k together with all the morphisms between them is itself a poset category and will be called a co-ideal generated by k and denoted by

⌊k⌋

. Co-ideals are also sometimes called filters. Similarly, a poset category consisting of all descendants of

k \in G

and morphisms between them will be called an ideal generated by k and denoted

⌈k⌉

.

An indexing category

G = \{i; γ_{i j}\}

used for indexing diagrams is a poset category satisfying the following additional property: for any pair of objects

i_{1}, i_{2} \in G

the intersection of co-ideals is also a co-ideal generated by some object

i_{3} \in G

,

⌊i_{1}⌋ \cap ⌊i_{2}⌋ = ⌊i_{3}⌋

In other words, for any pair of objects

i_{1}, i_{2} \in G

there exists a least common ancestor

i_{3}

, i.e.,

i_{3}

is an ancestor to both

i_{1}

and

i_{2}

and any other common ancestor is also an ancestor of

i_{3}

. Any indexing category is initial, i.e., there is a (necessarily unique) initial object

\hat{ı}

in it, which is the ancestor of any other object in

G

, in other words

G = ⌈\hat{ı}⌉

.
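The defining property of an indexing category can be checked mechanically for small examples. The sketch below is only an illustration under our own encoding: a poset category is represented by a list of objects and a transitively closed set of pairs `(a, b)`, one for each non-identity morphism from a to b. The function names are hypothetical.

```python
from itertools import product


def co_ideal(objects, morphisms, k):
    """All ancestors of k, including k itself (via the identity morphism)."""
    return frozenset(a for a in objects if a == k or (a, k) in morphisms)


def is_indexing_category(objects, morphisms):
    """Check that every intersection of two co-ideals is again a co-ideal.

    `morphisms` is assumed to be transitively closed and antisymmetric,
    i.e., to describe a poset category.
    """
    co_ideals = {co_ideal(objects, morphisms, k) for k in objects}
    return all(
        (co_ideal(objects, morphisms, i) & co_ideal(objects, morphisms, j))
        in co_ideals
        for i, j in product(objects, repeat=2)
    )


# Two-fan X <- Z -> Y.
two_fan = (["Z", "X", "Y"], {("Z", "X"), ("Z", "Y")})
# Diamond: Z -> X, Z -> Y, X -> V, Y -> V, plus the composite Z -> V.
diamond = (
    ["Z", "X", "Y", "V"],
    {("Z", "X"), ("Z", "Y"), ("X", "V"), ("Y", "V"), ("Z", "V")},
)
# Two unrelated objects: no common ancestor, hence not an indexing category.
discrete_pair = (["X", "Y"], set())
```

In the two-fan, the co-ideals of X and Y intersect in the co-ideal generated by Z, which is also the initial object. The pair of unrelated objects fails the test because the intersection of the two co-ideals is empty.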

A fan in a category is a pair of morphisms with the same domain. Such a diagram is also called a span in some literature on Category Theory. A fan

(i \overset{}{\leftarrow} k \overset{}{\to} j)

is called minimal, if for any other fan

(i \overset{}{\leftarrow} l \overset{}{\to} j)

included in a commutative diagram

the vertical morphism

(k \overset{}{\to} l)

must be an isomorphism. Any indexing category also satisfies the property that, for any pair of objects in it, there exists a unique minimal fan whose target objects are the given ones.

This terminology will also be applied to diagrams of probability spaces indexed by

G

. Thus, given a space X in a

G

-diagram, we can talk about its ancestors, descendants, co-ideal

⌊X⌋

, and ideal

⌈X⌉

. We use square brackets to denote tropical diagrams and spaces in them. For the (co-)ideals in tropical diagrams, in order to unclutter notations, we will write

⌊X⌋ : = ⌊[X]⌋ and ⌈X⌉ : = ⌈[X]⌉

2.1.2. Diagrams

For an indexing category

G = \{i; γ_{i j}\}

and a category

Cat

, a commutative

G

-diagram

X = \{X_{i}; χ_{i j}\}

is a functor

X : G \overset{}{\to} Cat

. A diagram

X

is called minimal if it maps minimal fans in

G

to minimal fans in

Cat

.

A constant

G

-diagram denoted

X^{G}

is a diagram where all the objects are equal to X, and all morphisms are identities.

Important examples of indexing categories are a two-fan, a diamond category, a full category

Λ_{n}

on n spaces, and chains

C_{n}

. For detailed descriptions and more examples, the reader is referred to the articles cited at the beginning of this section.

2.2. Tropical Diagrams

2.2.1. Intrinsic Entropy Distance

For a fixed indexing category

G

, the space of commutative

G

-diagrams will be denoted by

Prob ⟨G⟩

. Evaluating entropy on every space in a

G

-diagram gives a map

{Ent}_{*} : Prob ⟨G⟩ \overset{}{\to} R^{G}

where the target space

R^{G}

is the space of real-valued functions on objects of

G

. We endow this space with the

ℓ^{1}

-norm. For a fan

F = (X \overset{}{\leftarrow} Z \overset{}{\to} Y)

of

G

-diagrams we define the entropy distance between its terminal objects by

kd (F) : = {∥{Ent}_{*} Z - {Ent}_{*} X∥}_{1} + {∥{Ent}_{*} Z - {Ent}_{*} Y∥}_{1}

and the intrinsic entropy distance between two arbitrary

G

-diagrams by

k (X, Y) : = inf \{kd (F) : F = (X \overset{}{\leftarrow} Z \overset{}{\to} Y)\}

This intrinsic version of the entropy distance was introduced in [15,16]. The triangle inequality for

k

and various other properties are discussed in [1].
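For a single pair of probability spaces (an indexing category with one object), the entropy distance of a fan can be computed directly. The sketch below assumes the fan X ← Z → Y is given by a joint distribution on pairs, with the two reductions being the coordinate projections; the encoding and the function names are our own illustrative choices.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def fan_entropy_distance(joint):
    """kd of the fan X <- Z -> Y, where Z carries `joint` = {(x, y): p}.

    With one object in the indexing category the l1-norm is an absolute
    value, and Ent(Z) is at least Ent(X) and Ent(Y), so
    kd = (Ent Z - Ent X) + (Ent Z - Ent Y).
    """
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    ent_z = entropy(joint.values())
    return (ent_z - entropy(px.values())) + (ent_z - entropy(py.values()))


# Two uniform binary spaces coupled independently and along the diagonal.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
diagonal = {(0, 0): 0.5, (1, 1): 0.5}
```

The intrinsic distance is the infimum of this quantity over all couplings. Here the diagonal coupling already achieves zero, while the independent coupling gives the sum of the two entropies.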

In the same article, a useful estimate for the intrinsic entropy distance called the Slicing Lemma is also proven. The following corollary ([1], Corollary 3.10(1)) of the Slicing Lemma will be used in the next section.

Proposition 1.

Let

G

be an indexing category,

X, Y \in Prob ⟨G⟩

and

U \in Prob

included in a pair of two-fans

Then

k (X, Y) \leq \int_{U} k (X | u, Y | u) d p_{U} (u) + 2 \cdot [[G]] \cdot Ent (U)

2.2.2. Tropical Diagrams

Points in the asymptotic cone of

(Prob ⟨G⟩, k)

are called tropical

G

-diagrams and the space of all tropical

G

-diagrams, denoted

Prob [G]

, is endowed with the asymptotic entropy distance. We explain this now in more detail, and a more extensive description can be found in [14].

To describe points in

Prob [G]

we consider quasi-linear sequences

\bar{X} : = (X (n) : n \in N)

of

G

-diagrams. That is, we fix a “slowly growing” increasing function

φ : R_{\geq 0} \overset{}{\to} R

satisfying

t \cdot \int_{t}^{\infty} \frac{φ (s)}{s^{2}} d s \leq D_{φ} \cdot φ (t)

for some constant

D_{φ} > 0

and any

t > 1

. We call a sequence

\bar{X} : = (X (n) : n \in N)

φ

-quasi-linear if, for some constant C, it satisfies the following bound for all

m, n \in N

κ (X (n + m), X (n) \otimes X (m)) \leq C \cdot φ (n + m)

We have shown in [14] that the space

Prob [G]

does not depend on the choice of function

φ

as long as it is not zero. The space of all such sequences is endowed with the asymptotic entropy distance defined by

κ (\bar{X}, \bar{Y}) : = lim_{n \overset{}{\to} \infty} \frac{1}{n} k (X (n), Y (n))

A tropical diagram

[X]

is defined to be an equivalence class of such sequences, where two sequences

\bar{X}

and

\bar{Y}

are equivalent if

κ (\bar{X}, \bar{Y}) = 0

. The space

Prob [G]

carries the asymptotic entropy distance and has the structure of a

R_{\geq 0}

-semi-module—one can take linear combinations with non-negative coefficients of tropical diagrams. The linear entropy functional

{Ent}_{*} : Prob [G] \overset{}{\to} R^{G}

is defined by

{Ent}_{*} [X] : = lim_{n \overset{}{\to} \infty} \frac{1}{n} {Ent}_{*} X (n)

A detailed discussion about tropical diagrams can be found in [14]. In the cited article, we show that the space

Prob [G]

is metrically complete and isometrically isomorphic to a closed convex cone in some Banach space.

For

G = C_{k}

a chain category, containing k objects

\{1, \dots, k\}

and a unique morphism

i \overset{}{\to} j

for every pair

i \geq j

, we have shown in [14] that the space

Prob [C_{k}]

is isomorphic to the following cone in

(R^{k}, {∥\cdot∥}_{1})

Prob [C_{k}] ≅ \{(\begin{matrix} x_{1} \\ ⋮ \\ x_{k} \end{matrix}) : 0 \leq x_{1} \leq \dots \leq x_{k}\}

The isomorphism is given by the entropy function. Thus, we can identify tropical probability spaces (elements in

Prob [C_{1}]

) with non-negative numbers via entropy. We will simply write

[X]

to mean the entropy of the space

[X]

. Along these lines, we also adopt the notations

[X | Y]

,

[X : Y]

and

[X : Y | Z]

for the conditional entropy and mutual information for the tropical spaces included in some diagrams.

2.3. Asymptotic Equipartition Property for Diagrams

2.3.1. Homogeneous Diagrams

A

G

-diagram

X

is called homogeneous if the automorphism group

Aut (X)

acts transitively on every space in

X

. Homogeneous probability spaces are uniform. For more complex indexing categories, this simple description is not sufficient.

2.3.2. Tropical Homogeneous Diagrams

The subcategory of all homogeneous

G

-diagrams will be denoted

Prob {⟨G⟩}_{h}

and we write

Prob {⟨G⟩}_{h, m}

for the category of minimal homogeneous

G

-diagrams. These spaces are invariant under the tensor product. Thus, they are metric Abelian monoids.

Passing to the tropical limit, we obtain spaces of tropical (minimal) homogeneous diagrams that we denote

Prob {[G]}_{h}

and

Prob {[G]}_{h, m}

.

2.3.3. Asymptotic Equipartition Property

In [1], the following theorem is proven.

Theorem 1.

Suppose

X \in Prob ⟨G⟩

is a

G

-diagram of probability spaces for some fixed indexing category

G

. Then, there exists a sequence

\bar{H} = {(H_{n})}_{n = 0}^{\infty}

of homogeneous

G

-diagrams such that

\frac{1}{n} k (X^{n}, H_{n}) \leq C (| X_{0} |, [[G]]) \cdot \sqrt{\frac{{ln}^{3} n}{n}}

(2)

where

C (| X_{0} |, [[G]])

is a constant only depending on

| X_{0} |

and

[[G]]

.

The approximating sequence of homogeneous diagrams is evidently quasi-linear with the defect bounded by the admissible function

φ (t) : = 2 C (| X_{0} |, [[G]]) \cdot t^{3 / 4} \geq 2 C (| X_{0} |, [[G]]) \cdot t^{1 / 2} \cdot {ln}^{3 / 2} t

Thus, Theorem 1 above states that

L (Prob ⟨G⟩) \subset Prob {[G]}_{h}

. On the other hand, we have shown in [14] that the space of linear sequences

L (Prob ⟨G⟩)

is dense in

Prob [G]

. Combining the two statements, we obtain the following theorem.

Theorem 2.

For any indexing category

G

, the space

Prob {[G]}_{h}

is dense in

Prob [G]

. Similarly, the space

Prob {[G]}_{h, m}

is dense in

Prob {[G]}_{m}

.

It is possible that the spaces

Prob {[G]}_{h}

and

Prob [G]

coincide. At this time, we have neither a proof nor a counterexample to this conjecture.

2.4. Conditioning in Tropical Diagrams

For a tropical

G

-diagram

[X]

containing a space

[U]

we defined a conditioned diagram

[X | U]

. It can be understood as the tropical limit of the sequence

(X (n) | u_{n})

, where

(X (n))

is the homogeneous approximation of

[X]

,

U (n)

is the space in

X (n)

that corresponds to

[U]

under combinatorial isomorphism and

u_{n}

is any atom in

U (n)

.

We have shown in [9] that the operation of conditioning is Lipschitz-continuous with respect to the asymptotic entropy distance.

3. Arrow Contraction

3.1. Arrow Collapse, Arrow Contraction, and Arrow Expansion

3.1.1. Prime Morphisms

A morphism

γ_{i j} : i \overset{}{\to} j

in an indexing category

G = \{i; γ_{i j}\}

will be called prime if it cannot be factored into a composition of two non-identity morphisms in

G

. A morphism in a

G

-diagram indexed by a prime morphism in

G

will also be called prime.
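In a poset category, prime morphisms are exactly those that do not factor through a third object. A minimal sketch, assuming the category is encoded as a transitively closed set of pairs `(a, b)` for the non-identity morphisms from a to b (an encoding and a function name of our own choosing):

```python
def prime_morphisms(morphisms):
    """Return the non-identity morphisms that admit no factorization
    a -> c -> b into two non-identity morphisms."""
    objects = {x for pair in morphisms for x in pair}
    return {
        (a, b)
        for (a, b) in morphisms
        if not any((a, c) in morphisms and (c, b) in morphisms for c in objects)
    }


# Diamond category: the composite Z -> V factors through X (and through Y).
diamond = {("Z", "X"), ("Z", "Y"), ("X", "V"), ("Y", "V"), ("Z", "V")}
primes = prime_morphisms(diamond)
```

In the diamond, every morphism except the composite from Z to V is prime.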

3.1.2. Arrow Collapse

Suppose

Z

is a

G

-diagram such that for some pair

i, j \in G

, the prime morphism

ζ_{i j} : Z_{i} \overset{}{\to} Z_{j}

is an isomorphism. Arrow collapse applied to

Z

results in a new diagram

Z^{'}

obtained from

Z

by identifying

Z_{i}

and

Z_{j}

via the isomorphism

ζ_{i j}

. The combinatorial type of

Z^{'}

is different from that of

Z

. The spaces

Z_{i}

and

Z_{j}

are replaced by a single space, and the new space will inherit all the morphisms in

Z

with targets and domains

Z_{i}

and

Z_{j}

.

3.1.3. Arrow Contraction and Expansion

Arrow contraction and expansion are two operations on tropical

G

-diagrams. Roughly speaking, arrow contraction applied to a tropical

G

-diagram

[Z]

results in another tropical

G

-diagram

[Z^{'}]

such that one of the arrows becomes an isomorphism, while some parts of the diagram are not modified. Arrow expansion is an inverse operation to arrow contraction.

3.1.4. Admissible and Reduced Sub-Fans

An admissible fan in a

G

-diagram

Z

is a minimal fan

X \overset{}{\leftarrow} Z \overset{}{\to} U

, such that Z is the initial space of

Z

and any space in

Z

belongs either to the ideal

⌈X⌉

or to the co-ideal

⌊U⌋

. For example, in the left-most diagram of Figure 1, the fan

X \overset{}{\leftarrow} Z \overset{}{\to} U

is admissible, while

X_{1} \overset{}{\leftarrow} Z_{1} \overset{}{\to} U

or

X \overset{}{\leftarrow} Z \overset{}{\to} Z_{2}

are not.

An admissible fan

X \overset{}{\leftarrow} Z \overset{}{\to} U

in a diagram will be called reduced if the morphism

Z \overset{}{\to} X

is an isomorphism.

3.2. The Contraction Theorem

Our aim is to prove the following theorem.

Theorem 3.

Let

([X] \overset{}{\leftarrow} [Z] \overset{}{\to} [U])

be an admissible fan in some tropical

G

-diagram

[Z]

. Then for every

ε > 0

there exists a

G

-diagram

[Z^{'}]

containing an admissible fan

([X^{'}] \overset{}{\leftarrow} [Z^{'}] \overset{}{\to} [U^{'}])

, corresponding to the original admissible fan through the combinatorial isomorphism, such that, with the notations

X = ⌈X⌉

and

X^{'} = ⌈X^{'}⌉

, the diagram

[Z^{'}]

satisfies

(i): $κ ([X^{'} | U^{'}], [X | U]) \leq ε$
(ii): $κ (X^{'}, X) \leq ε$
(iii): $[Z^{'} | X^{'}] \leq ε$

It is not clear whether, by constructing diagrams

[Z^{'}]

as in the theorem above for a sequence of values of parameter

ε

decreasing to 0, we can obtain a convergent sequence in

Prob [G]

with the limiting diagram satisfying conclusions of the theorem with

ε = 0

. If

Prob [G]

were a locally compact space (which is an open question at the moment), the convergence would be guaranteed, and then

ε

in the theorem above could be replaced by 0.

The proof of Theorem 3 is based on the following proposition, which will be proven in Section 5.

Proposition 2.

Let

(X_{0} \overset{}{\leftarrow} Z_{0} \overset{}{\to} U)

be an admissible fan in some homogeneous

G

-diagram of probability spaces

Z

. Then there exists a

G

-diagram

Z^{'}

containing the admissible fan

(X_{0}^{'} \overset{}{\leftarrow} Z_{0}^{'} \overset{}{\to} U^{'})

such that, with the notations

X : = ⌈X_{0}⌉

and

X^{'} : = ⌈X_{0}^{'}⌉

, it holds that

(1): $X | u = X^{'} | u^{'}$ for any $u \in U$ and $u^{'} \in U^{'}$ .
(2): $κ (X, X^{'}) \leq k (X, X^{'}) \leq 20 \cdot [[G]]$
(3): $[Z_{0}^{'} | X_{0}^{'}] \leq 4 ln ln | X_{0} |$

Proof of Theorem 3.

First, we assume that

[Z]

is a homogeneous tropical diagram. It means that it can be represented by a quasi-linear sequence

{(Z (n))}_{n \in N_{0}}

of homogeneous diagrams, with defect of the sequence bounded by the function

φ (t) : = C \cdot t^{3 / 4}

for some

C \geq 0

. This means that for any

m, n \in N

\begin{matrix} κ (Z (m) \otimes Z (n), Z (m + n)) & \leq φ (m + n) \\ κ (Z^{m} (n), Z (m \cdot n)) & \leq D_{φ} \cdot m \cdot φ (n) \end{matrix}

where

D_{φ}

is some constant depending on

φ

, see [14].

Fix a number

n \in N

and apply Proposition 2 to the homogeneous diagram

Z (n)

, containing the admissible fan

X_{0} (n) \overset{}{\leftarrow} Z_{0} (n) \overset{}{\to} U (n)

and sub-diagram

X (n) = ⌈X_{0} (n)⌉

. As a result, we obtain a diagram

Z^{″}

containing the fan

X_{0}^{″} \overset{}{\leftarrow} Z_{0}^{″} \overset{}{\to} U^{″}

and the sub-diagram

X^{″} = ⌈X_{0}^{″}⌉

, such that

\begin{matrix} X^{″} | u^{″} = X (n) | u & \text{for any } u^{″} \in U^{″} \text{ and } u \in U (n) \\ κ (X^{″}, X (n)) \leq 20 [[G]] & \\ [Z_{0}^{″} | X_{0}^{″}] \leq 4 ln ln | X_{0} (n) | & \end{matrix}

(3)

Recall that for a diagram

A

of probability spaces, we denote by

\vec{A}

the tropical diagram represented by the linear sequence

(A^{k} : k \in N_{0})

. As an element of a closed convex cone

Prob [G]

, it can be scaled by an arbitrary non-negative real number; see, for instance, Section 2.3.5 in [14]. For example,

\frac{1}{n} \vec{A}

is represented by the sequence

(A^{⌊\frac{k}{n}⌋} : k \in N_{0})

.
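The effect of this rescaling on entropy can be checked with a one-line computation. The sketch below assumes only that entropy is additive over tensor powers; the function name is hypothetical.

```python
def scaled_entropy_rate(ent_a, n, k):
    """Normalized entropy (1/k) * Ent(A^floor(k/n)) of the k-th term of the
    sequence representing (1/n) * A, using Ent(A^m) = m * Ent(A)."""
    return (k // n) * ent_a / k


# As k grows, the rate approaches Ent(A) / n.
rates = [scaled_entropy_rate(2.0, 3, k) for k in (10, 1_000, 3_000_000)]
```

The floor function only introduces an error of order 1/k, which disappears in the tropical limit.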

Define the two tropical diagrams

\begin{matrix} [Z^{'}] & : = \frac{1}{n} \vec{Z^{″}} \\ [\tilde{Z}] & : = \frac{1}{n} \vec{Z (n)} \end{matrix}

Since

X^{″} | u^{″}

does not depend on

u^{″}

and

X (n) | u

does not depend on u we have

[X^{'} | U^{'}] = (1 / n) \cdot \vec{(X^{″} | u^{″})}

and

[\tilde{X} | \tilde{U}] = (1 / n) \cdot \vec{(X (n) | u)}

. From (3), we obtain

\begin{matrix} [X^{'} | U^{'}] = [\tilde{X} | \tilde{U}] \\ κ ([X^{'}], [\tilde{X}]) \leq \frac{20 [[G]]}{n} \\ [Z_{0}^{'} | X_{0}^{'}] \leq \frac{4 ln ln | X_{0} (n) |}{n} \end{matrix}

(4)

The distance between

[\tilde{Z}]

and

[Z]

can be bounded as follows

\begin{matrix} κ ([\tilde{Z}], [Z]) & = \frac{1}{n} κ (\vec{Z (n)}, n \cdot [Z]) = \frac{1}{n} lim_{m \overset{}{\to} \infty} \frac{1}{m} κ (Z^{m} (n), Z (m \cdot n)) \\ & \leq \frac{1}{n} D_{φ} \cdot φ (n) \end{matrix}

(5)

This also implies

κ ([\tilde{X}], [X]) \leq \frac{1}{n} D_{φ} \cdot φ (n)

(6)

Since conditioning is a Lipschitz-continuous operation with Lipschitz constant 2, we also have

κ ([\tilde{X} | \tilde{U}], [X | U]) \leq \frac{2}{n} D_{φ} \cdot φ (n)

(7)

Combining the estimates in (4)–(7) we obtain

\begin{matrix} κ ([X^{'} | U^{'}], [X | U]) \leq 2 D_{φ} \cdot \frac{φ (n)}{n} \\ κ ([X^{'}], [X]) \leq \frac{20 [[G]]}{n} + D_{φ} \frac{φ (n)}{n} \\ [Z_{0}^{'} | X_{0}^{'}] \leq \frac{4 ln ln | X_{0} (n) |}{n} + 2 D_{φ} \frac{φ (n)}{n} \end{matrix}

Note that

| X_{0} (n) |

grows at most exponentially (it is bounded by

e^{n ([X_{0}] + C)}

for some C) and

φ

is a strictly sub-linear function. Thus, by choosing sufficiently large n depending on the given

ε > 0

, we obtain

[Z^{'}]

, satisfying conclusions of the theorem for homogeneous

[Z]

.
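To see how the three bounds behave as n grows, one can evaluate them numerically. The sketch below is an illustration only: the values chosen for [[G]], for the constants in φ(t) = C · t^{3/4} and D_φ, and for the exponent h (standing for [X_0] + C in the bound |X_0(n)| ≤ e^{n h}) are hypothetical, and ln ln |X_0(n)| is replaced by its upper bound ln(n h).

```python
import math


def contraction_bounds(n, size_g, d_phi, c_phi, h):
    """Right-hand sides of the three combined estimates for a given n.

    size_g ~ [[G]], d_phi ~ D_phi, c_phi ~ constant in phi(t) = c * t^(3/4),
    h ~ exponential growth rate of |X_0(n)|; all values are assumptions.
    """
    phi_n = c_phi * n ** 0.75
    cond = 2 * d_phi * phi_n / n
    dist = 20 * size_g / n + d_phi * phi_n / n
    fiber = 4 * math.log(n * h) / n + 2 * d_phi * phi_n / n
    return cond, dist, fiber


def find_n(eps, **params):
    """Double n until all three bounds are at most eps."""
    n = 2
    while max(contraction_bounds(n, **params)) > eps:
        n *= 2
    return n


params = dict(size_g=3, d_phi=1.0, c_phi=1.0, h=1.0)
n_found = find_n(0.5, **params)
```

Since φ is sub-linear, every term decays at least like n^{-1/4} up to logarithmic factors, so the doubling search terminates for every ε > 0.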

To prove the theorem in full generality, observe that all the quantities on the left-hand side of the inequalities are Lipschitz-continuous. Since

Prob {[G]}_{h}

is dense in

Prob [G]

the theorem extends to any

[Z]

by first approximating it to any precision by a homogeneous diagram and applying the argument above. □

3.3. The Expansion Theorem

The following theorem is complementary to Theorem 3. The expansion applied to a diagram containing a reduced admissible fan produces a diagram with an admissible fan, such that the contraction of it is the original diagram. Thus, arrow expansion is a right inverse of the arrow contraction operation.

In general, contraction erases some information stored in the diagram, so there are many right inverses. We prove the theorem below by providing a simple construction of one such right inverse.

Theorem 4.

Let

([X] \overset{}{\leftarrow} [Z^{'}] \overset{}{\to} [U^{'}])

be a reduced admissible fan in some tropical

G

-diagram

[Z^{'}]

and

λ > 0

. Let

[X] : = ⌈X⌉

. Then there exists a

G

-diagram

[Z]

containing a copy of

[X]

, such that the corresponding admissible fan

([X] \overset{}{\leftarrow} [Z] \overset{}{\to} [U])

has

[Z | X] = λ

and

[X | U] = [X | U^{'}]

.

Proof.

Let

[W]

be a tropical probability space with entropy equal to

λ

. For any reduction of tropical spaces

[A] \overset{}{\to} [B]

, there are natural reductions

\begin{matrix} ([A] + [W]) & \overset{}{\to} ([B] + [W]) \\ ([A] + [W]) & \overset{}{\to} [W] \end{matrix}

We construct the diagram

[Z]

by replacing every space

[V]

in the co-ideal

⌊U^{'}⌋

with

[V] + [W]

. Every morphism

[V_{1}] \overset{}{\to} [V_{2}]

within

⌊U^{'}⌋

is replaced by

([V_{1}] + [W]) \overset{}{\to} ([V_{2}] + [W])

And any morphism from

[V]

in

⌊U^{'}⌋

to a space

[Y]

in

⌈X⌉

is replaced by a composition

([V] + [W]) \overset{}{\to} [V] \overset{}{\to} [Y]

Clearly, the resulting diagram satisfies the conclusion of the theorem. □
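For the simplest case, a single reduced two-fan, the verification can be written out explicitly. The following is a sketch under two assumptions taken from Section 2.2.2: the entropy functional is linear on tropical diagrams, and in a minimal fan the conditional entropy is the entropy of the initial space minus the entropy of the space conditioned on. Since the fan is reduced, [Z'] = [X], and both [Z'] and [U'] lie in the co-ideal of [U'], so both receive the summand [W]:

```latex
\begin{aligned}
[Z] &= [Z'] + [W] = [X] + \lambda, \qquad [U] = [U'] + [W] = [U'] + \lambda, \\
[Z \mid X] &= [Z] - [X] = \lambda, \\
[X \mid U] &= [Z] - [U] = [Z'] - [U'] = [X \mid U'].
\end{aligned}
```

The summand λ cancels in the last line, which is why the conditional entropy of X is unaffected by the expansion.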

The rest of the article is devoted to the development of the necessary tools and the proof of Proposition 2.

4. Local Estimate

In this section, we derive a bound, very similar to Fano’s inequality, on the intrinsic entropy distance between two diagrams of probability spaces with the same underlying diagram of sets. The bound will be in terms of the total variation distance between two distributions corresponding to the diagrams of probability spaces. It will be used in the next section to prove Proposition 2.

4.1. Distributions

4.1.1. Distributions on Sets

For a finite set S we denote by

Δ S

the collection of all probability distributions on S and by

∥ π_{1} - π_{2} ∥_{1}

we denote the total variation distance between

π_{1}, π_{2} \in Δ S

.

4.1.2. Distributions on Diagrams of Sets

Let

Set

denote the category of finite sets and surjective maps. For an indexing category

G

, we denote by

Set ⟨G⟩

the category of

G

-diagrams in

Set

. That is, objects in

Set ⟨G⟩

are commutative diagrams of sets indexed by the category

G

, the spaces in such a diagram are finite sets, and arrows represent surjective maps, subject to commutativity relations.

For a diagram of sets

S = \{S_{i}; σ_{i j}\}

we define the space of distributions on the diagram

S

by

Δ S : = \{(π_{i}) \in \prod_{i} Δ S_{i} : {(σ_{i j})}_{*} π_{i} = π_{j}\}

where

f_{*} : Δ S \overset{}{\to} Δ S^{'}

is the affine map induced by a surjective map

f : S \overset{}{\to} S^{'}

. If

S_{0}

is the initial space of

S

, then there is an isomorphism

\begin{matrix} Δ S_{0} & \overset{≅}{\leftrightarrow} Δ S \\ Δ S_{0} ∋ π_{0} & \mapsto \{{(σ_{0 i})}_{*} π_{0}\} \in Δ S \\ Δ S_{0} ∋ π_{0} & ↤ \{π_{i}\} \in Δ S \end{matrix}

(8)

Using the isomorphism (8) we define total variation distance between two distributions

π, π^{'} \in Δ S

as

{∥π - π^{'}∥}_{1} : = {∥π_{0} - π_{0}^{'}∥}_{1}

Given a

G

-diagram of sets

S = \{S_{i}; σ_{i j}\}

and an element

π \in Δ S

we can construct a

G

-diagram of probability spaces

(S, π) : = \{(S_{i}, π_{i}); σ_{i j}\}

.

Below, we give the estimate of the entropy distance between two

G

-diagrams of probability spaces

(S, π)

and

(S, π^{'})

in terms of the total variation distance

∥π - π^{'}∥

between distributions.

4.2. The Estimate

The upper bound on the entropy distance, which we derive below, has two summands. One is linear in the total variation distance with the slope proportional to the log-cardinality of

S_{0}

. The second one is super-linear in the total variation distance, but it does not depend on

S

. This leads to an interesting observation: locally, the super-linear summand always dominates the linear one. However, as the cardinality of

S

becomes large, it is the linear summand that starts playing the main role. This will be the case when we apply the bound in the next section.

For

α \in [0, 1]

consider a binary probability space with the weight of one of the atoms equal to

α

B_{α} : = (\{□, ■\}; p (□) = 1 - α, p (■) = α)

Proposition 3.

For an indexing category

G

, consider a

G

-diagram of sets

S = \{S_{i}, σ_{i j}\} \in Set ⟨G⟩

. Let

π, π^{'} \in Δ S

be two probability distributions on

S

. Denote

X : = (S, π)

,

Y : = (S, π^{'})

and

α : = \frac{1}{2} {∥π - π^{'}∥}_{1}

. Then

k (X, Y) \leq 2 [[G]] (α \cdot ln | S_{0} | + Ent (B_{α}))
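The interplay between the two summands of this bound can be explored numerically. The sketch below simply evaluates the right-hand side of the inequality; the function names and the sample values of α, |S_0| and [[G]] are our own illustrative assumptions.

```python
import math


def binary_entropy(alpha):
    """Entropy (in nats) of the binary space B_alpha."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -(alpha * math.log(alpha) + (1 - alpha) * math.log(1 - alpha))


def local_estimate(alpha, size_s0, size_g):
    """Right-hand side 2 [[G]] (alpha ln|S_0| + Ent(B_alpha)) of the bound."""
    return 2 * size_g * (alpha * math.log(size_s0) + binary_entropy(alpha))


alpha = 0.01
# Small initial set: the entropy term is the larger of the two summands.
small_linear = alpha * math.log(2)
# Very large initial set: the linear term takes over.
large_linear = alpha * math.log(10 ** 30)
entropy_term = binary_entropy(alpha)
```

For a fixed small α, the entropy term exceeds the linear term when |S_0| = 2 but is dwarfed by it when |S_0| = 10^30, which is the regime relevant for the application in the next section.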

Proof.

To prove the local estimate, we decompose both

π

and

π^{'}

into a convex combination of a common part

\hat{π}

and remainders

π^{+}

and

π^{' +}

. The coupling between the common parts gives no contribution to the distance and the worst possible estimate on the other parts is still enough to obtain the bound in the proposition, using Proposition 1.

Let

S_{0}

be the initial set in the diagram

S

. We will need the following obvious rough estimate of the entropy distance that holds for any

π, π^{'} \in Δ S

:

k (X, Y) \leq 2 [[G]] \cdot ln | S_{0} |

(9)

It can be obtained by taking a tensor product for the coupling between

X

and

Y

.

Our goal now is to write

π

and

π^{'}

as the convex combination of three other distributions

\hat{π}

,

π^{+}

and

π^{' +}

as in

\begin{matrix} π & = (1 - α) \cdot \hat{π} + α \cdot π^{+} \\ π^{'} & = (1 - α) \cdot \hat{π} + α \cdot π^{' +} \end{matrix}

with the smallest possible

α \in [0, 1]

.

This can be done in the following way. Let

π_{0}

and

π_{0}^{'}

be the distributions on

S_{0}

that correspond to

π

and

π^{'}

under isomorphism (8). Let

α : = \frac{1}{2} {∥π - π^{'}∥}_{1}

. If

α = 1

then the proposition follows from the rough estimate (9), so from now on, we assume that

α < 1

. Define three probability distributions

{\hat{π}}_{0}

,

π_{0}^{+}

and

π_{0}^{' +}

on

S_{0}

by setting for every

x \in S_{0}

\begin{matrix} {\hat{π}}_{0} (x) & : = \frac{1}{1 - α} min \{π_{0} (x), π_{0}^{'} (x)\} \\ π_{0}^{+} & : = \frac{1}{α} (π_{0} - (1 - α) {\hat{π}}_{0}) \\ π_{0}^{' +} & : = \frac{1}{α} (π_{0}^{'} - (1 - α) {\hat{π}}_{0}) \end{matrix}

Denote by

\hat{π}, π^{+}, π^{' +} \in Δ S

the distributions corresponding to

{\hat{π}}_{0}, π_{0}^{+}, π_{0}^{' +} \in Δ S_{0}

under isomorphism (8). Thus, we have

\begin{matrix} π & = (1 - α) \hat{π} + α \cdot π^{+} \\ π^{'} & = (1 - α) \hat{π} + α \cdot π^{' +} \end{matrix}
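The decomposition is straightforward to carry out on the initial set. The following sketch implements the three formulas above for distributions on S_0 given as dictionaries; it assumes 0 < α < 1, and the function name is hypothetical.

```python
def decompose(pi0, pi0_prime):
    """Split two distributions on the same finite set into a common part
    and two remainders:

        pi0       = (1 - alpha) * common + alpha * rest
        pi0_prime = (1 - alpha) * common + alpha * rest_prime

    where alpha is half the l1-distance. Assumes 0 < alpha < 1.
    """
    atoms = set(pi0) | set(pi0_prime)
    p = {x: pi0.get(x, 0.0) for x in atoms}
    q = {x: pi0_prime.get(x, 0.0) for x in atoms}
    alpha = 0.5 * sum(abs(p[x] - q[x]) for x in atoms)
    common = {x: min(p[x], q[x]) / (1 - alpha) for x in atoms}
    rest = {x: (p[x] - (1 - alpha) * common[x]) / alpha for x in atoms}
    rest_prime = {x: (q[x] - (1 - alpha) * common[x]) / alpha for x in atoms}
    return alpha, common, rest, rest_prime


pi0 = {"a": 0.5, "b": 0.5, "c": 0.0}
pi0_prime = {"a": 0.5, "b": 0.25, "c": 0.25}
alpha, common, rest, rest_prime = decompose(pi0, pi0_prime)
```

In this example α = 1/4, the remainder of the first distribution is concentrated on b, and the remainder of the second is concentrated on c, so the two remainders have disjoint supports.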

Now, we construct two fans of

G

-diagrams

(10)

by setting

\begin{matrix} {\tilde{X}}_{i} & : = (S_{i} \times \underline{B}_{α}; {\tilde{π}}_{i} (s, □) = (1 - α) {\hat{π}}_{i} (s), {\tilde{π}}_{i} (s, ■) = α \cdot π_{i}^{+} (s)) \\ {\tilde{Y}}_{i} & : = (S_{i} \times \underline{B}_{α}; {\tilde{π}}_{i}^{'} (s, □) = (1 - α) {\hat{π}}_{i} (s), {\tilde{π}}_{i}^{'} (s, ■) = α \cdot π_{i}^{' +} (s)) \end{matrix}

and

\begin{matrix} \tilde{X} & : = \{{\tilde{X}}_{i}; σ_{i j} \times id\} \\ \tilde{Y} & : = \{{\tilde{Y}}_{i}; σ_{i j} \times id\} \end{matrix}

The reductions in the fans in (10) are given by coordinate projections. Note that the following isomorphisms hold

\begin{matrix} X | □ & ≅ (S, \hat{π}) \\ X | ■ & ≅ (S, π^{+}) \\ Y | □ & ≅ (S, \hat{π}) ≅ X | □ \\ Y | ■ & ≅ (S, π^{' +}) \end{matrix}

Now we apply Proposition 1 along with the rough estimate in (9) to obtain the desired inequality

\begin{matrix} k (X, Y) & \leq (1 - α) k (X | □, Y | □) + α \cdot k (X | ■, Y | ■) \\ & + \sum_{i} [Ent (B_{α} | X_{i}) + Ent (B_{α} | Y_{i})] \\ & \leq 2 [[G]] (α \cdot ln | S_{0} | + Ent (B_{α})) \end{matrix}

□

5. Proof of Proposition 2

In this section, we prove Proposition 2, which is shown below verbatim. The proof consists of the construction in Section 5.1 and estimates in Propositions 5 and 6.

5.1. The Construction

In this section, we fix an indexing category

G

, a minimal

G

-diagram of probability spaces

Z

with an admissible sub-fan

X_{0} \overset{}{\leftarrow} Z_{0} \overset{}{\to} U

. We denote

X : = ⌈X_{0}⌉

and by

H

we denote the combinatorial type of

X = \{X_{i}; χ_{i j}\}

.

Instead of diagram

Z

, we consider an extended diagram, which is a two-fan of

H

-diagrams

(11)

where

Y = \{Y_{i}; υ_{i j}\}

consists of those spaces in

Z

, which are initial spaces of two fans with feet in U and in some space in

X

. That is, for every

i \in H

the space

Y_{i}

is defined to be the initial space in the minimal fan

X_{i} \overset{}{\leftarrow} Y_{i} \overset{}{\to} U

in

Z

. It may happen that for some pair of indices

i_{1}, i_{2} \in H

the initial spaces of the fans with one foot U and the other

X_{i_{1}}

and

X_{i_{2}}

coincide in

Z

. In

Y

, however, they will be treated as separate spaces so that the combinatorial type of

Y

is

H

. Starting with the diagram in (11) one can recover

Z

by collapsing all the isomorphism arrows. The initial space of

Y

will be denoted

Y_{0}

.

We would like to construct a new fan

X^{'} \overset{π_{1}^{'}}{\leftarrow} Y^{'} \overset{}{\to} V^{H}

, such that

\begin{cases} X | u = X^{'} | v & \text{for any } u \in U \text{ and } v \in V \\ k (X^{'}, X) \leq 20 [[G]] \\ [Y_{0}^{'} | X_{0}^{'}] \leq 4 ln ln | X_{0} | \end{cases}

(12)

Once this goal is achieved, we collapse all the isomorphisms to obtain a

G

-diagram satisfying the conditions in the conclusion of Proposition 2.

We start with a general description of the idea behind the construction, followed by a detailed argument. To introduce the new space V we take its points to be N atoms

u_{1}, \dots, u_{N} \in U

. Ideally, we would like to choose the atoms in such a way that

X_{0} | u_{n}

are disjoint and cover the whole of

X_{0}

. It is not always possible to achieve this exactly. However, when

| X_{0} |

is large, N is taken slightly larger than

e^{[X_{0} : U]}

, and

u_{1}, \dots, u_{N}

are chosen at random, then with high probability the spaces

X_{0} | u_{n}

will overlap only little and will cover most of

X_{0}

. The details of the construction follow.

We fix

N \in \mathbb{N}

and construct several new diagrams. For each of the new diagrams, we provide a verbal and formal description.

The space $U^{N}$ . Points in it are independent samples of length N of points in U.
The space $V_{N} = (\{1, \dots, N\}, unif)$ . A point $n \in V_{N}$ should be interpreted as a choice of index in a sample $\bar{u} \in U^{N}$ .
The $H$ -diagram $A$ , where

$\begin{matrix} A & = \{A_{i}; α_{i j}\} \\ A_{i} & = (\{(x, n, \bar{u}) : x \in X_{i} | u_{n}\}, unif) \\ α_{i j} & = (χ_{i j}, Id, Id) \end{matrix}$

A point $(x, n, \bar{u})$ in $A_{i}$ corresponds to the choice of a sample $\bar{u} \in U^{N}$ , an independent choice of a member of the sample $u_{n}$ and a point $x \in X_{i} | u_{n}$ . Recall that the original diagram $Z$ was assumed to be homogeneous and, in particular, the distribution on $X_{i} | u_{n}$ is uniform. Due to the assumption on homogeneity of $Z$ , the space $X_{i} | u$ does not depend on $u \in U$ . Since $V_{N}$ is also equipped with the uniform distribution, it follows that the distribution on $A_{i}$ will also be uniform.
The $H$ -diagram $B$ , where

$\begin{matrix} B & = \{B_{i}; β_{i j}\} \\ B_{i} & = (\{(x, \bar{u}) : x \in ⋃_{n = 1}^{N} X_{i} | u_{n}\}, p_{B_{i}}) \\ β_{i j} & = (χ_{i j}, Id) \end{matrix}$

A point $(x, \bar{u}) \in B_{i}$ is the choice of a sample $\bar{u} \in U^{N}$ and a point $x$ in one of the fibers $X_{i} | u_{n}$ , $n = 1, \dots, N$ . The distribution $p_{B_{i}}$ on $B_{i}$ is chosen so that the natural projection $A_{i} \overset{}{\to} B_{i}$ is a reduction of probability spaces. Given a sample $\bar{u}$ , if the fibers $X_{i} | u_{n}$ are not disjoint, then the distribution on $B_{i} | \bar{u}$ need not be uniform. Below, we will give an explicit description of $p_{B}$ and study the dependence of $p_{B} (\cdot | \bar{u})$ on the sample $\bar{u} \in U^{N}$ .

These diagrams can be organized into a minimal diamond diagram of

H

-diagrams, where reductions are obvious projections.

(13)

To describe the probability distribution on

B

, first we define several relevant quantities:

\begin{matrix} ρ & : = \frac{| X_{0} | u |}{| X_{0} |} = e^{- [X_{0} : U]} \\ N (x, \bar{u}) & : = | \{n \in V_{N} : x \in X_{0} | u_{n}\} | \\ ν (x, \bar{u}) & : = \frac{N (x, \bar{u})}{N} = p_{V_{N}} \{n \in V_{N} : x \in X_{0} | u_{n}\} \end{matrix}

(14)

Recall that the distribution

p_{B}

is completely determined by the distribution

p_{B_{0}}

on the initial space of

B

via isomorphism (8). From homogeneity of

Z

it follows that distributions on both

A_{0}

and

A | \bar{u}

are uniform. Therefore

p_{B_{0}} (x | \bar{u}) : = \frac{ν (x, \bar{u})}{ρ \cdot | X_{0} |}

(15)
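To make the quantities in (14) and the distribution in (15) concrete, here is a small Python sketch on a toy model. The toy model is an assumption made purely for illustration: both X_0 and U are the cyclic set {0, ..., m-1}, and the fiber X_0 | u is the window of w consecutive residues starting at u, so that all fibers have the same size and ρ = w/m. The function names are hypothetical.

```python
from fractions import Fraction


def fiber(u, m, w):
    """Toy fiber X_0 | u: w consecutive residues modulo m, starting at u."""
    return {(u + k) % m for k in range(w)}


def conditional_distribution(sample, m, w):
    """Return (counts, nu, p) for a sample u_bar = (u_1, ..., u_N).

    counts[x] = N(x, u_bar), nu[x] = N(x, u_bar) / N, and
    p[x] = nu[x] / (rho * |X_0|), following (14) and (15).
    """
    n_sample = len(sample)
    rho = Fraction(w, m)
    fibers = [fiber(u, m, w) for u in sample]
    counts = [sum(1 for f in fibers if x in f) for x in range(m)]
    nu = [Fraction(c, n_sample) for c in counts]
    p = [v / (rho * m) for v in nu]
    return counts, nu, p


counts, nu, p = conditional_distribution([0, 1, 5], m=6, w=2)
print(counts)  # multiplicities N(x, u_bar)
print(sum(p))  # total mass of p_{B_0}( . | u_bar)
```

Because every fiber has the same cardinality ρ|X_0|, the multiplicities sum to N ρ |X_0|, which is why the weights in (15) add up to one.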

The desired fan

(X^{'} \overset{}{\leftarrow} Y^{'} \overset{}{\to} V^{H})

mentioned in the beginning of the section is obtained from the top fan in the diagram in (13) by conditioning on

\bar{u} \in U^{N}

. We will show later that for an appropriate choice of N and for most choices of

\bar{u}

, the fan we obtain in this way has the required properties.

First, we would like to make the following observations. Fix an arbitrary

\bar{u} \in U^{N}

. Then:

(1): The underlying set of the probability space $B_{0} | \bar{u} = X_{0} | \bar{u}$ is $\underline{X}_{0}$ .
(2): The diagrams

$\begin{matrix} Y_{\bar{u}}^{'} & : = A | \bar{u} \\ X_{\bar{u}}^{'} & : = B | \bar{u} \end{matrix}$

are included in a two-fan of $H$ -diagrams

$X_{\bar{u}}^{'} \overset{}{\leftarrow} Y_{\bar{u}}^{'} \overset{}{\to} V_{N}^{H}$

which is obtained by conditioning the top fan in the diagram in (13).
(3): A very important observation is that diagrams $X_{\bar{u}}^{'} | n$ and $X | u$ are isomorphic for any choice of $n \in V_{N}$ and $u \in U$ . The isomorphism is the composition of the following sequence of isomorphisms

$X_{\bar{u}}^{'} | n \overset{}{\to} B | (\bar{u}, n) \overset{}{\to} A | (\bar{u}, n) \overset{}{\to} X | u_{n} \overset{}{\to} X | u$

where the first isomorphism follows from the definition of $X_{\bar{u}}^{'}$ , the second—from minimality of the fan $B \overset{}{\leftarrow} A \overset{}{\to} V_{N}$ , the third—from the definition of $A$ and the fourth—from the homogeneity of $Z$ .

5.2. The Estimates

We now show that one can choose a number N and

\bar{u}

in

U^{N}

such that

(1): $k (X_{\bar{u}}^{'}, X) \leq 20 [[H]]$ .
(2): $[Y_{\bar{u}, 0}^{'} | X_{\bar{u}, 0}^{'}] \leq 4 ln ln | X_{0} |$ , where $Y_{\bar{u}, 0}^{'}$ and $X_{\bar{u}, 0}^{'}$ are initial spaces in $Y_{\bar{u}}^{'}$ and $X_{\bar{u}}^{'}$ , respectively.

5.2.1. Total Variation and Entropic Distance Estimates

If we fix some

x_{0} \in X_{0}

, then

ν = ν (x_{0}, \cdot)

is a scaled binomially distributed random variable with parameters N and

ρ

, which means that

N \cdot ν \sim Bin (N, ρ)

.

First, we state the following bounds on the tails of a binomial distribution.

Lemma 1.

Let ν be a scaled binomial random variable with parameters N and ρ, then

(i): for any $t \in [0, 1]$ the following holds

$P \{| ν - ρ | > ρ \cdot t\} \leq 2 \cdot e^{- \frac{1}{3} \cdot N \cdot ρ \cdot t^{2}}$
(ii): for any $t \in [0, 2]$ the following holds

$P \{\frac{ν}{ρ} ln \frac{ν}{ρ} > t\} \leq e^{- \frac{1}{12} \cdot N \cdot ρ \cdot t^{2}}$

The proof of Lemma 1 can be found at the end of this section.
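As a numerical illustration of Lemma 1, the following Python sketch compares the exact binomial tail probabilities with the two stated bounds. It is only a sanity check for chosen parameter values, not a proof; the function names and the sample parameters N = 200, ρ = 0.1 are assumptions made for the example.

```python
import math


def binomial_pmf(k, n, rho):
    """P{X = k} for X ~ Bin(n, rho)."""
    return math.comb(n, k) * rho**k * (1 - rho) ** (n - k)


def tail_i(n, rho, t):
    """Exact P{|nu - rho| > rho * t} where n * nu ~ Bin(n, rho)."""
    return sum(binomial_pmf(k, n, rho)
               for k in range(n + 1) if abs(k / n - rho) > rho * t)


def tail_ii(n, rho, t):
    """Exact P{(nu / rho) * ln(nu / rho) > t}; the value at nu = 0 is taken as 0."""
    total = 0.0
    for k in range(1, n + 1):
        y = k / (n * rho)
        if y * math.log(y) > t:
            total += binomial_pmf(k, n, rho)
    return total


def bound_i(n, rho, t):
    """Right-hand side of Lemma 1(i)."""
    return 2 * math.exp(-n * rho * t**2 / 3)


def bound_ii(n, rho, t):
    """Right-hand side of Lemma 1(ii)."""
    return math.exp(-n * rho * t**2 / 12)


n, rho = 200, 0.1
for t in (0.25, 0.5, 1.0):
    print(t, tail_i(n, rho, t), bound_i(n, rho, t))
    print(t, tail_ii(n, rho, t), bound_ii(n, rho, t))
```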

Below we use the notation

P : = p_{U^{N}}

for the probability distribution on

U^{N}

. For a pair of complete diagrams

C

,

C^{'}

with the same underlying diagram of sets and with initial spaces

C_{0}

,

C_{0}^{'}

, we will write

α (C, C^{'})

for the halved total variation distance between their distributions

α (C, C^{'}) : = \frac{1}{2} {∥p_{C_{0}} - p_{C_{0}^{'}}∥}_{1}

Proposition 4.

In the settings above, for

t \in [0, 1]

, the following inequality holds

P \{\bar{u} \in U^{N} : 2 α (X_{\bar{u}}^{'}, X) > t\} \leq 2 | X_{0} | \cdot e^{- \frac{1}{3} N \cdot ρ \cdot t^{2}}

Proof.

Recall that by definition

X_{\bar{u}}^{'} = B | \bar{u}

. We use Equation (15) to expand the left-hand side of the inequality as follows

\begin{array}{l} P \{\bar{u} \in U^{N} : 2 α (B | \bar{u}, X) > t\} = P \{\bar{u} \in U^{N} : \sum_{x \in X_{0}} |\frac{ν (x, \bar{u})}{ρ \cdot | X_{0} |} - \frac{1}{| X_{0} |}| > t\} \\ = P \{\bar{u} \in U^{N} : \sum_{x \in X_{0}} |ν (x, \bar{u}) - ρ| > ρ \cdot | X_{0} | \cdot t\} \\ \leq P \{\bar{u} \in U^{N} : there exists x_{0} such that |ν (x_{0}, \bar{u}) - ρ| > ρ \cdot t\} \\ \leq \sum_{x \in X_{0}} P \{\bar{u} \in U^{N} : |ν (x, \bar{u}) - ρ| > ρ \cdot t\} \end{array}

Since by homogeneity of the original diagram, all the summands are the same, we can fix some

x_{0} \in X_{0}

and estimate further:

\begin{matrix} P \{\bar{u} \in U^{N} : 2 α (B | \bar{u}, X) > t\} \leq | X_{0} | \cdot P \{\bar{u} \in U^{N} : |ν (x_{0}, \bar{u}) - ρ| > ρ \cdot t\} \end{matrix}

Applying Lemma 1(i), we obtain the required inequality. □
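The step from the sum over x to the existence of a single point x_0 in the proof above uses that 2α is the average over x of the relative deviations |ν(x, ū) - ρ| / ρ, and an average cannot exceed the maximum. A minimal Python sketch, assuming a hypothetical list `counts` of multiplicities N(x, ū) and exact rational arithmetic:

```python
from fractions import Fraction


def twice_alpha(counts, n_sample, rho):
    """2 * alpha(B | u_bar, X) = sum_x |nu(x) / (rho |X_0|) - 1 / |X_0||."""
    size = len(counts)
    return sum(abs(Fraction(c, n_sample) / (rho * size) - Fraction(1, size))
               for c in counts)


def max_relative_deviation(counts, n_sample, rho):
    """max_x |nu(x) - rho| / rho, the quantity controlled by Lemma 1(i)."""
    return max(abs(Fraction(c, n_sample) - rho) / rho for c in counts)


# Hypothetical multiplicities for |X_0| = 6, N = 3 and rho = 1/3.
counts = [2, 2, 1, 0, 0, 1]
rho = Fraction(1, 3)
print(twice_alpha(counts, 3, rho), max_relative_deviation(counts, 3, rho))
```

If 2α exceeds t, then the maximum relative deviation exceeds t as well, which is the event bounded by the union bound over x.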

In the propositions below we assume that

| X_{0} |

is sufficiently large (larger than

e^{20}

).

Proposition 5.

In the settings above and for any

\frac{10}{ln | X_{0} |} \leq t \leq 1

the following holds:

P \{\bar{u} \in U^{N} : k (X_{\bar{u}}^{'}, X) > t (2 \cdot [[G]] \cdot ln | X_{0} |)\} \leq 2 | X_{0} | \cdot e^{- \frac{1}{3} N \cdot ρ \cdot t^{2}}

Proof.

We will use the local estimate, Proposition 3, to bound the entropy distance and then apply Proposition 4. To simplify notation, we will write simply

α

for

α (X_{\bar{u}}^{'}, X) = α (B | \bar{u}, X)

.

\begin{matrix} P \{\bar{u} \in U^{N} : k (B | \bar{u}, X) > (2 \cdot [[G]] \cdot ln | X_{0} |) t\} \\ \leq P \{\bar{u} \in U^{N} : 2 \cdot [[G]] (α \cdot ln | X_{0} | + Ent (Λ_{α})) > (2 \cdot [[G]] \cdot ln | X_{0} |) t\} \\ \leq P \{\bar{u} \in U^{N} : α + Ent (Λ_{α}) / ln | X_{0} | > t\} \end{matrix}

Please note that, since Λ_{α} is a two-point space,

Ent (Λ_{α}) \leq ln 2

, so in the chosen regime,

t \geq 10 / ln | X_{0} |

, the last inequality can hold only if the first summand on its left-hand side is larger than the second, i.e.,

α \geq Ent (Λ_{α}) / ln | X_{0} |

and therefore we can write

\begin{matrix} P \{\bar{u} \in U^{N} : k (B | \bar{u}, X) > (2 \cdot [[G]] \cdot ln | X_{0} |) t\} \\ \leq P \{\bar{u} \in U^{N} : 2 α > t\} \\ \leq 2 | X_{0} | \cdot e^{- \frac{1}{3} N \cdot ρ \cdot t^{2}} \end{matrix}

□

5.2.2. The “Height” Estimate

Recall that for given

N \in \mathbb{N}

and

\bar{u} \in U^{N}

we have constructed a two-fan of

H

-diagrams

X_{\bar{u}}^{'} \overset{}{\leftarrow} Y_{\bar{u}}^{'} \overset{}{\to} V_{N}^{H}

We will now estimate the length of the arrow

Y_{\bar{u}, 0}^{'} \overset{}{\to} X_{\bar{u}, 0}^{'}

.

Proposition 6.

In the settings above and for

t \in [0, 2]

P \{\bar{u} \in U^{N} : [Y_{\bar{u}, 0}^{'} | X_{\bar{u}, 0}^{'}] > ln (N \cdot ρ) + t\} \leq | X_{0} | \cdot e^{- \frac{1}{12} N \cdot ρ \cdot t^{2}}

Proof.

First, we observe that the fiber of the reduction

Y_{\bar{u}, 0}^{'} \overset{}{\to} X_{\bar{u}, 0}^{'}

over a point

x \in X_{\bar{u}, 0}^{'}

is a homogeneous probability space of cardinality equal to

N (x, \bar{u})

, therefore its entropy is

ln N (x, \bar{u})

.

\begin{matrix} P \{\bar{u} \in U^{N} : [Y_{\bar{u}, 0}^{'} | X_{\bar{u}, 0}^{'}] > ln (N \cdot ρ) + t\} \\ = P \{\bar{u} \in U^{N} : \int_{X_{\bar{u}, 0}^{'}} [Y_{\bar{u}, 0}^{'} | x] d p_{X_{\bar{u}, 0}^{'}} (x) > ln (N \cdot ρ) + t\} \\ = P \{\bar{u} \in U^{N} : \sum_{x \in X_{0}} \frac{ν (x, \bar{u})}{ρ \cdot | X_{0} |} ln (N \cdot ν (x, \bar{u})) > ln (N \cdot ρ) + t\} \\ \leq P \{\bar{u} \in U^{N} : \sum_{x \in X_{0}} \frac{ν (x, \bar{u})}{ρ \cdot | X_{0} |} ln (\frac{ν (x, \bar{u})}{ρ}) > t\} \\ \leq | X_{0} | \cdot P \{\bar{u} \in U^{N} : \frac{ν (x_{0}, \bar{u})}{ρ} ln (\frac{ν (x_{0}, \bar{u})}{ρ}) > t\} \\ \leq | X_{0} | \cdot e^{- \frac{1}{12} N \cdot ρ \cdot t^{2}} \end{matrix}

The last inequality above follows from Lemma 1 (ii). □
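The quantity estimated in Proposition 6 can be computed directly from the multiplicities N(x, ū). The Python sketch below is an illustration under the assumption that `counts` is a hypothetical list of these multiplicities; it evaluates the conditional entropy and the excess over ln(N ρ) that appears in the fourth line of the chain above.

```python
import math


def height(counts):
    """Conditional entropy [Y'_0 | X'_0] = sum_x p(x) * ln N(x, u_bar).

    p(x) = N(x, u_bar) / sum_x N(x, u_bar) is the distribution from (15);
    points with N(x, u_bar) = 0 carry no mass and are skipped.
    """
    total = sum(counts)
    return sum(c / total * math.log(c) for c in counts if c > 0)


def excess_over_log_n_rho(counts, n_sample, rho):
    """sum_x p(x) * ln(nu(x) / rho), which equals height minus ln(N * rho)."""
    total = sum(counts)
    return sum(c / total * math.log(c / (n_sample * rho))
               for c in counts if c > 0)


# Hypothetical multiplicities for |X_0| = 6, N = 4 and rho = 1/2.
counts = [3, 2, 2, 2, 1, 2]
print(height(counts), math.log(4 * 0.5) + excess_over_log_n_rho(counts, 4, 0.5))
```

The two printed numbers agree up to rounding because the weights p(x) sum to one, so subtracting ln(N ρ) inside the sum is the same as subtracting it outside.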

5.3. Proof of Proposition 2

Let

X_{\bar{u}}^{'} \overset{}{\leftarrow} Y_{\bar{u}}^{'} \overset{}{\to} V_{N}

be the fan constructed in Section 5.1. The construction is parameterized by the number N and the atom

\bar{u} \in U^{N}

. Below, we will choose a particular value for N and apply the estimates in Propositions 5 and 6 with a particular choice of the parameter t to show that there is

\bar{u} \in U^{N}

, so that the fan satisfies the conclusions of Proposition 2.

Let

\begin{array}{l} N : = {ln}^{3} | X_{0} | \cdot ρ^{- 1} = {ln}^{3} | X_{0} | \cdot e^{[X_{0} : U]} \\ t : = \frac{10}{ln | X_{0} |} \end{array}

With these choices of N and t, Proposition 5 implies

P \{\bar{u} \in U^{N} : k (X_{\bar{u}}^{'}, X) > 20 [[G]]\} \leq \frac{1}{4}

while Proposition 6 gives

P \{\bar{u} \in U^{N} : [Y_{\bar{u}, 0}^{'} | X_{\bar{u}, 0}^{'}] > 4 ln ln | X_{0} |\} \leq \frac{1}{4}

Therefore, there is a choice of

\bar{u}

such that the fan

(X^{'} \overset{}{\leftarrow} Y^{'} \overset{}{\to} V) : = (X_{\bar{u}, 0}^{'} \overset{}{\leftarrow} Y_{\bar{u}, 0}^{'} \overset{}{\to} V_{N})

satisfies the conditions in (12). As explained at the beginning of Section 5.1, by collapsing the isomorphism arrows, we obtain a

G

-diagram

Z^{'}

satisfying the conclusions of Proposition 2.
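The arithmetic behind the two probability bounds in this proof can be spelled out as follows. This is a worked check under the standing assumption that | X_{0} | is larger than e^{20}; it uses only the chosen values of N and t and the statements of Propositions 5 and 6.

```latex
% Assumption: |X_0| > e^{20}, hence \ln|X_0| > 20 and \ln\ln|X_0| > \ln 20 > 1/2.
\begin{align*}
N\rho\, t^{2} &= \ln^{3}|X_0| \cdot \frac{100}{\ln^{2}|X_0|} = 100 \ln|X_0|, \\
t \cdot \bigl(2\,[[G]] \ln|X_0|\bigr) &= 20\,[[G]], \\
2\,|X_0|\, e^{-\frac{1}{3} N\rho\, t^{2}} &= 2\,|X_0|^{1-\frac{100}{3}}
   \le 2\, e^{-20\cdot\frac{97}{3}} < \tfrac{1}{4}, \\
\ln(N\rho) + t &= 3\ln\ln|X_0| + \frac{10}{\ln|X_0|}
   \le 3\ln\ln|X_0| + \tfrac{1}{2} \le 4\ln\ln|X_0|, \\
|X_0|\, e^{-\frac{1}{12} N\rho\, t^{2}} &= |X_0|^{1-\frac{25}{3}}
   \le e^{-20\cdot\frac{22}{3}} < \tfrac{1}{4}.
\end{align*}
```

The two exceptional events therefore have total probability at most 1/2, so the set of samples avoiding both of them is non-empty.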

5.4. Proof of Lemma 1

The Chernoff bound for the tail of a binomially distributed random variable

X \sim Bin (N, ρ)

asserts that for any

0 \leq δ \leq 1

the following bounds hold

\begin{matrix} P \{X < (1 - δ) N \cdot ρ\} \leq e^{- \frac{1}{2} δ^{2} N \cdot ρ} \\ P \{X > (1 + δ) N \cdot ρ\} \leq e^{- \frac{1}{3} δ^{2} N \cdot ρ} \end{matrix}

Applying the bound for the upper and lower tail for the binomially distributed random variable

N \cdot ν

, we obtain the inequality in (i).

The second assertion follows from the following estimate

\begin{matrix} P \{\frac{ν}{ρ} ln \frac{ν}{ρ} > t\} & \leq P \{\frac{ν}{ρ} (\frac{ν}{ρ} - 1) > t\} \\ = P \{ν > ρ \cdot (\frac{\sqrt{1 + 4 t} - 1}{2} + 1)\} \end{matrix}

For

0 \leq t \leq 2

we have

\sqrt{1 + 4 t} - 1 \geq t

, therefore

P \{\frac{ν}{ρ} ln \frac{ν}{ρ} > t\} \leq P \{ν > ρ \cdot (\frac{t}{2} + 1)\}

By the Chernoff bound, we have

P \{\frac{ν}{ρ} ln \frac{ν}{ρ} > t\} \leq e^{- \frac{1}{12} N \cdot ρ \cdot t^{2}}
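For convenience, the elementary steps used in this proof can be written out in full. The abbreviation y = ν/ρ is introduced here only for this derivation.

```latex
% Part (i): add the two Chernoff tails with \delta = t \in [0,1].
% Part (ii): write y = \nu/\rho; only y > 1 matters, since y \ln y \le 0 for 0 \le y \le 1.
\begin{align*}
P\{|\nu - \rho| > \rho t\} &\le e^{-\frac{1}{2} t^{2} N\rho} + e^{-\frac{1}{3} t^{2} N\rho}
   \le 2\, e^{-\frac{1}{3} N\rho\, t^{2}}, \\
y \ln y > t \;&\Longrightarrow\; y\,(y-1) > t
   && \text{since } \ln y \le y - 1, \\
y^{2} - y - t > 0 \text{ and } y > 1 \;&\Longleftrightarrow\;
   y > \frac{1+\sqrt{1+4t}}{2} = \frac{\sqrt{1+4t}-1}{2} + 1, \\
\sqrt{1+4t} \ge 1 + t \;&\Longleftrightarrow\; 1 + 4t \ge 1 + 2t + t^{2}
   \;\Longleftrightarrow\; t \le 2 && \text{for } t \ge 0, \\
P\{\nu > (1+\delta)\rho\} &\le e^{-\frac{1}{3}\delta^{2} N\rho} = e^{-\frac{1}{12} N\rho\, t^{2}}
   && \text{with } \delta = \tfrac{t}{2} \in [0,1].
\end{align*}
```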

Author Contributions

Investigation, R.M. and J.W.P. All authors have read and agreed to the published version of the manuscript.

Funding

Open Access funding was provided by the Max Planck Society.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Matveev, R.; Portegies, J.W. Asymptotic dependency structure of multiple signals. Inf. Geom. 2018, 1, 237–285.
Ay, N.; Bertschinger, N.; Der, R.; Güttler, F.; Olbrich, E. Predictive information and explorative behavior of autonomous robots. Eur. Phys. J. B 2008, 63, 329–339.
Friston, K. The free-energy principle: A rough guide to the brain? Trends Cogn. Sci. 2009, 13, 293–301.
Van Dijk, S.G.; Polani, D. Informational constraints-driven organization in goal-directed behavior. Adv. Complex Syst. 2013, 16, 1350016.
Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
Steudel, B.; Ay, N. Information-theoretic inference of common ancestors. Entropy 2015, 17, 2304–2327.
Matveev, R.; Portegies, J.W. Tropical probability theory and an application to the entropic cone. Kybernetika 2020, 56, 1133–1153.
Matveev, R.; Portegies, J.W. Conditioning in tropical probability theory. Entropy 2023, 25, 1641.
Gács, P.; Körner, J. Common information is far less than mutual information. Probl. Control Inf. Theory 1973, 2, 149–162.
Ahlswede, R.; Körner, J. On the connection between the entropies of input and output distributions of discrete memoryless channels. In Proceedings of the Fifth Conference on Probability Theory, Brasov, Romania, 1–6 September 1974.
Ahlswede, R.; Körner, J. Appendix: On Common Information and Related Characteristics of Correlated Information Sources; Springer: Berlin/Heidelberg, Germany, 2006; pp. 664–677.
Makarychev, K.; Makarychev, Y.; Romashchenko, A.; Vereshchagin, N. A new class of non-Shannon-type inequalities for entropies. Commun. Inf. Syst. 2002, 2, 147–166.
Matveev, R.; Portegies, J.W. Tropical diagrams of probability spaces. Inf. Geom. 2020, 3, 61–88.
Kovačević, M.; Stanojević, I.; Šenk, V. On the hardness of entropy minimization and related problems. In Proceedings of the 2012 IEEE Information Theory Workshop, Lausanne, Switzerland, 3–7 September 2012; pp. 512–516.
Vidyasagar, M. A metric between probability distributions on finite sets of different cardinalities and applications to order reduction. IEEE Trans. Autom. Control 2012, 57, 2464–2477.

Figure 1. Arrow contraction and expansion in a

Λ_{3}

-diagram. The fan

([X] \overset{}{\leftarrow} [Z] \overset{}{\to} [U])

(shown in red in the Figure) is admissible. Spaces

[Z_{1}]

,

[Z_{2}]

and

[Z]

belong to the co-ideal

⌊U⌋

. After the operation the part of the diagram shown in blue in the Figure is left unmodified.

Figure 2. Examples of diagrams of probability spaces.


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
