Non-linear Information Inequalities

Terence Chan and Alex Grant
Institute for Telecommunications Research, University of South Australia, Australia
Entropy 2008, 10(4), 765-775; https://doi.org/10.3390/e10040765
Submission received: 24 May 2008 / Accepted: 9 December 2008 / Published: 22 December 2008

Abstract:
We construct non-linear information inequalities from Matúš’ infinite series of linear information inequalities. Each single non-linear inequality is sufficiently strong to prove that the closure of the set of all entropy functions is not polyhedral for four or more random variables, a fact that was already established using the series of linear inequalities. To the best of our knowledge, they are the first non-trivial examples of non-linear information inequalities.

1. Introduction

Information inequalities play a crucial role in the proofs of almost all source and channel coding converse theorems. Roughly speaking, these inequalities delimit what is impossible in information theory. Among the information inequalities discovered to date, the best known are the Shannon-type inequalities, including the non-negativity of (conditional) entropies and of (conditional) mutual information. In [2], a non-Shannon information inequality (one that cannot be deduced from any set of Shannon-type inequalities) involving more than three random variables was discovered. Since then, many additional information inequalities have been found [4].
Apart from their application in proving converse coding theorems, information inequalities (either linear or non-linear) were shown to be closely related to inequalities involving the cardinalities of a group and its subgroups [3]. Specifically, an information inequality is valid if and only if its group-theoretic counterpart (obtained by a mechanical substitution of symbols) is valid. For example, the non-negativity of mutual information is equivalent to the group inequality $|G|\,|G_1 \cap G_2| \ge |G_1|\,|G_2|$, where $G_1$ and $G_2$ are subgroups of the group $G$.
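To make the correspondence concrete, here is a small numeric sketch (our own illustration, not from [3]) for subgroups of a cyclic group; all variable names are ours. A uniform random element $A$ of $G$ induces random variables $X_i = A + G_i$ (the coset of $A$), whose entropies are determined by the subgroup orders.

```python
# A small numeric sketch (ours, not from [3]): subgroups G1, G2 of the
# cyclic group G = Z_12. A uniform random element A of G induces random
# variables X_i = A + G_i (the coset of A), for which
# I(X1; X2) = log |G||G1 ∩ G2| / (|G1||G2|).
from math import log2

n = 12
G = set(range(n))
G1 = {x for x in G if x % 2 == 0}     # subgroup of order 6
G2 = {x for x in G if x % 3 == 0}     # subgroup of order 4

H1 = log2(len(G) / len(G1))           # H(X1) = log |G|/|G1|
H2 = log2(len(G) / len(G2))           # H(X2)
H12 = log2(len(G) / len(G1 & G2))     # H(X1, X2); |G1 ∩ G2| = 2

mutual_information = H1 + H2 - H12
group_side = len(G) * len(G1 & G2) >= len(G1) * len(G2)

# The information inequality and its group counterpart hold together.
assert (mutual_information >= -1e-12) == group_side
print(f"I(X1;X2) = {mutual_information:.4f} bits")   # 0.0 for this pair
```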
Information inequalities are also the most common (perhaps even the only) tool for the characterization of entropy functions (see Definition 1 below). In fact, entropy functions and information inequalities are two sides of the same coin: a complete characterization of entropy functions requires complete knowledge of the set of all information inequalities.
The set of entropy functions on $n$ random variables, $\Gamma_n^*$, and its closure $\bar{\Gamma}_n^*$ are of extreme importance, not only because of their relation to information inequalities [6], but also for determining the set of feasible multicast rates in communication networks employing network coding [5,7]. Furthermore, a determination of $\Gamma_n^*$ would resolve the implication problem of conditional independence (deciding which conditional independence relations are implied by a given set of conditional independence relations). A simple and explicit characterization of $\Gamma_n^*$ and $\bar{\Gamma}_n^*$ would therefore be very useful. Unfortunately, except when $n < 4$, such a characterization is still missing [1,2,4].
Recently, it was shown by Matúš that there are countably infinitely many information inequalities [1]. This result, summarized below in Section 2, implies that $\bar{\Gamma}_n^*$ is not polyhedral. The main result of this paper is a set of non-linear inequalities, derived from Matúš' series in Section 3. To the best of our knowledge, these are the first examples of non-trivial non-linear information inequalities. We use a non-linear inequality to deduce that the closure of the set of all entropy functions is not polyhedral, a fact previously proved in [1] using the infinite sequence of linear inequalities. Finally, in Section 4, we compare the series of linear inequalities and the proposed non-linear inequality on a projection of $\bar{\Gamma}_n^*$.

2. Background

Let the index set $N = \{1, 2, \ldots, n\}$ induce a real $2^n$-dimensional Euclidean space $\mathcal{F}_n$ with coordinates indexed by the set of all subsets of $N$. Specifically, if $\mathsf{g} \in \mathcal{F}_n$, then its coordinates are denoted by $(\mathsf{g}(\alpha) : \alpha \subseteq N)$. Consequently, points $\mathsf{g} \in \mathcal{F}_n$ can be regarded as functions $\mathsf{g} : 2^N \to \mathbb{R}$. The focus of this paper is the subset of $\mathcal{F}_n$ corresponding to (almost) entropic functions.
Definition 1 (Entropic function) 
A function $\mathsf{g} \in \mathcal{F}_n$ is entropic if $\mathsf{g}(\emptyset) = 0$ and there exist discrete random variables $X_1, \ldots, X_n$ such that the joint entropy of $\{X_i : i \in \alpha\}$ is $\mathsf{g}(\alpha)$ for all $\alpha \subseteq N$. Furthermore, $\mathsf{g}$ is almost entropic if it is the limit of a sequence of entropic functions.
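As a concrete illustration (ours, not part of the original text), the following sketch computes an entropic point for $n = 2$ from an arbitrary joint pmf.

```python
# A small sketch (ours, not from the paper) computing an entropic point
# for n = 2: g(alpha) is the joint entropy of {X_i : i in alpha} under an
# arbitrary joint pmf, and g(emptyset) = 0.
from math import log2

p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # example pmf

def joint_entropy(alpha):
    """Entropy of the marginal on the (1-indexed) coordinates in alpha."""
    marginal = {}
    for x, px in p.items():
        key = tuple(x[i - 1] for i in alpha)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

g = {alpha: joint_entropy(alpha) for alpha in [(), (1,), (2,), (1, 2)]}
print(g)    # g[()] == 0; submodularity: g[(1,)] + g[(2,)] >= g[(1, 2)]
```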
Let $\Gamma_n^*$ be the set of all entropic functions. Its closure $\bar{\Gamma}_n^*$ (i.e., the set of all almost entropic functions) is well known to be a closed, convex cone [6]. An important recent result with significant implications for $\bar{\Gamma}_n^*$ is the series of linear information inequalities obtained by Matúš [1] (restated below in Theorem 1). Using this series, $\bar{\Gamma}_n^*$ was proved to be non-polyhedral for $n \ge 4$. This means that $\bar{\Gamma}_n^*$ cannot be defined by the intersection of any finite set of linear information inequalities.
Following [1], we use the following notational conventions. Specific subsets of $N$ are denoted by concatenating their elements; e.g., $123$ is written for $\{1, 2, 3\}$. For any $\mathsf{g} \in \mathcal{F}_n$ and sets $I, J \subseteq N$, define

$$\triangle_{I,J}\,\mathsf{g} \triangleq \mathsf{g}(I) + \mathsf{g}(J) - \mathsf{g}(I \cup J) - \mathsf{g}(I \cap J),$$

$$\square_{12,34}\,\mathsf{g} \triangleq \mathsf{g}(13) + \mathsf{g}(23) + \mathsf{g}(14) + \mathsf{g}(24) + \mathsf{g}(34) - \mathsf{g}(12) - \mathsf{g}(3) - \mathsf{g}(4) - \mathsf{g}(134) - \mathsf{g}(234).$$

Furthermore, for singletons $i, j, k \in N$, write $\triangle_{ij|k}$ as shorthand for $\triangle_{ik,jk}$. When $\mathsf{g}$ is an entropy function, $\triangle_{ij|k}\,\mathsf{g}$ is the conditional mutual information $I(X_i; X_j \mid X_k)$, and $\square_{12,34}\,\mathsf{g}$ is the Ingleton expression.
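For readers who prefer code, the notation transcribes directly; the following Python sketch (our own encoding, with $\mathsf{g}$ stored as a dict from frozensets of indices to joint entropies) is reused in later sketches.

```python
# The notation transcribed into code (our own encoding): g is a dict
# mapping frozensets of indices to joint entropies.
def tri(I, J, g):
    """△_{I,J} g = g(I) + g(J) - g(I ∪ J) - g(I ∩ J)."""
    I, J = frozenset(I), frozenset(J)
    return g[I] + g[J] - g[I | J] - g[I & J]

def tri_c(i, j, k, g):
    """△_{ij|k} g = △_{ik,jk} g; for entropic g this is I(X_i; X_j | X_k)."""
    return tri({i, k}, {j, k}, g)

def box(g):
    """□_{12,34} g, the Ingleton expression displayed above."""
    S = lambda *ids: frozenset(ids)
    return (g[S(1, 3)] + g[S(2, 3)] + g[S(1, 4)] + g[S(2, 4)] + g[S(3, 4)]
            - g[S(1, 2)] - g[S(3)] - g[S(4)] - g[S(1, 3, 4)] - g[S(2, 3, 4)])
```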
Theorem 1 (Matúš) 
Let $s \in \mathbb{Z}_+$, the set of positive integers, and let $\mathsf{g} \in \Gamma_n^*$ be the entropy function of discrete random variables $X_1, \ldots, X_n$. Then

$$s\left[\square_{12,34}\,\mathsf{g} + \triangle_{34|5}\,\mathsf{g} + \triangle_{45|3}\,\mathsf{g}\right] + \triangle_{35|4}\,\mathsf{g} + \frac{s(s-1)}{2}\left[\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}\right] \ge 0. \quad (1)$$

Furthermore, setting $X_5 = X_2$, the inequality reduces to

$$s\left[\square_{12,34}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g} + \triangle_{24|3}\,\mathsf{g}\right] + \triangle_{23|4}\,\mathsf{g} + \frac{s(s-1)}{2}\left[\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}\right] \ge 0. \quad (2)$$
To the best of our knowledge, this is the only result indicating the existence of infinitely many linear information inequalities. Reduction to $\bar{\Gamma}_4^*$ with $s = 1$ recovers the Zhang–Yeung inequality [2], while $s = 2$ yields an inequality of [4].
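For concreteness, setting $s = 1$ in (2) makes the $s(s-1)/2$ term vanish, leaving

$$\square_{12,34}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g} + \triangle_{24|3}\,\mathsf{g} + \triangle_{23|4}\,\mathsf{g} \ge 0,$$

which, rewritten in terms of conditional mutual informations, is the Zhang–Yeung inequality.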

3. Main Results

3.1. Non-linear information inequalities

The series of information inequalities given in Theorem 1 is "quadratic" in the parameter $s \in \mathbb{Z}_+$:

$$Q(s; a(\mathsf{g}), b(\mathsf{g}), c(\mathsf{g})) \triangleq s\,b(\mathsf{g}) + c(\mathsf{g}) + s(s-1)\,a(\mathsf{g}) \ge 0, \quad (3)$$

or equivalently,

$$s^2 a(\mathsf{g}) + s\left(b(\mathsf{g}) - a(\mathsf{g})\right) + c(\mathsf{g}) \ge 0, \quad (4)$$

where in the first series of inequalities (1),

$$a(\mathsf{g}) \triangleq \frac{1}{2}\left[\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}\right], \qquad b(\mathsf{g}) \triangleq \square_{12,34}\,\mathsf{g} + \triangle_{34|5}\,\mathsf{g} + \triangle_{45|3}\,\mathsf{g}, \qquad c(\mathsf{g}) \triangleq \triangle_{35|4}\,\mathsf{g},$$

and in the second series of inequalities (2),

$$a(\mathsf{g}) \triangleq \frac{1}{2}\left[\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}\right], \qquad b(\mathsf{g}) \triangleq \square_{12,34}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g} + \triangle_{24|3}\,\mathsf{g}, \qquad c(\mathsf{g}) \triangleq \triangle_{23|4}\,\mathsf{g}.$$
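As a numerical sanity check (ours, not from the paper), the following sketch evaluates $Q(s)$ for the second series on random joint pmfs of four binary random variables, reusing tri_c and box from the sketch in Section 2; the helper random_entropic_g is our own.

```python
# A numerical sanity check (ours): evaluate Q(s; a(g), b(g), c(g)) for
# the second series (the case X5 = X2) on random joint pmfs of four
# bits, reusing tri_c and box defined in the earlier sketch.
import itertools, random
from math import log2

def random_entropic_g(n=4):
    """Entropic function of n binary random variables with a random pmf."""
    outcomes = list(itertools.product([0, 1], repeat=n))
    w = [random.random() for _ in outcomes]
    p = dict(zip(outcomes, (x / sum(w) for x in w)))
    def H(alpha):
        m = {}
        for x, px in p.items():
            key = tuple(x[i - 1] for i in alpha)
            m[key] = m.get(key, 0.0) + px
        return -sum(q * log2(q) for q in m.values() if q > 0)
    return {frozenset(al): H(al) for r in range(n + 1)
            for al in itertools.combinations(range(1, n + 1), r)}

random.seed(0)
for _ in range(100):
    g = random_entropic_g()
    a = 0.5 * (tri_c(2, 4, 3, g) + tri_c(3, 4, 2, g))
    b = box(g) + tri_c(3, 4, 2, g) + tri_c(2, 4, 3, g)
    c = tri_c(2, 3, 4, g)
    # Q(s) = s*b + c + s*(s-1)*a must be nonnegative for s = 0, ..., 10.
    assert all(s * b + c + s * (s - 1) * a >= -1e-9 for s in range(11))
print("Q(s) >= 0 verified on 100 random distributions")
```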
Proposition 1 
Suppose $\mathsf{g} \in \mathcal{F}_n$ satisfies (3) for all positive integers $s$, and $c(\mathsf{g}) \ge 0$ (or equivalently, $Q(0; a(\mathsf{g}), b(\mathsf{g}), c(\mathsf{g})) \ge 0$). Then
  • $a(\mathsf{g}) \ge 0$;
  • $a(\mathsf{g}) = 0$ implies $b(\mathsf{g}) \ge 0$; and
  • $a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}) \ge 0$, with equality if and only if $a(\mathsf{g}) = b(\mathsf{g}) = c(\mathsf{g}) = 0$.
Proof: 
Direct verification. ☐
In the following, we will derive non-linear information inequalities from the sequence of linear inequalities (3).
Theorem 2 
Suppose that $\mathsf{g} \in \mathcal{F}_n$ and $b(\mathsf{g}) \le 2a(\mathsf{g})$. Let

$$w(\mathsf{g}) \triangleq \begin{cases} -\dfrac{b(\mathsf{g}) - a(\mathsf{g})}{2a(\mathsf{g})} & \text{if } a(\mathsf{g}) > 0, \\ 0 & \text{otherwise}. \end{cases} \quad (5)$$

Then $\mathsf{g}$ satisfies (3) for all nonnegative integers $s$ if and only if $a(\mathsf{g}), c(\mathsf{g}) \ge 0$ and

$$\left(b(\mathsf{g}) - a(\mathsf{g})\right)^2 - 4a(\mathsf{g})\,c(\mathsf{g}) \le \min\!\left[4a(\mathsf{g})^2\left(w(\mathsf{g}) - \lfloor w(\mathsf{g}) \rfloor\right)^2,\; 4a(\mathsf{g})^2\left(\lceil w(\mathsf{g}) \rceil - w(\mathsf{g})\right)^2\right]. \quad (6)$$
Proof: 
To simplify notation, $a(\mathsf{g})$, $b(\mathsf{g})$ and $c(\mathsf{g})$ will simply be denoted $a$, $b$ and $c$. We first prove the only-if part. Assume that $\mathsf{g}$ satisfies (3) for all nonnegative integers $s$. Taking $s = 0$ gives $c \ge 0$, and by Proposition 1, $a \ge 0$. It remains to prove that (6) holds.
Suppose first that $a > 0$. If the quadratic $Q(s; a, b, c)$ has no distinct real roots in $s$, then clearly $(b - a)^2 - 4ac \le 0$ and the theorem holds. On the other hand, if $Q(s; a, b, c)$ has distinct real roots, implying $(b - a)^2 - 4ac > 0$, then $Q(s; a, b, c)$ is negative between its two roots and attains its minimum at $s = -(b - a)/2a = w$, which is at least $-1/2$ by assumption.
Since $Q(s; a, b, c) \ge 0$ for all nonnegative integers $s$, the "distance" between the two roots can be at most $2\min\left[w - \lfloor w \rfloor,\; \lceil w \rceil - w\right]$. In other words,

$$\frac{\sqrt{(b - a)^2 - 4ac}}{a} \le 2\min\left[w - \lfloor w \rfloor,\; \lceil w \rceil - w\right],$$

or equivalently, $(b - a)^2 - 4ac \le \min\left[4a^2(w - \lfloor w \rfloor)^2,\; 4a^2(\lceil w \rceil - w)^2\right]$.
If on the other hand $a = 0$, then the assumption $b \le 2a$ and Proposition 1 imply that $b = 0$. As such, $(b - a)^2 - 4ac = 0$ and (6) clearly holds. Hence, the only-if part of the theorem is proved.
Now we prove the if part. If $a = 0$, then (6) and the assumption $b \le 2a$ imply that $b = 0$. The theorem then holds as $c \ge 0$ by assumption. Now suppose $a > 0$ and $b \le 2a$. By the same argument as before, (6) implies that either $Q(s; a, b, c) = 0$ has no real roots or the two real roots lie within the closed interval $[\lfloor w \rfloor, \lceil w \rceil]$. Since $a > 0$, for every nonnegative integer $s$ we have $Q(s; a, b, c) \ge 0$, or equivalently, $s^2 a + s(b - a) + c \ge 0$, and the theorem is proved. ☐
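The equivalence can also be checked by brute force. The sketch below (ours; matus_holds, condition_6 and the tolerance constants are our own) samples random triples $(a, b, c)$ with $a, c \ge 0$ and $b \le 2a$, exploiting the convexity of $Q$ in $s$ to bound the search range.

```python
# A brute-force check (ours) of the equivalence in Theorem 2 on random
# triples (a, b, c) with a, c >= 0 and b <= 2a. Since Q is convex in s
# with vertex at w, testing integers up to ceil(w) + 1 suffices.
import math, random

def matus_holds(a, b, c):
    """Q(s) = s^2 a + s(b - a) + c >= 0 for all nonnegative integers s."""
    w = (a - b) / (2 * a) if a > 0 else 0.0
    return all(s * s * a + s * (b - a) + c >= -1e-12
               for s in range(max(2, math.ceil(w) + 2)))

def condition_6(a, b, c):
    w = -(b - a) / (2 * a) if a > 0 else 0.0
    lhs = (b - a) ** 2 - 4 * a * c
    rhs = min(4 * a ** 2 * (w - math.floor(w)) ** 2,
              4 * a ** 2 * (math.ceil(w) - w) ** 2)
    return lhs <= rhs + 1e-12

random.seed(1)
for _ in range(10000):
    a = random.uniform(0.001, 1.0)
    c = random.uniform(0.0, 1.0)
    b = random.uniform(-3.0, 2 * a)        # enforce b <= 2a
    assert matus_holds(a, b, c) == condition_6(a, b, c)
print("Theorem 2 equivalence confirmed on 10000 random samples")
```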
Theorem 2 shows that the Matúš series of linear inequalities is equivalent to the single non-linear inequality (6) under the conditions $b(\mathsf{g}) \le 2a(\mathsf{g})$ and $a(\mathsf{g}), c(\mathsf{g}) \ge 0$.
Clearly, $a(\mathsf{g}), c(\mathsf{g}) \ge 0$ holds for all entropic $\mathsf{g}$ because of the non-negativity of conditional mutual information, so imposing these two conditions weakens (6) very little. If, on the other hand, $b(\mathsf{g}) \le 2a(\mathsf{g})$ does not hold, then Matúš' series of inequalities is already implied by $a(\mathsf{g}), c(\mathsf{g}) \ge 0$, and in that case Matúš' inequalities are not of interest. Therefore, our proposed non-linear inequality is essentially not much weaker than Matúš' ones.
While (6) is interesting in its own right, it is not so easy to work with. In the following, we shall consider a weaker form.
Corollary 1 (Quadratic information inequality) 
Suppose that $\mathsf{g}$ satisfies (3) for all nonnegative integers $s$. If $b(\mathsf{g}) \le 2a(\mathsf{g})$, then

$$\left(b(\mathsf{g}) - a(\mathsf{g})\right)^2 - 4a(\mathsf{g})c(\mathsf{g}) \le a(\mathsf{g})^2. \quad (7)$$

Consequently, if $\mathsf{g}$ is almost entropic and $\square_{12,34}\,\mathsf{g} \le 0$, then

$$\left[\square_{12,34}\,\mathsf{g} + \frac{\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}}{2}\right]^2 \le 2\left(\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}\right)\triangle_{23|4}\,\mathsf{g} + \frac{\left(\triangle_{24|3}\,\mathsf{g} + \triangle_{34|2}\,\mathsf{g}\right)^2}{4}.$$
Proof: 
Since $\min\!\left[w(\mathsf{g}) - \lfloor w(\mathsf{g}) \rfloor,\; \lceil w(\mathsf{g}) \rceil - w(\mathsf{g})\right] \le 1/2$, the corollary follows directly from Theorem 2. ☐
Despite the fact that the above "quadratic" information inequality is a consequence of a series of linear inequalities, it is, to the best of our knowledge, the first non-trivial non-linear information inequality.

3.2. Implications of Corollary 1

In Proposition 1, we showed that Matúš' inequalities imply that if $a(\mathsf{g}) = 0$, then $b(\mathsf{g}) \ge 0$. The same result can also be proved using the quadratic information inequality in (7).
Implication 1 
For any $\mathsf{g} \in \mathcal{F}_n$ such that

$$b(\mathsf{g}) \le 2a(\mathsf{g}) \implies \left(b(\mathsf{g}) - a(\mathsf{g})\right)^2 - 4a(\mathsf{g})c(\mathsf{g}) \le a(\mathsf{g})^2, \quad (8)$$

$a(\mathsf{g}) = 0$ implies $b(\mathsf{g}) \ge 0$.
Proof: 
If $a(\mathsf{g}) = 0$, then $\left(b(\mathsf{g}) - a(\mathsf{g})\right)^2 - 4a(\mathsf{g})c(\mathsf{g}) - a(\mathsf{g})^2 = b(\mathsf{g})^2$. Hence, if $b(\mathsf{g}) < 0$ (so that the premise $b(\mathsf{g}) \le 2a(\mathsf{g}) = 0$ holds), then (8) is violated, leading to a contradiction. ☐
In [1], it was proved that the cone $\bar{\Gamma}_n^*$ is not polyhedral for $n \ge 4$. Ignoring the technical details, the idea of the proof is very simple. First, a sequence of entropic functions $\mathsf{g}_t$ was constructed such that (1) the sequence converges to $\mathsf{g}_0$, and (2) it has a one-sided tangent $\dot{\mathsf{g}}_0^+$, defined as $\lim_{t \to 0^+} (\mathsf{g}_t - \mathsf{g}_0)/t$. Clearly, if $\bar{\Gamma}_n^*$ were polyhedral, there would exist $\epsilon > 0$ such that $\mathsf{g}_0 + \epsilon\,\dot{\mathsf{g}}_0^+$ is contained in $\bar{\Gamma}_n^*$. It was then shown that for any $\epsilon > 0$, the function $\mathsf{g}_0 + \epsilon\,\dot{\mathsf{g}}_0^+$ is not in $\bar{\Gamma}_n^*$ because it violates (3) for sufficiently large $s$. Therefore, $\bar{\Gamma}_n^*$ is not polyhedral, or equivalently, there are infinitely many information inequalities.
In fact, $\mathsf{g}_0 + \epsilon\,\dot{\mathsf{g}}_0^+$ also violates the quadratic information inequality obtained in Corollary 1 for any positive $\epsilon$. As such, (7) is sufficient to prove that $\bar{\Gamma}_n^*$ is not polyhedral for $n \ge 4$, and hence we have the following implication.
Implication 2 
The quadratic inequality (7) is strong enough to imply that $\bar{\Gamma}_n^*$ is not polyhedral.
Some non-linear information inequalities are direct consequences of basic linear information inequalities (e.g., $H(X)^2 I(X; Y) \ge 0$). Such inequalities are trivial in the sense that they are obtained directly as non-linear transformations of known linear inequalities. Our proposed quadratic inequality (7) is non-trivial, as proved in the following.
Implication 3 
The quadratic inequality (7) is a non-linear inequality that cannot be implied by any finite number of linear information inequalities. Specifically, for any given finite set of valid linear information inequalities, there exists $\mathsf{g} \notin \bar{\Gamma}_n^*$ such that $\mathsf{g}$ does not satisfy (7) but satisfies all the given linear inequalities.
Proof: 
Suppose we are given a finite set of valid linear information inequalities. Then the set of $\mathsf{g} \in \mathcal{F}_n$ satisfying all these linear inequalities is polyhedral; in other words, it is the intersection of a finite number of half-spaces. Denote this polyhedron by $\Psi$.
We once again use the sequence of entropic functions $\{\mathsf{g}_t\}$ constructed in [1]. Clearly, $\mathsf{g}_t \in \Psi$ for all $t$, since $\mathsf{g}_t \in \Gamma_n^*$ and the given inequalities are valid. Again, as $\Psi$ is polyhedral, $\mathsf{g}_\epsilon \triangleq \mathsf{g}_0 + \epsilon\,\dot{\mathsf{g}}_0^+ \in \Psi$ for sufficiently small $\epsilon > 0$. In other words, $\mathsf{g}_\epsilon$ satisfies all the given linear inequalities. However, as explained earlier, $\mathsf{g}_\epsilon$ violates the quadratic inequality (7), and hence the theorem follows. ☐

4. Characterizing $\bar{\Gamma}_n^*$ by projection

Although the set of almost entropic functions $\bar{\Gamma}_n^*$ is a closed and convex cone, finding a complete characterization of it is an extremely difficult task. Therefore, instead of tackling the hard problem directly, it is sensible to consider a relatively simpler one: the characterization of a "projection" of $\bar{\Gamma}_n^*$. The projection problem is easier because the dimension of a projection can be much smaller, making it easier to visualize and to describe. Its low dimensionality may also facilitate the use of numerical techniques to approximate the projection.
In this section, we consider a particular projection and show how the inequalities obtained in the previous section can be expressed as equivalent inequalities on the projection. This gives a better idea of what the projection looks like. First, we define the proposed projection $\Upsilon$.
Define $\Upsilon \triangleq \left\{ \left(a(\mathsf{g}),\, b(\mathsf{g}) - a(\mathsf{g})\right) : \mathsf{g} \in \bar{\Gamma}_n^* \text{ and } a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}) = 1 \right\}$, or equivalently,

$$\Upsilon = \left\{ \left( \frac{a(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})},\; \frac{b(\mathsf{g}) - a(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})} \right) : \mathsf{g} \in \bar{\Gamma}_n^* \text{ and } \mathsf{g} \ne 0 \right\}. \quad (9)$$
Lemma 1 
Υ is a closed and convex set.
Proof: 
Since the set $\left\{ \left(a(\mathsf{g}),\, b(\mathsf{g}) - a(\mathsf{g})\right) : \mathsf{g} \in \bar{\Gamma}_n^* \right\}$ is closed and convex, its cross-section (and its affine transform) $\Upsilon$ is also closed and convex. ☐
Since $\Upsilon$ is obtained by projecting $\bar{\Gamma}_n^*$ onto a two-dimensional Euclidean space, any inequality satisfied by all points in $\Upsilon$ induces a corresponding information inequality. Specifically, we have the following proposition.
Proposition 2 
Suppose that there exists $k \ge 0$ such that

$$(a + b + 2c)^k\, \psi\!\left(\frac{a}{a + b + 2c},\, \frac{b - a}{a + b + 2c}\right) \ge 0 \qquad \text{if } a = b = c = 0. \quad (10)$$

Then

$$\psi(u, v) \ge 0, \qquad \forall (u, v) \in \Upsilon \quad (11)$$

if and only if

$$(a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}))^k\, \psi\!\left(\frac{a(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})},\, \frac{b(\mathsf{g}) - a(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})}\right) \ge 0, \qquad \forall \mathsf{g} \in \bar{\Gamma}_n^*. \quad (12)$$

Similarly, (11) holds for all $(u, v) \in \Upsilon$ with $v \le u$ if and only if (12) holds for all $\mathsf{g} \in \bar{\Gamma}_n^*$ with $b(\mathsf{g}) \le 2a(\mathsf{g})$.
Proof: 
First, we prove that (11) implies (12). Consider any $\mathsf{g} \in \bar{\Gamma}_n^*$. If $a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}) = 0$, then by Proposition 1, $a(\mathsf{g}) = b(\mathsf{g}) = c(\mathsf{g}) = 0$, and (12) follows from (10). Otherwise, $a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}) > 0$, and (12) follows from (9) and (11).
Conversely, for any $(u, v) \in \Upsilon$, by definition there exists $\mathsf{g} \in \bar{\Gamma}_n^*$ such that (1) $\mathsf{g} \ne 0$ and (2) $u = a(\mathsf{g})/(a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}))$ and $v = (b(\mathsf{g}) - a(\mathsf{g}))/(a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}))$. The inequality (11) then follows from (12) and the fact that $\mathsf{g} \ne 0$ (hence $a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}) > 0$).
Finally, the constrained counterpart follows since $b(\mathsf{g}) - a(\mathsf{g}) \le a(\mathsf{g})$ if and only if $b(\mathsf{g}) \le 2a(\mathsf{g})$. ☐
By Proposition 2, there is a mechanical way to rewrite inequalities for $\bar{\Gamma}_n^*$ as inequalities for $\Upsilon$, and vice versa. We will therefore abuse notation by calling (11) and (12) equivalent. In the following, we rewrite the inequalities obtained in the previous sections using Proposition 2.
Proposition 3 (Matúš’ inequalities) 
When $s$ is a positive integer, the inequality (3) is equivalent to

$$v \ge \frac{2u - 2s^2 u - 1}{2s - 1}. \quad (13)$$
Proof: 
A direct consequence of Proposition 2 and the identity

$$\frac{c(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})} = \frac{1}{2}\left[ 1 - \frac{b(\mathsf{g}) - a(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})} - \frac{2a(\mathsf{g})}{a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})} \right]. \quad (14)$$

☐
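To spell out the algebra (our own expansion of the terse proof above): normalize $a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g}) = 1$ and write $u = a(\mathsf{g})$, $v = b(\mathsf{g}) - a(\mathsf{g})$, so that (14) gives $c(\mathsf{g}) = \frac{1}{2}(1 - v - 2u)$. Then (4) becomes

$$s^2 u + s v + \frac{1 - v - 2u}{2} \ge 0 \;\Longleftrightarrow\; v\,(2s - 1) \ge 2u - 2s^2 u - 1 \;\Longleftrightarrow\; v \ge \frac{2u - 2s^2 u - 1}{2s - 1},$$

where the final step uses $2s - 1 > 0$ for $s \in \mathbb{Z}_+$.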
By optimizing the choice of s, we can obtain a stronger piecewise linear inequality which can be rewritten as follows.
Theorem 3 (Piecewise linear inequality) 
The piecewise linear inequality

$$\min_{s \in \mathbb{Z}_+}\; \left[ s^2 a(\mathsf{g}) + s\left(b(\mathsf{g}) - a(\mathsf{g})\right) + c(\mathsf{g}) \right] \ge 0 \quad (15)$$

is equivalent to

$$v \ge L_{li}(u), \quad (16)$$

where $L_{li}(u) \triangleq \sup_{s \in \mathbb{Z}_+} \left(2u - 2s^2 u - 1\right)/(2s - 1)$.
Proof: 
A direct consequence of Propositions 2 and 3. ☐
As we shall see in the following lemma, $L_{li}(u)$ can be explicitly characterized.
Lemma 2 
$L_{li}(0) = 0$ and

$$L_{li}(u) = \frac{2u - 2s_o^2 u - 1}{2s_o - 1} \quad (17)$$

for any $0 < u \le 1$, where $s_o$ is the smallest positive integer such that $1/(1 + 2s_o^2) \le u$.
Proof: 
Let $f(s, u) \triangleq (2u - 2s^2 u - 1)/(2s - 1)$. First, $f(s, 0) = -1/(2s - 1)$; therefore, $L_{li}(0) = \sup_{s \in \mathbb{Z}_+} f(s, 0) = 0$. Also, it is straightforward to prove the following:
  • For any fixed $u \ge 1/2$, $f(s, u)$ is a decreasing function of $s$, and hence $\sup_{s \in \mathbb{Z}_+} f(s, u) = f(1, u) = -1$.
  • For $0 < u \le 1/2$, $f(s, u)$ is a strictly concave function of $s$ for $s \ge 1$ and attains its maximum at $s = \frac{1}{2} + \sqrt{\frac{1}{2u} - \frac{3}{4}} \ge 1$. As a result, $L_{li}(u) = \max\left(f(s_l, u), f(s_h, u)\right)$, where $s_l = \left\lfloor \frac{1}{2} + \sqrt{\frac{1}{2u} - \frac{3}{4}} \right\rfloor$ and $s_h = \left\lceil \frac{1}{2} + \sqrt{\frac{1}{2u} - \frac{3}{4}} \right\rceil$.
    Clearly, for any positive integer $s_o$,
    $$L_{li}(u) = \begin{cases} f(s_o, u) & \text{if } u = 1/\left(2(1 - s_o + s_o^2)\right), \\ f(s_o + 1, u) & \text{if } u = 1/\left(2(1 + s_o + s_o^2)\right). \end{cases}$$
    Furthermore, if $1/\left(2(1 + s_o + s_o^2)\right) < u < 1/\left(2(1 - s_o + s_o^2)\right)$, we have $s_l = s_o$ and $s_h = s_o + 1$, and hence
    $$L_{li}(u) = \max\left[ \frac{2u - 2s_o^2 u - 1}{2s_o - 1},\; \frac{-4s_o u - 2s_o^2 u - 1}{2s_o + 1} \right].$$
    By solving a system of linear equations, we can show that $f(s_o, u) = f(s_o + 1, u)$ if and only if $u = 1/(1 + 2s_o^2)$. Therefore,
    $$L_{li}(u) = \begin{cases} f(s_o + 1, u) & \text{if } 1/\left(2(1 + s_o + s_o^2)\right) < u \le 1/(1 + 2s_o^2), \\ f(s_o, u) & \text{if } 1/(1 + 2s_o^2) \le u \le 1/\left(2(1 - s_o + s_o^2)\right). \end{cases}$$
Together with the fact that $L_{li}(u) = -1 = f(1, u)$ for $1/2 \le u \le 1$, the lemma follows. ☐
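Lemma 2 is easy to validate numerically; the sketch below (ours; function names and tolerances are our own) compares the closed form against a brute-force supremum.

```python
# A numeric cross-check (ours) of Lemma 2: compare the closed form of
# L_li(u) with a brute-force supremum over s = 1, ..., 10^4.
def f(s, u):
    return (2 * u - 2 * s * s * u - 1) / (2 * s - 1)

def L_li_closed(u):
    if u == 0:
        return 0.0
    # s_o: smallest positive integer with 1/(1 + 2 s^2) <= u (Lemma 2).
    s_o = next(s for s in range(1, 10**6) if 1 / (1 + 2 * s * s) <= u)
    return f(s_o, u)

def L_li_brute(u, S=10**4):
    return max(f(s, u) for s in range(1, S + 1))

for k in range(1, 1000):
    u = k / 1000
    assert abs(L_li_closed(u) - L_li_brute(u)) < 1e-9
print("closed form of L_li agrees with brute-force supremum")
```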
Proposition 4 (Quadratic inequality) 
The quadratic inequality (7) (subject to $b(\mathsf{g}) \le 2a(\mathsf{g})$) is equivalent to

$$(v + u)^2 + 2u^2 \le 2u \quad (18)$$

subject to $v \le u$.
Proof: 
By using Proposition 2 and (14), it is straightforward to rewrite (7) as (18). ☐
To illustrate (18), we plot the curves $(v + u)^2 + 2u^2 = 2u$ and $v = u$ in Figure 1. By the proposition, if $v \le u$ (i.e., the point $(u, v)$ is below the dotted line), then $(u, v) \in \Upsilon$ implies that $(u, v)$ lies inside the ellipse.
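The picture can be reproduced empirically. The sketch below (ours; it reuses random_entropic_g, tri_c and box from the sketches in Sections 2 and 3.1) samples points of $\Upsilon$ from random distributions, checks (13) for small $s$, and checks the ellipse (18) whenever $v \le u$.

```python
# An empirical look at Υ (our own experiment): sample (u, v) from random
# four-bit distributions, reusing random_entropic_g, tri_c, box (and the
# random module) from the earlier sketches. Random dense pmfs typically
# give a positive Ingleton expression, i.e. v > u, so the ellipse branch
# fires only rarely.
random.seed(3)
points = []
for _ in range(1000):
    g = random_entropic_g()
    a = 0.5 * (tri_c(2, 4, 3, g) + tri_c(3, 4, 2, g))
    b = box(g) + tri_c(3, 4, 2, g) + tri_c(2, 4, 3, g)
    c = tri_c(2, 3, 4, g)
    T = a + b + 2 * c
    if T > 1e-9:
        points.append((a / T, (b - a) / T))

for u, v in points:
    for s in range(1, 11):                       # Proposition 3, (13)
        assert v >= (2*u - 2*s*s*u - 1) / (2*s - 1) - 1e-9
    if v <= u:                                   # Proposition 4, (18)
        assert (v + u) ** 2 + 2 * u ** 2 <= 2 * u + 1e-9
```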
Proposition 4 gives a nonlinear information inequality on Υ subject to a condition that v u . In the following theorem, we relax the inequality so as to remove the condition.
Figure 1. Quadratic inequality (18).
Theorem 4 (Non-linear inequality) 
Let

$$L_{nl}(u) = -u - \sqrt{2u - 2u^2}. \quad (19)$$

For any $(u, v) \in \Upsilon$, $v \ge L_{nl}(u)$. Consequently, by Proposition 2, for all $\mathsf{g} \in \bar{\Gamma}_n^*$,

$$b(\mathsf{g}) \ge -\sqrt{2a(\mathsf{g})\left(a(\mathsf{g}) + b(\mathsf{g}) + 2c(\mathsf{g})\right) - 2a(\mathsf{g})^2} = -\sqrt{2a(\mathsf{g})\left(b(\mathsf{g}) + 2c(\mathsf{g})\right)}.$$
Proof: 
By Proposition 4, if $(u, v) \in \Upsilon$ and $v \le u$, then

$$(v + u)^2 + 2u^2 \le 2u.$$

As a result, $v + u \ge -\sqrt{2u - 2u^2}$, or equivalently, $v \ge -u - \sqrt{2u - 2u^2}$. On the other hand, if $v \ge u$, then $v \ge 0$ (as $u \ge 0$), and hence $v \ge -u - \sqrt{2u - 2u^2}$. The theorem then follows from Proposition 2. ☐
In the next proposition, we show that the piecewise linear inequality $v \ge L_{li}(u)$ and the proposed non-linear inequality $v \ge L_{nl}(u)$ coincide for a countably infinite number of values of $u$.
Proposition 5 
For any $0 \le u \le 1$, we have $L_{nl}(u) \le L_{li}(u)$. Furthermore, equality holds if $u = 1/(1 + 2s^2)$ for some nonnegative integer $s$.
Proof: 
By definition, $L_{nl}(0) = L_{li}(0) = 0$, and the proposition holds in this case. Assume now that $0 < u \le 1$. We first show that $L_{nl}(u) = L_{li}(u)$ when $u = 1/(1 + 2s^2)$ for some nonnegative integer $s$. If $s = 0$, then $u = 1$ and $L_{nl}(u) = L_{li}(u) = -1$. On the other hand, if $u = 1/(1 + 2s^2)$ where $s$ is a positive integer, then it is straightforward to prove that

$$L_{li}(u) = f(s, u) = f(s + 1, u) = -\frac{1 + 2s}{1 + 2s^2} = L_{nl}(u).$$
Figure 2. Piecewise linear inequality and nonlinear inequality.
By differentiating $L_{nl}(u)$ with respect to $u$, we can prove that $L_{nl}(u)$ is convex over $[0, 1]$. For each nonnegative integer $s$, $L_{li}(u)$ is linear over the interval $\left[1/(1 + 2(s+1)^2),\, 1/(1 + 2s^2)\right]$, and $L_{nl}(u) = L_{li}(u)$ when $u = 1/(1 + 2s^2)$ or $u = 1/(1 + 2(s+1)^2)$. Hence, $L_{li}(u) \ge L_{nl}(u)$ over the interval by the convexity of $L_{nl}(u)$. As $s$ can be arbitrarily large, $L_{li}(u) \ge L_{nl}(u)$ for all $u \in (0, 1]$, and the proposition follows. ☐
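Proposition 5 itself can be spot-checked numerically; the sketch below (ours) reuses f and L_li_closed from the sketch following Lemma 2, with arbitrary grid and tolerance choices.

```python
# A numeric spot-check (ours) of Proposition 5, reusing f and
# L_li_closed from the sketch following Lemma 2.
import math

def L_nl(u):
    return -u - math.sqrt(2 * u - 2 * u * u)     # equation (19)

# L_nl <= L_li on a grid of (0, 1] ...
for k in range(1, 1001):
    u = k / 1000
    assert L_nl(u) <= L_li_closed(u) + 1e-12

# ... with equality at u = 1/(1 + 2 s^2).
for s in range(0, 50):
    u = 1 / (1 + 2 * s * s)
    assert abs(L_nl(u) - L_li_closed(u)) < 1e-9
print("L_nl <= L_li confirmed; equality at u = 1/(1 + 2 s^2)")
```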

5. Conclusion

In this paper, we constructed several piecewise linear and quadratic information inequalities from the series of information inequalities proved in [1]. Our proposed non-linear inequality (6) was shown to be equivalent to the whole set of Matúš' linear inequalities, under the conditions stated in Theorem 2. Hence, we can replace all of Matúš' inequalities with our proposed one.
However, inequality (6) is not smooth and may not be easy to work with. Therefore, we relaxed it to a quadratic inequality, which is still strong enough to show that the set of almost entropic functions is not polyhedral.
Certainly, the inequalities we obtained in (16) and (19) are consequences of Matúš' linear inequalities. Yet, the non-linear inequality has a much simpler form, and comparing the inequalities on a projection of $\bar{\Gamma}_n^*$, our figures suggest that the non-linear inequality is a fairly good approximation of the corresponding piecewise linear one. These inequalities are of particular interest for several reasons.
First, they are non-trivial and cannot be deduced from any finite number of linear information inequalities. To the best of our knowledge, they are the first non-trivial non-linear information inequalities. Second, in some cases it is easier to work with a single non-linear inequality than with an infinite number of linear ones. For example, in order to compute bounds on a capacity region (say, in a network coding problem), a characterization of $\bar{\Gamma}_n^*$ may be needed as input to a computing system. Since $\bar{\Gamma}_n^*$ is unknown, an outer bound on $\bar{\Gamma}_n^*$ must be used instead; replacing a countably infinite number of linear inequalities with a single non-linear inequality may greatly simplify the computation. Third, these non-linear inequalities prompt new fundamental questions: are non-linear information inequalities more fundamental than linear ones? Could the set $\bar{\Gamma}_n^*$ be completely characterized by a finite number of non-linear inequalities? If so, what would they look like?
As a final remark, Matúš' inequalities, and also all the non-linear inequalities we obtained, are "tighter" than the Shannon inequalities only in the region where $b(\mathsf{g}) \le 2a(\mathsf{g})$. When $b(\mathsf{g}) > 2a(\mathsf{g})$, the inequalities are direct consequences of the non-negativity of conditional mutual information. This phenomenon seems to suggest that entropic functions are much more difficult to characterize in the region $b(\mathsf{g}) < 2a(\mathsf{g})$. An explanation for this phenomenon is still lacking.

Acknowledgements

This work was supported by the Australian Government under ARC grant DP0557310.

References and Notes

  1. Matúš, F. Infinitely many information inequalities. In Proc. IEEE Int. Symp. Inform. Theory, Nice, France, July 2007; pp. 41–44.
  2. Zhang, Z.; Yeung, R.W. On the characterization of entropy function via information inequalities. IEEE Trans. Inform. Theory 1998, 44, 1440–1452.
  3. Chan, T.H.; Yeung, R.W. On a relation between information inequalities and group theory. IEEE Trans. Inform. Theory 2002, 48, 1992–1995.
  4. Dougherty, R.; Freiling, C.; Zeger, K. Six new non-Shannon information inequalities. In Proc. IEEE Int. Symp. Inform. Theory, Seattle, USA, July 2006; pp. 233–236.
  5. Chan, T.H.; Grant, A. Dualities between entropy functions and network codes. IEEE Trans. Inform. Theory 2008, 54, 4470–4487.
  6. Yeung, R.W. A framework for linear information inequalities. IEEE Trans. Inform. Theory 1997, 43, 1924–1934.
  7. Song, L.; Yeung, R.W.; Cai, N. Zero-error network coding for acyclic networks. IEEE Trans. Inform. Theory 2003, 49, 3129–3139.
