Information Inequalities via Submodularity and a Problem in Extremal Graph Theory

Igal Sason

doi:10.3390/e24050597

Abstract

The present paper offers, in its first part, a unified approach for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties. It applies this approach for the derivation of information inequalities with Shannon information measures. Connections of the considered approach to a generalized version of Shearer’s lemma, and other related results in the literature are considered. Some of the derived information inequalities are new, and also known results (such as a generalized version of Han’s inequality) are reproduced in a simple and unified way. In its second part, this paper applies the generalized Han’s inequality to analyze a problem in extremal graph theory. This problem is motivated and analyzed from the perspective of information theory, and the analysis leads to generalized and refined bounds. The two parts of this paper are meant to be independently accessible to the reader.

Keywords:

extremal combinatorics; graphs; Han’s inequality; information inequalities; polymatroid; rank function; set function; Shearer’s lemma; submodularity

1. Introduction

Information measures and information inequalities are of fundamental importance and wide applicability in the study of feasibility and infeasibility results in information theory, while also offering very useful tools which serve to deal with interesting problems in various fields of mathematics [1,2]. The characterization of information inequalities has been of interest for decades (see, e.g., [3,4] and references therein), mainly triggered by their indispensable role in proving direct and converse results for channel coding and data compression for single and multi-user information systems. Information inequalities, which apply to classical and generalized information measures, have also demonstrated far-reaching consequences beyond the study of the coding theorems and fundamental limits of communication systems. One such remarkable example (among many) is the usefulness of information measures and information inequalities in providing information–theoretic proofs in the field of combinatorics and graph theory (see, e.g., [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]).

A basic property that is commonly used for the characterization of information inequalities relies on the nonnegativity of the (conditional and unconditional) Shannon entropy of discrete random variables, the nonnegativity of the (conditional and unconditional) relative entropy and the Shannon mutual information of general random variables, and the chain rules which hold for these classical information measures. A byproduct of these properties is the sub/supermodularity of some classical information measures, which also prove to be useful by taking advantage of the vast literature on sub/supermodular functions and polymatroids [22,23,24,25,26,27,28,29,30,31]. Another instrumental information inequality is the entropy power inequality, which dates back to Shannon [32]. It has been extensively generalized for different types of random variables and generalized entropies, studied in regard to its geometrical relations [33], and it has been also ubiquitously used for the analysis of various information–theoretic problems.

Among the most useful information inequalities are Han’s inequality [34], its generalized versions (e.g., [15,25,30,31]), and Shearer’s lemma [7] with its generalizations and refinements (e.g., [15,31,35]). In spite of their simplicity, these inequalities prove to be useful in information theory, and other diverse fields of mathematics and engineering (see, e.g., [6,35]). More specifically in regard to these inequalities, in Proposition 1 of [22], Madiman and Tetali introduced an information inequality which can be specialized to Han’s inequality, and which also refines Shearer’s lemma while also providing a counterpart result. In [30], Tian generalized Han’s inequality by relying on the sub/supermodularity of the unconditional/conditional Shannon entropy. Likewise, the work in [31] by Kishi et al., relies on the sub/supermodularity properties of Shannon information measures, and it provides refinements of Shearer’s lemma and Han’s inequality. Apart of the refinements of these classical and widely-used inequalities in [31], the suggested approach in the present work can be viewed in a sense as a (nontrivial) generalization and extension of a result in [31] (to be explained in Section 3.2).

This work is focused on the derivation of information inequalities via submodularity and nonnegativity properties, and on a problem in extremal graph theory whose analysis relies on an information inequality. The field of extremal graph theory, which is a subfield of extremal combinatorics, was among the early and fast developing branches of graph theory during the 20th century. Extremal graph theory explores the relations between properties of graphs such as its order, size, chromatic number or maximal and minimal degrees, under some constraints on the graph (by, e.g., considering graphs of a fixed order, and by also imposing a type of a forbidden subgraph). The interested reader is referred to the comprehensive textbooks [10,36] on the vast field of extremal combinatorics and extremal graph theory.

This paper suggests an approach for the derivation of families of inequalities for set functions, and it applies it to obtain information inequalities with Shannon information measures that satisfy sub/supermodularity and monotonicity properties. Some of the derived information inequalities are new, while some known results (such as the generalized version of Han’s inequality [25]) are reproduced as corollaries in a simple and unified way. This paper also applies the generalized Han’s inequality to analyze a problem in extremal graph theory, with an information–theoretic proof and interpretation. The analysis leads to some generalized and refined bounds in comparison to the insightful results in Theorems 4.2 and 4.3 of [6]. For the purpose of the suggested problem and analysis, the presentation here is self-contained.

The paper is structured as follows: Section 2 provides essential notation and preliminary material for this paper. Section 3 presents a new methodology for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties (Theorem 1). The suggested methodology is then applied in Section 3 for the derivation of information inequalities by relying on sub/supermodularity properties of Shannon information measures. Section 3 also considers connections of the suggested approach to a generalized version of Shearer’s lemma, and to other results in the literature. Most of the results in Section 3 are proved in Section 4. Section 5 applies the generalized Han’s inequality to a problem in extremal graph theory (Theorem 2). A byproduct of Theorem 2, which is of interest in its own right, is also analyzed in Section 5 (Theorem 3). The presentation and analysis in Section 5 is accessible to the reader, independently of the earlier material on information inequalities in Section 3 and Section 4. Some additional proofs, mostly for making the paper self-contained or for suggesting an alternative proof, are relegated to the appendices (Appendix A and Appendix B).

2. Preliminaries and Notation

The present section provides essential notation and preliminary material for this paper.

$N ≜ {1, 2, \dots}$ denotes the set of natural numbers.
$R$ denotes the set of real numbers, and $R_{+}$ denotes the set of nonnegative real numbers.
Ø denotes the empty set.
$2^{Ω} ≜ \{A : A \subseteq Ω\}$ denotes the power set of a set $Ω$ (i.e., the set of all subsets of $Ω$ ).
$T^{c} ≜ Ω \ T$ denotes the complement of a subset $T$ in $Ω$ .
$1 {E}$ is an indicator of E; it is 1 if event E is satisfied, and zero otherwise.
$[n] ≜ {1, \dots, n}$ for every $n \in N$ ;
$X^{n} ≜ (X_{1}, \dots, X_{n})$ denotes an n-dimensional random vector;
$X_{S} ≜ {(X_{i})}_{i \in S}$ is a random vector for a nonempty subset $S \subseteq [n]$ ; if $S = Ø$ , then $X_{S}$ is an empty set, and conditioning on $X_{S}$ is void.
Let X be a discrete random variable that takes its values on a set $X$ , and let $P_{X}$ be the probability mass function (PMF) of X. The Shannon entropy of X is given by

$\begin{matrix} H (X) ≜ - \sum_{x \in X} P_{X} (x) log P_{X} (x), \end{matrix}$

(1)

where throughout this paper, we take all logarithms to base 2.
The binary entropy function $H_{b} : [0, 1] \to [0, log 2]$ is given by

$\begin{matrix} H_{b} (p) ≜ - p log p - (1 - p) log (1 - p), p \in [0, 1], \end{matrix}$

(2)

where, by continuous extension, the convention $0 log 0 = 0$ is used.
Let X and Y be discrete random variables with a joint PMF $P_{X Y}$ , and a conditional PMF of X given Y denoted by $P_{X | Y}$ . The conditional entropy of X given Y is defined as

$\begin{array}{l} (3a) & H (X | Y) & ≜ - \sum_{(x, y) \in X \times Y} P_{X Y} (x, y) log P_{X | Y} (x | y) \\ (3b) & = \sum_{y \in Y} P_{Y} (y) H (X | Y = y), \end{array}$

and

$\begin{matrix} H (X | Y) & = H (X, Y) - H (Y) . \end{matrix}$

(4)
The mutual information between X and Y is symmetric in X and Y, and it is given by

$\begin{array}{l} (5a) & I (X; Y) & = H (X) + H (Y) - H (X, Y) \\ (5b) & = H (X) - H (X | Y) \\ (5c) & = H (Y) - H (Y | X) . \end{array}$
The conditional mutual information between two random variables X and Y, given a third random variable Z, is symmetric in X and Y and it is given by

$\begin{array}{l} (6a) & I (X; Y | Z) & = H (X | Z) - H (X | Y, Z) \\ (6b) & = H (X, Z) + H (Y, Z) - H (Z) - H (X, Y, Z) . \end{array}$
For continuous random variables, the sums in (1) and (3) are replaced with integrals, and the PMFs are replaced with probability densities. The entropy of a continuous random variable is named differential entropy.
For an n-dimensional random vector $X^{n}$ , the entropy power of $X^{n}$ is given by

$\begin{matrix} N (X^{n}) ≜ exp (\frac{2}{n} H (X^{n})), \end{matrix}$

(7)

where the base of the exponent is identical to the base of the logarithm in (1).

We rely on the following basic properties of the Shannon information measures:

Conditioning cannot increase the entropy, i.e.,

$\begin{matrix} H (X | Y) \leq H (X), \end{matrix}$

(8)

with equality in (8) if and only if X and Y are independent.
Generalizing (4) to n-dimensional random vectors gives the chain rule

$\begin{matrix} H (X^{n}) = \sum_{i = 1}^{n} H (X_{i} | X^{i - 1}) . \end{matrix}$

(9)
The subadditivity property of the entropy is implied by (8) and (9):

$\begin{matrix} H (X^{n}) \leq \sum_{i = 1}^{n} H (X_{i}), \end{matrix}$

(10)

with equality in (10) if and only if $X_{1}, \dots, X_{n}$ are independent random variables.
Nonnegativity of the (conditional) mutual information: In light of (5) and (8), $I (X; Y) \geq 0$ with equality if and only if X and Y are independent. More generally, $I (X; Y | Z) \geq 0$ with equality if and only if X and Y are conditionally independent given Z.

Let

Ω

be a finite and non-empty set, and let

f : 2^{Ω} \to R

be a real-valued set function (i.e., f is defined for all subsets of

Ω

). The following definitions are used.

Definition 1.

(Sub/Supermodular function). The set function

f : 2^{Ω} \to R

is submodular if

\begin{matrix} f (T) + f (S) \geq f (T \cup S) + f (T \cap S), \forall S, T \subseteq Ω \end{matrix}

(11)

Likewise, f is supermodular if

- f

is submodular.

An identical characterization of submodularity is the diminishing return property (see, e.g., Proposition 2.2 in [23]), where a set function

f : 2^{Ω} \to R

is submodular if and only if

\begin{matrix} S \subset T \subset Ω, ω \in T^{c} ⟹ f (S \cup {ω}) - f (S) \geq f (T \cup {ω}) - f (T) . \end{matrix}

(12)

This means that the larger is the set, the smaller is the increase in f when a new element is added.

Definition 2.

(Monotonic function). The set function

f : 2^{Ω} \to R

is monotonically increasing if

\begin{matrix} S \subseteq T \subseteq Ω ⟹ f (S) \leq f (T) . \end{matrix}

(13)

Likewise, f is monotonically decreasing if

- f

is monotonically increasing.

Definition 3.

(Polymatroid, ground set and rank function). Let

f : 2^{Ω} \to R

be submodular and monotonically increasing set function with

f (Ø) = 0

. The pair

(Ω, f)

is called a polymatroid, Ω is called a ground set, and f is called a rank function.

Definition 4.

(Subadditive function). The set function

f : 2^{Ω} \to R

is subadditive if, for all

S, T \subseteq Ω

,

\begin{matrix} f (S \cup T) \leq f (S) + f (T) . \end{matrix}

(14)

A nonnegative and submodular set function is subadditive (this readily follows from (11) and (14)). The next proposition introduces results from [25,28,37]. For the sake of completeness, we provide a proof in Appendix A.

Proposition 1.

Let Ω be a finite and non-empty set, and let

{X_{ω}}_{ω \in Ω}

be a collection of discrete random variables. Then, the following holds:

(a): The set function $f : 2^{Ω} \to R$ , given by

$\begin{matrix} f (T) ≜ H (X_{T}), T \subseteq Ω, \end{matrix}$

(15)

is a rank function.
(b): The set function $f : 2^{Ω} \to R$ , given by

$\begin{matrix} f (T) ≜ H (X_{T} | X_{T^{c}}), T \subseteq Ω, \end{matrix}$

(16)

is supermodular, monotonically increasing, and $f (Ø) = 0$ .
(c): The set function $f : 2^{Ω} \to R$ , given by

$\begin{matrix} f (T) ≜ I (X_{T}; X_{T^{c}}), T \subseteq Ω, \end{matrix}$

(17)

is submodular, $f (Ø) = 0$ , but f is not a rank function. The latter holds since the equality $f (T) = f (T^{c})$ , for all $T \subseteq Ω$ , implies that f is not a monotonic function.
(d): Let $U, V \subseteq Ω$ be disjoint subsets, and let the entries of the random vector $X_{V}$ be conditionally independent given $X_{U}$ . Then, the set function $f : 2^{V} \to R$ given by

$\begin{matrix} f (T) ≜ I (X_{U}; X_{T}), T \subseteq V, \end{matrix}$

(18)

is a rank function.
(e): Let $X_{Ω} = {X_{ω}}_{ω \in Ω}$ be independent random variables, and let $f : 2^{Ω} \to R$ be given by

$\begin{matrix} f (T) ≜ H (\sum_{ω \in T} X_{ω}), T \subseteq Ω . \end{matrix}$

(19)

Then, f is a rank function.

The following proposition addresses the setting of general alphabets.

Proposition 2.

For general alphabets, the set functions f in (15) and (17)–(19) are submodular, and the set function f in (16) is supermodular with

f (Ø) ≜ 0

. Moreover, the function in (18) stays to be a rank function, and the function in (19) stays to be monotonically increasing.

Proof.

The sub/supermodularity properties in Proposition 1 are preserved due to the nonnegativity of the (conditional) mutual information. The monotonicity property of the functions in (18) and (19) is preserved also in the general alphabet setting due to (A10) and (A14c), and the mutual information in (18) is nonnegative. □

Remark 1.

In contrast to the entropy of discrete random variables, the differential entropy of continuous random variables is not functionally submodular in the sense of Lemma A.2 in [38]. This refers to a different form of submodularity, which was needed by Tao [38] to prove sumset inequalities for the entropy of discrete random variables. A follow-up study in [39] by Kontoyiannis and Madiman required substantially new proof strategies for the derivation of sumset inequalities with the differential entropy of continuous random variables. The basic property which replaces the discrete functional submodularity is the data-processing property of mutual information [39]. In the context of the present work, where the commonly used definition of submodularity is used (see Definition 1), the Shannon entropy of discrete random variables and the differential entropy of continuous random variables are both submodular set functions.

We rely, in this paper, on the following standard terminology for graphs. An undirected graph G is an ordered pair

G = (V, E)

where

V = V (G)

is a set of elements, and

E = E (G)

is a set of 2-element subsets (pairs) of V. The elements of V are called the vertices of G, and the elements of E are called the edges of G. We use the notation

V = V (G)

and

E = E (G)

for the sets of vertices and edges, respectively, in the graph G. The number of vertices in a finite graph G is called the order of G, and the number of edges is called the size of G. Throughout this paper, we assume that the graph G is undirected and finite; it is also assumed to be a simple graph, i.e., it has no loops (no edge connects a vertex in G to itself) and there are no multiple edges which connect a pair of vertices in G. If

e = {u, v} \in E (G)

, then the vertices u and v are the two ends of the edge e. The elements u and v are adjacent vertices (neighbors) if they are connected by an edge in G, i.e., if

e = {u, v} \in E (G)

.

3. Inequalities via Submodularity

3.1. A New Methodology

The present subsection presents a new methodology for the derivation of families of inequalities for set functions, and in particular inequalities with information measures. The suggested methodology relies, to large extent, on the notion of submodularity of set functions, and it is presented in the next theorem.

Theorem 1.

Let Ω be a finite set with

| Ω | = n

. Let

f : 2^{Ω} \to R

with

f (Ø) = 0

, and

g : R \to R

. Let the sequence

{\{t_{k}^{(n)}\}}_{k = 1}^{n}

be given by

\begin{matrix} t_{k}^{(n)} ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq Ω : | T | = k} g (\frac{f (T)}{k}), k \in [n] . \end{matrix}

(20)

(a): If f is submodular, and g is monotonically increasing and convex, then the sequence ${\{t_{k}^{(n)}\}}_{k = 1}^{n}$ is monotonically decreasing, i.e.,

$\begin{matrix} t_{1}^{(n)} \geq t_{2}^{(n)} \geq \dots \geq t_{n}^{(n)} = g (\frac{f (Ω)}{n}) . \end{matrix}$

(21)

In particular,

$\begin{matrix} \sum_{T \subseteq Ω : | T | = k} g (\frac{f (T)}{k}) \geq (\binom{n}{k}) g (\frac{f (Ω)}{n}), k \in [n] . \end{matrix}$

(22)
(b): If f is submodular, and g is monotonically decreasing and concave, then the sequence ${\{t_{k}^{(n)}\}}_{k = 1}^{n}$ is monotonically increasing.
(c): If f is supermodular, and g is monotonically increasing and concave, then the sequence ${\{t_{k}^{(n)}\}}_{k = 1}^{n}$ is monotonically increasing.
(d): If f is supermodular, and g is monotonically decreasing and convex, then the sequence ${\{t_{k}^{(n)}\}}_{k = 1}^{n}$ is monotonically decreasing.

Proof.

See Section 4.1. □

Corollary 1.

Let Ω be a finite set with

| Ω | = n

,

f : 2^{Ω} \to R

, and

g : R \to R

be convex and monotonically increasing. If

f is a rank function,
$g (0) > 0$ or there is $ℓ \in N$ such that $g (0) = \dots = g^{(ℓ - 1)} (0) = 0$ with $g^{(ℓ)} (0) > 0$ ,
${k_{n}}_{n = 1}^{\infty}$ is a sequence such that $k_{n} \in [n]$ for all $n \in N$ with $k_{n} \underset{n \to \infty}{⟶} \infty$ ,

then

\begin{matrix} lim_{n \to \infty} \{\frac{1}{n} log (\sum_{T \subseteq Ω : | T | = k_{n}} g (\frac{f (T)}{k_{n}})) - H_{b} (\frac{k_{n}}{n})\} = 0, \end{matrix}

(23)

and if

lim_{n \to \infty} \frac{k_{n}}{n} = β \in [0, 1]

, then

\begin{matrix} lim_{n \to \infty} \frac{1}{n} log (\sum_{T \subseteq Ω : | T | = k_{n}} g (\frac{f (T)}{k_{n}})) = H_{b} (β) . \end{matrix}

(24)

Proof.

See Section 4.2. □

Corollary 2.

Let Ω be a finite set with

| Ω | = n

, and

f : 2^{Ω} \to R

be submodular and nonnegative with

f (Ø) = 0

. Then,

(a): For $α \geq 1$ and $k \in [n - 1]$

$\begin{matrix} \sum_{T \subseteq Ω : | T | = k} (f^{α} (Ω) - f^{α} (T)) \leq c_{α} (n, k) f^{α} (Ω), \end{matrix}$

(25)

with

$\begin{matrix} c_{α} (n, k) ≜ (1 - \frac{k^{α}}{n^{α}}) (\binom{n}{k}) . \end{matrix}$

(26)

For $α = 1$ , (25) holds with $c_{1} (n, k) = (\binom{n - 1}{k})$ regardless of the nonnegativity of f.
(b): If f is also monotonically increasing (i.e., f is a rank function), then for $α \geq 1$

$\begin{matrix} {(\frac{k}{n})}^{α - 1} (\binom{n - 1}{k - 1}) f^{α} (Ω) \leq \sum_{T \subseteq Ω : | T | = k} f^{α} (T) \leq (\binom{n}{k}) f^{α} (Ω), k \in [n] . \end{matrix}$

(27)

Proof.

See Section 4.3. □

Corollary 2 is next specialized to reproduce Han’s inequality [34], and a generalized version of Han’s inequality (Section 4 of [25]).

Let

X^{n} = (X_{1}, \dots, X_{n})

be a random vector with finite entropies

H (X_{i})

for all

i \in [n]

. The set function

f : 2^{[n]} \to [0, \infty)

, given by

f (T) = H (X_{T})

for all

T \subseteq [n]

, is submodular [25] (see Proposition 1a and Proposition 2). From (25), the following holds:

(a): Setting $α = 1$ in (25) implies that, for all $k \in [n - 1]$ ,

$\begin{array}{l} (28a) & \sum_{1 \leq i_{1} < \dots < i_{k} \leq n} (H (X^{n}) - H (X_{i_{1}}, \dots, X_{i_{k}})) & \leq (1 - \frac{k}{n}) (\binom{n}{k}) H (X^{n}) \\ (28b) & = (\binom{n - 1}{k}) H (X^{n}), \end{array}$
(b): Consequently, setting $k = n - 1$ in (28) gives

$\begin{matrix} \sum_{i = 1}^{n} (H (X^{n}) - H (X_{1}, \dots, X_{i - 1}, X_{i + 1}, \dots, X_{n})) \leq H (X^{n}), \end{matrix}$

(29)

which gives Han’s inequality.

Further applications of Theorem 1 lead to the next corollary, which partially introduces some known results that have been proved on a case-by-case basis in Theorems 17.6.1–17.6.3 of [1] and Section 2 of [2]. In particular, the monotonicity properties of the sequences in (30) and (32)–(34) were proved in Theorems 1 and 2, and Corollaries 1 and 2 of [2]. Both known and new results are readily obtained here, in a unified way, from Theorem 1. The utility of one of these inequalities in extremal combinatorics is discussed in the continuation to this subsection (see Proposition 3), providing a natural generalization of a beautiful combinatorial result in Section 3.2 of [19].

Corollary 3.

Let

{X_{i}}_{i = 1}^{n}

be random variables with finite entropies. Then, the following holds:

(a): The sequences

$\begin{matrix} h_{k}^{(n)} ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} \frac{H (X_{T})}{k}, k \in [n], \end{matrix}$

(30)

$\begin{matrix} ℓ_{k}^{(n)} ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} \frac{I (X_{T}; X_{T^{c}})}{k}, k \in [n] \end{matrix}$

(31)

are monotonically decreasing in k. If ${X_{i}}_{i = 1}^{n}$ are independent, then also the sequence

$\begin{matrix} m_{k}^{(n)} ≜ \frac{1}{(\binom{n - 1}{k - 1})} \sum_{T \subseteq [n] : | T | = k} H (\sum_{ω \in T} X_{ω}), k \in [n] \end{matrix}$

(32)

is monotonically decreasing in k.
(b): The sequence

$\begin{matrix} r_{k}^{(n)} ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} \frac{H (X_{T} | X_{T^{c}})}{k}, k \in [n] \end{matrix}$

(33)

is monotonically increasing in k.
(c): For every $r > 0$ , the sequences

$\begin{array}{l} (34) & s_{k}^{(n)} (r) ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} N^{r} (X_{T}), k \in [n], \\ (35) & u_{k}^{(n)} (r) ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} exp (- \frac{r H (X_{T} | X_{T^{c}})}{k}), k \in [n], \\ (36) & v_{k}^{(n)} (r) ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} exp (\frac{r I (X_{T}; X_{T^{c}})}{k}), k \in [n] \end{array}$

are monotonically decreasing in k. If ${X_{i}}_{i = 1}^{n}$ are independent, then also the sequence

$\begin{matrix} w_{k}^{(n)} (r) ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} N^{r} (\sum_{ω \in T} X_{ω}), k \in [n] \end{matrix}$

(37)

is monotonically decreasing in k.

Proof.

The finite entropies of

{X_{i}}_{i = 1}^{n}

assure that the entropies involved in the sequences (30)–(37) are finite. Item (a) follows from Theorem 1a, where the submodular set functions f which correspond to (30)–(32) are given in (15), (17) and (19), respectively, and g is the identity function on the real line. The identity

k (\binom{n}{k}) = n (\binom{n - 1}{k - 1})

is used for (32). Item (b) follows from Theorem 1c, where f is the supermodular function in (16) and g is the identity function on the real line. We next prove Item (c). The sequence (34) is monotonically decreasing by Theorem 1a, where f is the submodular function in (15), and

g : R \to R

is the monotonically increasing and convex function defined as

g (x) = exp (2 r x)

for

x \in R

(with

r > 0

). The sequence (35) is monotonically decreasing by Theorem 1d, where f is the supermodular function in (16), and

g : R \to R

is the monotonically decreasing and convex function defined as

g (x) = exp (- r x)

for

x \in R

. The sequence (36) is monotonically decreasing by Theorem 1a, where f is the submodular function in (17) and g is the monotonically increasing and convex function defined as

g (x) = exp (r x)

for

x \in R

. Finally, the sequence (37) is monotonically decreasing by Theorem 1a, where f is the submodular function in (19) and g is the monotonically increasing and convex function defined as

g (x) = exp (2 r x)

for

x \in R

. □

Remark 2.

From Proposition 2, since the proof of Corollary 3 only relies on the sub/supermodularity property of f, the random variables

{X_{i}}_{i = 1}^{n}

do not need to be discrete in Corollary 3. In the reproduction of Han’s inequality as an application of Corollary 2, the random variables

{X_{i}}_{i = 1}^{n}

do not need to be discrete as well since f is not required to be nonnegative if

α = 1

(only the submodularity of f in (15) is required, which holds due to Proposition 2).

The following result exemplifies the utility of the monotonicity result of the sequence (30) in extremal combinatorics. It also generalizes the result in Section 3.2 of [19] for an achievable upper bound on the cardinality of a finite set in the three-dimensional Euclidean space, expressed as a function of its number of projections on each of the planes

X Y, X Z

and

Y Z

. The next result provides an achievable upper bound on the cardinality of a finite set of points in an n-dimensional Euclidean space, expressed as a function of its number of projections on each of the k-dimensional Euclidean subspaces with an arbitrary

k < n

.

Proposition 3.

Let

P \subseteq R^{n}

be a finite set of points in the n-dimensional Euclidean space with

| P | ≜ M

. Let

k \in [n - 1]

, and

ℓ ≜ (\binom{n}{k})

. Let

R_{1}, \dots, R_{ℓ}

be the projections of

P

on each of the k-dimensional subspaces of

R^{n}

, and let

| R_{j} | = M_{j}

for all

j \in [ℓ]

. Then,

\begin{matrix} | P | \leq {(\prod_{j = 1}^{(\binom{n}{k})} M_{j})}^{\frac{1}{(\binom{n - 1}{k - 1})}} . \end{matrix}

(38)

Let

R ≜ \frac{log M}{n}

, and

R_{j} ≜ \frac{log M_{j}}{k}

for all

j \in [ℓ]

. An equivalent form of (38) is given by the inequality

\begin{matrix} R \leq \frac{1}{ℓ} \sum_{j = 1}^{ℓ} R_{j} . \end{matrix}

(39)

Moreover, if

M_{1} = \dots = M_{ℓ}

and

\sqrt[k]{M_{1}} \in N

, then (38) and (39) are satisfied with equality if

P

is a grid of points in

R^{n}

with

\sqrt[k]{M_{1}}

points on each dimension (so,

M = M_{1}^{\frac{n}{k}}

).

Proof.

Pick uniformly at random a point

X^{n} = (X_{1}, \dots, X_{n}) \in P

. Then,

\begin{matrix} H (X^{n}) = log | P | . \end{matrix}

(40)

The sequence in (30) is monotonically decreasing, so

h_{k}^{(n)} \geq h_{n}^{(n)}

, which is equivalent to

\begin{matrix} (\binom{n - 1}{k - 1}) H (X^{n}) \leq \sum_{T \subseteq [n] : | T | = k} H (X_{T}) . \end{matrix}

(41)

Let

S_{1}, \dots, S_{ℓ}

be the k-subsets of the set

[n]

, ordered in a way such that

M_{j}

is the cardinality of the projection of the set

P

on the k-dimensional subspace whose coordinates are the elements of the subset

S_{j}

. Then, (41) can be expressed in the form

\begin{matrix} (\binom{n - 1}{k - 1}) H (X^{n}) \leq \sum_{j = 1}^{ℓ} H (X_{S_{j}}), \end{matrix}

(42)

and also

\begin{matrix} H (X_{S_{j}}) \leq log M_{j}, j \in [ℓ], \end{matrix}

(43)

since the entropy of a random variable is upper bounded by the logarithm of the number of its possible values. Combining (40), (42) and (43) gives

\begin{matrix} (\binom{n - 1}{k - 1}) log | P | \leq \sum_{j = 1}^{ℓ} log M_{j} . \end{matrix}

(44)

Exponentiating both sides of (44) gives (38). In addition, using the identity

(\binom{n}{k}) = \frac{n}{k} (\binom{n - 1}{k - 1})

gives (39) from (44). Finally, the sufficiency condition for equalities in (38) or (39) can be easily verified, which is obtained if

P

is a grid of points in

R^{n}

with the same finite number of projections on each dimension. □

3.2. Connections to a Generalized Version of Shearer’s Lemma and Other Results in the Literature

The next proposition is a known generalized version of Shearer’s Lemma.

Proposition 4.

Let Ω be a finite set, let

{S_{j}}_{j = 1}^{M}

be a finite collection of subsets of Ω (with

M \in N

), and let

f : 2^{Ω} \to R

be a set function.

(a): If f is non-negative and submodular, and every element in Ω is included in at least $d \geq 1$ of the subsets ${S_{j}}_{j = 1}^{M}$ , then

$\begin{matrix} \sum_{j = 1}^{M} f (S_{j}) \geq d f (Ω) . \end{matrix}$

(45)
(b): If f is a rank function, $A \subset Ω$ , and every element in $A$ is included in at least $d \geq 1$ of the subsets ${S_{j}}_{j = 1}^{M}$ , then

$\begin{matrix} \sum_{j = 1}^{M} f (S_{j}) \geq d f (A) . \end{matrix}$

(46)

The first part of Proposition 4 was pointed out in Section 1.5 of [35], and the second part of Proposition 4 is a generalization of Remark 1 and inequality (47) in [20]. We provide a (somewhat different) proof of Proposition 4a, as well as a self-contained proof of Proposition 4b in Appendix B.

Let

{X_{i}}_{i = 1}^{n}

be discrete random variables, and consider the set function

f : 2^{[n]} \to R_{+}

which is defined as

f (A) = H (X_{A})

for all

A \subseteq [n]

. Since f is a rank function [25], Proposition 4 then specializes to Shearer’s Lemma [7] and a modified version of this lemma (see Remark 1 of [20]).

In light of Proposition 1e and Proposition 4b, Corollaries 4 and 5 are obtained as follows.

Corollary 4.

Let

{X_{i}}_{i = 1}^{n}

be independent discrete random variables,

{S_{j}}_{j = 1}^{M}

be subsets of

[n]

, and

A \subseteq [n]

. If each element in

A

belongs to at least

d \geq 1

of the sets

{S_{j}}_{j = 1}^{M}

, then

\begin{matrix} d H (\sum_{i \in A} X_{i}) \leq \sum_{j = 1}^{M} H (\sum_{i \in S_{j}} X_{i}) . \end{matrix}

(47)

In particular, if every

i \in [n]

is included in at least

d \geq 1

of the subsets

{S_{j}}_{j = 1}^{M}

, then

\begin{matrix} d H (\sum_{i = 1}^{n} X_{i}) \leq \sum_{j = 1}^{M} H (\sum_{i \in S_{j}} X_{i}) . \end{matrix}

(48)

Remark 3.

Inequality (48) is also a special case of [37] (Theorem 2), and they coincide if every element

i \in [n]

is included in a fixed number (d) of the subsets

{S_{j}}_{j = 1}^{M}

.

A specialization of Corollary 4 gives the next result.

Corollary 5.

Let

{X_{i}}_{i = 1}^{n}

be independent and discrete random variables with finite variances. Then, the following holds:

(a): For every $k \in [n - 1]$ ,

$\begin{matrix} H (\sum_{i = 1}^{n} X_{i}) \leq \frac{1}{(\binom{n - 1}{k - 1})} \sum_{T \subseteq [n] : | T | = k} H (\sum_{ω \in T} X_{ω}), \end{matrix}$

(49)

and equivalently,

$\begin{matrix} N (\sum_{i = 1}^{n} X_{i}) \leq {\{\prod_{T \subseteq [n] : | T | = k} N (\sum_{ω \in T} X_{ω})\}}^{\frac{1}{(\binom{n - 1}{k - 1})}} . \end{matrix}$

(50)
(b): For every $k \in [n - 1]$ ,

$\begin{matrix} N (\sum_{i = 1}^{n} X_{i}) \leq \frac{1}{(\binom{n}{k})} \sum_{T \subseteq [n] : | T | = k} N^{\frac{n}{k}} (\sum_{ω \in T} X_{ω}), \end{matrix}$

(51)

where (51) is in general looser than (50), with equivalence if ${X_{i}}_{i = 1}^{n}$ are i.i.d.; in particular,

$\begin{array}{l} (52a) & N (\sum_{i = 1}^{n} X_{i}) & \leq {\{\prod_{j = 1}^{n} N (\sum_{i \neq j} X_{i})\}}^{\frac{1}{n - 1}} \\ (52b) & \leq \frac{1}{n} \sum_{j = 1}^{n} {\{N (\sum_{i \neq j} X_{i})\}}^{\frac{n}{n - 1}} . \end{array}$

Proof.

Let

{S_{j}}_{j = 1}^{M}

be all the k-element subsets of

Ω = [n]

(with

M = (\binom{n}{k})

). Then, every element

i \in [n]

belongs to

d = \frac{k M}{n} = (\binom{n - 1}{k - 1})

such subsets, which then gives (49) as a special case of (48). Alternatively, (49) follows from Corollary 3b, which yields

m_{k}^{(n)} \geq m_{n}^{(n)}

for all

k \in [n - 1]

. Exponentiating both sides of (49) gives (50). Inequality (51) is a loosened version of (50), which follows by invoking the AM-GM inequality (i.e., the geometric mean of nonnegative real numbers is less than or equal to their arithmetic mean, with equality between these two means if and only if these numbers are all equal), in conjunction with the identity

\frac{k}{n} (\binom{n}{k}) = (\binom{n - 1}{k - 1})

. Inequalities (50) and (51) are consequently equivalent if

{X_{i}}_{i = 1}^{n}

are i.i.d. random variables, and (52) is a specialized version of (50) and the loosened inequality (51) by setting

k = n - 1

. □

The next remarks consider information inequalities in Corollaries 3–5, in light of Theorem 1 here, and some known results in the literature.

Remark 4.

Inequality (49) was derived by Madiman as a special case of Theorem 2 in [37]. The proof of Corollary 5a shows that (49) can be also derived in two different ways as special cases of both Theorem 1a and Proposition 4a.

Remark 5.

Inequality (51) can be also derived as a special case of Theorem 1a, where f is the rank function in (19), and

g : R \to R

is given by

g (x) ≜ exp (2 n x)

for all

x \in R

. It also follows from the monotonicity property in Corollary 3c, which yields

w_{k}^{(n)} (n) \geq w_{n}^{(n)} (n)

for all

k \in [n - 1]

.

Remark 6.

The result in Theorem 8 of [31] is a special case of Theorem 1a here, which follows by taking the function g in Theorem 1a to be the identity function. The flexibility in selecting the function g in Theorem 1 enables to obtain a larger collection of information inequalities. This is in part reflected from a comparison of Corollary 3 here with Corollary 9 of [31]. More specifically, the findings about the monotonicity properties in (30), (31) and (33) were obtained in Corollary 9 of [31], while relying on Theorem 8 of [31] and the sub/supermodularity properties of the considered Shannon information measures. It is noted, however, that the monotonicity results of the sequences (34)–(37) (Corollary 3c) are not implied by Theorem 8 of [31].

Remark 7.

Inequality (52) forms a counterpart of an entropy power inequality by Artstein et al., (Theorem 3 of [40]), where for independent random variables

{X_{i}}_{i = 1}^{n}

with finite variances:

\begin{matrix} N (\sum_{i = 1}^{n} X_{i}) \geq \frac{1}{n - 1} \sum_{j = 1}^{n} N (\sum_{i \neq j} X_{i}) . \end{matrix}

(53)

Inequality (50), and also its looser version in (51), form counterparts of the generalized inequality by Madiman and Barron, which reads (see inequality (4) in [41]):

\begin{matrix} N (\sum_{i = 1}^{n} X_{i}) \geq \frac{1}{(\binom{n - 1}{k - 1})} \sum_{T \subseteq [n] : | T | = k} N (\sum_{ω \in T} X_{ω}), k \in [n - 1] . \end{matrix}

(54)

4. Proofs

The present section provides proofs of (most of the) results in Section 3.

4.1. Proof of Theorem 1

We prove Item (a), and then readily prove Items (b)–(d). Define the auxiliary sequence

\begin{matrix} f_{k}^{(n)} ≜ \frac{1}{(\binom{n}{k})} \sum_{T \subseteq Ω : | T | = k} f (T), k \in [0 : n], \end{matrix}

(55)

averaging f over all k-element subsets of the n-element set

Ω ≜ {ω_{1}, \dots, ω_{n}}

. Let the permutation

π : [n] \to [n]

be arbitrary. For

k \in [n - 1]

, let

\begin{matrix} S_{1} ≜ {ω_{π (1)}, \dots, ω_{π (k - 1)}, ω_{π (k)}}, \end{matrix}

(56a)

\begin{matrix} S_{2} ≜ {ω_{π (1)}, \dots, ω_{π (k - 1)}, ω_{π (k + 1)}}, \end{matrix}

(56b)

which are k-element subsets of

Ω

with

k - 1

elements in common. Then,

\begin{matrix} f (S_{1}) + f (S_{2}) \geq f (S_{1} \cup S_{2}) + f (S_{1} \cap S_{2}), \end{matrix}

(57)

which holds by the submodularity of f (by assumption), i.e.,

\begin{matrix} f ({ω_{π (1)}, \dots, ω_{π (k)}}) + f ({ω_{π (1)}, \dots, ω_{π (k - 1)}, ω_{π (k + 1)}}) \\ \geq f ({ω_{π (1)}, \dots, ω_{π (k + 1)}}) + f ({ω_{π (1)}, \dots, ω_{π (k - 1)}}) . \end{matrix}

(58)

Averaging the terms on both sides of (58) over all the

n!

permutations

π

of

[n]

gives

\begin{array}{l} (59a) & \frac{1}{n!} \sum_{π} f ({ω_{π (1)}, \dots, ω_{π (k)}}) & = \frac{k! (n - k)!}{n!} \sum_{T \subseteq Ω : | T | = k} f (T) \\ (59b) & = \frac{1}{(\binom{n}{k})} \sum_{T \subseteq Ω : | T | = k} f (T) \\ (59c) & = f_{k}^{(n)}, \end{array}

and similarly

\begin{array}{l} (60a) & \frac{1}{n!} \sum_{π} f ({ω_{π (1)}, \dots, ω_{π (k - 1)}, ω_{π (k + 1)}}) = f_{k}^{(n)}, \\ (60b) & \frac{1}{n!} \sum_{π} f ({ω_{π (1)}, \dots, ω_{π (k + 1)}}) = f_{k + 1}^{(n)}, \\ (60c) & \frac{1}{n!} \sum_{π} f ({ω_{π (1)}, \dots, ω_{π (k - 1)}}) = f_{k - 1}^{(n)}, \end{array}

with

f_{0}^{(n)} = 0

since by assumption

f (Ø) = 0

. Combining (58)–(60) gives

\begin{matrix} 2 f_{k}^{(n)} \geq f_{k + 1}^{(n)} + f_{k - 1}^{(n)}, k \in [n - 1], \end{matrix}

(61)

which is rewritten as

\begin{matrix} f_{k}^{(n)} - f_{k - 1}^{(n)} \geq f_{k + 1}^{(n)} - f_{k}^{(n)}, k \in [n - 1] . \end{matrix}

(62)

Consequently, it follows that

\begin{array}{l} (63a) & \frac{f_{k}^{(n)}}{k} - \frac{f_{k + 1}^{(n)}}{k + 1} & = \frac{1}{k} \sum_{j = 1}^{k} (f_{j}^{(n)} - f_{j - 1}^{(n)}) - \frac{1}{k + 1} \sum_{j = 1}^{k + 1} (f_{j}^{(n)} - f_{j - 1}^{(n)}) \\ (63b) & = (\frac{1}{k} - \frac{1}{k + 1}) \sum_{j = 1}^{k} (f_{j}^{(n)} - f_{j - 1}^{(n)}) - \frac{1}{k + 1} (f_{k + 1}^{(n)} - f_{k}^{(n)}) \\ (63c) & = \frac{1}{k (k + 1)} \sum_{j = 1}^{k} \{(f_{j}^{(n)} - f_{j - 1}^{(n)}) - (f_{k + 1}^{(n)} - f_{k}^{(n)})\} \\ (63d) & \geq 0, \end{array}

where equality (63a) holds since

f_{0}^{(n)} = 0

, and inequality (63d) holds by (62). The sequence

{\{\frac{f_{k}^{(n)}}{k}\}}_{k = 1}^{n}

is therefore monotonically decreasing, and in particular

\begin{matrix} f_{k}^{(n)} \geq \frac{k f_{n}^{(n)}}{n} = \frac{k}{n} . \end{matrix}

(64)

We next prove (25) when

α = 1

, and then proceed to prove Theorem 1. By (64)

\begin{matrix} \frac{f_{n}^{(n)}}{n} \leq \frac{f_{n - 1}^{(n)}}{n - 1}, \end{matrix}

(65)

where, by (55),

\begin{matrix} f_{n}^{(n)} = f (Ω), f_{n - 1}^{(n)} = \frac{1}{n} \sum_{T \subseteq Ω : | T | = n - 1} f (T) . \end{matrix}

(66)

Combining (65) and (66) gives

\begin{matrix} (n - 1) f (Ω) \leq \sum_{T \subseteq Ω : | T | = n - 1} f (T) . \end{matrix}

(67)

Since there are n subsets

T \subseteq Ω

with

| T | = n - 1

, rearranging terms in (67) gives (25) for

α = 1

; it is should be noted that, for

α = 1

, the set function f does not need to be nonnegative for the satisfiability of (25) (however, this will be required for

α > 1

).

We next prove Item (a). By (20), for

k \in [n]

,

\begin{array}{l} (68a) & t_{k}^{(n)} & = \frac{1}{(\binom{n}{k})} \sum_{T \subseteq Ω : | T | = k} g (\frac{f (T)}{k}) \\ (68b) & = \frac{1}{(\binom{n}{k})} \sum_{T = {t_{1}, \dots, t_{k}} \subseteq Ω} g (\frac{f ({t_{1}, \dots, t_{k}})}{k}) . \end{array}

Fix

Ω_{k} ≜ {t_{1}, \dots, t_{k}} \subseteq Ω

, and let

\tilde{f} : 2^{Ω_{k}} \to R

be the restriction of the function f to the subsets of

Ω_{k}

. Then,

\tilde{f}

is a submodular set function with

\tilde{f} (Ø) = 0

; similarly to (55), (65) and (66) with f replaced by

\tilde{f}

, and n replaced by k, the sequence

{\{\frac{{\tilde{f}}_{j}^{(k)}}{j}\}}_{j = 1}^{k}

is monotonically decreasing. Hence, for

k \in [2 : n]

,

\begin{matrix} \frac{{\tilde{f}}_{k}^{(k)}}{k} \leq \frac{{\tilde{f}}_{k - 1}^{(k)}}{k - 1}, \end{matrix}

(69)

where

\begin{array}{l} (70a) & {\tilde{f}}_{k}^{(k)} & = \tilde{f} (Ω_{k}) = f ({t_{1}, \dots, t_{k}}), \\ (70b) & {\tilde{f}}_{k - 1}^{(k)} & = \frac{1}{k} \sum_{T \subseteq Ω_{k} : | T | = k - 1} \tilde{f} (T) \\ (70c) & = \frac{1}{k} \sum_{T \subseteq Ω_{k} : | T | = k - 1} f (T) \\ (70d) & = \frac{1}{k} \sum_{i = 1}^{k} f ({t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}}) . \end{array}

Combining (69) and (70) gives

\begin{matrix} f ({t_{1}, \dots, t_{k}}) \leq \frac{1}{k - 1} \sum_{i = 1}^{k} f ({t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}}), \end{matrix}

(71)

and, since by assumption g is monotonically increasing,

\begin{matrix} g (\frac{f ({t_{1}, \dots, t_{k}})}{k}) \leq g (\frac{1}{k} \sum_{i = 1}^{k} \frac{f ({t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}})}{k - 1}) . \end{matrix}

(72)

From (68) and (72), for all

k \in [2 : n]

,

\begin{matrix} t_{k}^{(n)} \leq \frac{1}{(\binom{n}{k})} \sum_{T = {t_{1}, \dots, t_{k}} \subseteq Ω} g (\frac{1}{k} \sum_{i = 1}^{k} \frac{f ({t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}})}{k - 1}), \end{matrix}

(73)

and

\begin{array}{l} (74a) & t_{k}^{(n)} & \leq \frac{1}{k (\binom{n}{k})} \sum_{i = 1}^{k} \sum_{T = {t_{1}, \dots, t_{k}} \subseteq Ω} g (\frac{f ({t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}})}{k - 1}) \\ (74b) & = \frac{n - k + 1}{k (\binom{n}{k})} \sum_{i = 1}^{k} \{\sum_{{t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}} \subseteq Ω} g (\frac{f ({t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}})}{k - 1})\} \\ (74c) & = \frac{k! (n - k)! (n - k + 1)}{n! k} \sum_{S \subseteq Ω : | S | = k - 1} g (\frac{f (S)}{k - 1}) \\ (74d) & = \frac{(k - 1)! (n - k + 1)!}{n!} \sum_{S \subseteq Ω : | S | = k - 1} g (\frac{f (S)}{k - 1}) \\ (74e) & = \frac{1}{(\binom{n}{k - 1})} \sum_{S \subseteq Ω : | S | = k - 1} g (\frac{f (S)}{k - 1}) \\ (74f) & = t_{k - 1}^{(n)}, \end{array}

where (74a) holds by invoking Jensen’s inequality to the convex function g; (74b) holds since the term of the inner summation in the right-hand side of (74a) does not depend on

t_{i}

, so for every

(k - 1)

-element subset

S = {t_{1}, \dots, t_{i - 1}, t_{i + 1}, \dots, t_{k}} \subseteq Ω

, there are

n - k + 1

possibilities to extend it by a single element

(t_{i})

into a k-element subset

T = {t_{1}, \dots, t_{k}} \subseteq Ω

; (74e) is straightforward, and (74f) holds by the definition in (20). This proves Item (a).

Item (b) follows from Item (a), and similarly Item (d) follows from Item (c), by replacing g with

- g

. Item (c) is next verified. If f is a supermodular set function with

f (Ø) = 0

, then (57) and (58), and (61)–(63) hold with flipped inequality signs. Hence, if g is monotonically decreasing, then inequalities (72) and (73) are reversed; finally, if g is also concave, then (by Jensen’s inequality) (74) holds with a flipped inequality sign, which proves Item (c).

4.2. Proof of Corollary 1

By assumption

f : 2^{Ω} \to R

is a rank function, which implies that

0 \leq f (T) \leq f (Ω)

for every

T \subseteq Ω

. Since (by definition) f is submodular with

f (Ø) = 0

, and (by assumption) the function g is convex and monotonically increasing, then (from (22), while replacing k with

k_{n}

)

\begin{matrix} (\binom{n}{k_{n}}) g (\frac{f (Ω)}{n}) \leq \sum_{T \subseteq Ω : | T | = k_{n}} g (\frac{f (T)}{k_{n}}) \leq (\binom{n}{k_{n}}) g (\frac{f (Ω)}{k_{n}}), n \in N . \end{matrix}

(75)

By the second assumption in Corollary 1, for positive values of x that are sufficiently close to zero, we have

$g (x) \approx g (0) > 0$ if $g (0) > 0$ ;
$g (x)$ scales like $\frac{1}{ℓ!} g^{(ℓ)} (0) x^{ℓ}$ if $g (0) = \dots = g^{(ℓ - 1)} (0) = 0$ with $g^{(ℓ)} (0) > 0$ for some $ℓ \in N$ .

In both cases, it follows that

\begin{matrix} lim_{x \to 0^{+}} x log g (x) = 0 . \end{matrix}

(76)

In light of (75) and (76), and since (by assumption)

k_{n} \underset{n \to \infty}{⟶} \infty

, it follows that

\begin{matrix} lim_{n \to \infty} \frac{1}{n} [log (\sum_{T \subseteq Ω : | T | = k_{n}} g (\frac{f (T)}{k_{n}})) - log (\binom{n}{k_{n}})] = 0 . \end{matrix}

(77)

By the following upper and lower bounds on the binomial coefficient:

\begin{matrix} \frac{1}{n + 1} exp (n H_{b} (\frac{k_{n}}{n})) \leq (\binom{n}{k_{n}}) \leq exp (n H_{b} (\frac{k_{n}}{n})), \end{matrix}

(78)

the combination of equalities (77) and (78) gives equality (23). Equality (24) holds as a special case of (23), under the assumption that

lim_{n \to \infty} \frac{k_{n}}{n} = β \in [0, 1]

.

4.3. Proof of Corollary 2

For

α = 1

, Corollary 2 is proved in (67). Fix

α > 1

, and let

g : R \to R

be

\begin{matrix} g (x) ≜ {\begin{matrix} x^{α}, & x \geq 0, \\ 0, & x < 0, \end{matrix} \end{matrix}

(79)

which is monotonically increasing and convex on the real line. By Theorem 1a,

\begin{matrix} t_{k}^{(n)} \geq t_{n}^{(n)}, k \in [n] . \end{matrix}

(80)

Since by assumption f is nonnegative, it follows from (20) and (79) that

\begin{matrix} t_{k}^{(n)} & = \frac{1}{(\binom{n}{k})} \sum_{T \subseteq Ω : | T | = k} g (\frac{f (T)}{k}) \end{matrix}

(81a)

\begin{matrix} = \frac{1}{k^{α} (\binom{n}{k})} \sum_{T \subseteq Ω : | T | = k} f^{α} (T) . \end{matrix}

(81b)

Combining (80)–(81) and rearranging terms gives, for all

α > 1

,

\begin{array}{l} (82a) & \sum_{T \subseteq Ω : | T | = k} f^{α} (T) & \geq {(\frac{k}{n})}^{α} (\binom{n}{k}) f^{α} (Ω) \\ (82b) & = {(\frac{k}{n})}^{α - 1} (\binom{n - 1}{k - 1}) f^{α} (Ω), \end{array}

where equality (82b) holds by the identity

\frac{k}{n} (\binom{n}{k}) = (\binom{n - 1}{k - 1})

. This further gives

\begin{array}{l} (83a) & \sum_{T \subseteq Ω : | T | = k} (f^{α} (Ω) - f^{α} (T)) & = (\binom{n}{k}) f^{α} (Ω) - \sum_{T \subseteq Ω : | T | = k} f^{α} (T) \\ (83b) & \leq (1 - \frac{k^{α}}{n^{α}}) (\binom{n}{k}) f^{α} (Ω) \\ (83c) & = c_{α} (n, k) f^{α} (Ω), \end{array}

where equality (83c) holds by the definition in (26). This proves (25) for

α > 1

.

We next prove Item (b). The function f is (by assumption) a rank function, which yields its nonnegativity. Hence, the leftmost inequality in (27) holds by (82). The rightmost inequality in (27) also holds since

f : 2^{Ω} \to R

is monotonically increasing, which yields

f (T) \leq f (Ω)

for all

T \subseteq Ω

. For

k \in [n]

and

α \geq 0

(in particular, for

α \geq 1

),

\begin{matrix} \sum_{T \subseteq Ω : | T | = k} f^{α} (T) \leq (\binom{n}{k}) f^{α} (Ω), \end{matrix}

(84)

where (84) holds since there are

(\binom{n}{k})

k-element subsets

T

of the n-element set

Ω

, and every summand

f^{α} (T)

(with

T \subseteq Ω

) is upper bounded by

f^{α} (Ω)

.

5. A Problem in Extremal Graph Theory

This section applies the generalization of Han’s inequality in (28) to the following problem.

5.1. Problem Formulation

Let

A \subseteq {- 1, 1}^{n}

, with

n \in N

, and let

τ \in [n]

. Let

G = G_{A, τ}

be an un-directed simple graph with vertex set

V (G) = A

, and pairs of vertices in G are adjacent (i.e., connected by an edge) if and only if they are represented by vectors in

A

whose Hamming distance is less than or equal to

τ

:

\begin{matrix} {x^{n}, y^{n}} \in E (G) \Leftrightarrow (x^{n}, y^{n} \in A, x^{n} \neq y^{n}, d_{H} (x^{n}, y^{n}) \leq τ) . \end{matrix}

(85)

The question is how large can the size of G be (i.e., how many edges it may have) as a function of the cardinality of the set

A

, and possibly based also on some basic properties of the set

A

?

This problem and its related analysis generalize and refine, in a nontrivial way, the bound in Theorem 4.2 of [6] which applies to the special case where

τ = 1

. The motivation for this extension is next considered.

5.2. Problem Motivation

Constraint coding is common in many data recording systems and data communication systems, where some sequences are more prone to error than others, and a constraint on the sequences that are allowed to be recorded or transmitted is imposed in order to reduce the likelihood of error. Given such a constraint, it is then necessary to encode arbitrary user sequences into sequences that obey the constraint.

From an information–theoretic perspective, this problem can be interpreted as follows. Consider a communication channel W

: X \to Y

with input alphabet

X

and output alphabet

Y

, and suppose that a constraint is imposed on the sequences that are allowed to be transmitted over the channel. As a result of such a constraint, the information sequences are first encoded into codewords by an error-correction encoder, followed by a constrained encoder that maps these codewords into constrained sequences. Let them be binary n-length sequences from the set

A \subseteq {- 1, 1}^{n}

. A channel modulator then modulates these sequences into symbols from

X

, and the received sequences at the channel output, with alphabet

Y

, are first demodulated, and then decoded (in a reverse order of the encoding process) by the constrained decoder and error-correction decoder.

Consider a channel model where pairs of binary n-length sequences from the set

A

whose Hamming distance is less than or equal to a fixed number

τ

share a common output sequence with positive probability, whereas this halts to be the case if the Hamming distance is larger than

τ

. In other words, we assume that by design, pairs of sequences in

A

whose Hamming distance is larger than

τ

cannot be confused in the sense that there does not exist a common output sequence which may be possibly received (with positive probability) at the channel output.

The confusion graph G that is associated with this setup is an undirected simple graph whose vertices represent the n-length binary sequences in

A

, and pairs of vertices are adjacent if and only if the Hamming distance between the sequences that they represent is not larger than

τ

. The size of G (i.e., its number of edges) is equal to the number of pairs of sequences in

A

which may not be distinguishable by the decoder.

Further motivation for studying this problem is considered in the continuation (see Section 5.5).

5.3. Analysis

We next derive an upper bound on the size of the graph G. Let

X^{n} = (X_{1}, \dots, X_{n})

be chosen uniformly at random from the set

A \subseteq {- 1, 1}^{n}

, and let

P_{X^{n}}

be the PMF of

X^{n}

. Then,

\begin{matrix} P_{X^{n}} (x^{n}) = {\begin{matrix} \frac{1}{| A |}, & if x^{n} \in A, \\ 0, & if x^{n} \notin A, \end{matrix} \end{matrix}

(86)

which implies that

\begin{matrix} H (X^{n}) = log | A | . \end{matrix}

(87)

The graph G is an un-directed and simple graph with a vertex set

V (G) = A

(i.e., the vertices of G are in one-to-one correspondence with the binary vectors in the set

A

). Its set of edges

E (G)

are the edges which connect all pairs of vertices in G whose Hamming distance is less than or equal to

τ

. For

d \in [τ]

, let

E_{d} (G)

be the set of edges in G which connect all pairs of vertices in G whose Hamming distance is equal to d, so

| E (G) | = \sum_{d = 1}^{τ} | E_{d} (G) | .

(88)

For

x^{n} \in {- 1, 1}^{n}

,

d \in [n]

, and integers

k_{1}, \dots, k_{d}

such that

1 \leq k_{1} < \dots < k_{d} \leq n

, let

\begin{matrix} {\tilde{x}}^{(k_{1}, \dots, k_{d})} ≜ (x_{1}, \dots, x_{k_{1} - 1}, x_{k_{1} + 1}, \dots, x_{k_{d} - 1}, x_{k_{d} + 1}, \dots, x_{n}) \end{matrix}

(89)

be a subvector of

x^{n}

of length

n - d

, obtained by dropping the bits of

x^{n}

in positions

k_{1}, \dots, k_{d}

; if

d = n

, then

(k_{1}, \dots, k_{n}) = (1, \dots, n)

, and

{\tilde{x}}^{(k_{1}, \dots, k_{d})}

is an empty vector. By the chain rule for the Shannon entropy,

\begin{array}{l} H (X^{n}) & - H ({\tilde{X}}^{(k_{1}, \dots, k_{d})}) \\ (90a) & = & H (X_{k_{1}}, \dots, X_{k_{d}} | {\tilde{X}}^{(k_{1}, \dots, k_{d})}) \\ (90b) & = & - \sum_{x^{n} \in {- 1, 1}^{n}} P_{X^{n}} (x^{n}) log (P_{X_{k_{1}}, \dots, X_{k_{d}} | {\tilde{X}}^{(k_{1}, \dots, k_{d})}} (x_{k_{1}}, \dots, x_{k_{d}} | {\tilde{x}}^{(k_{1}, \dots, k_{d})})) \\ (90c) & = & - \frac{1}{| A |} \sum_{x^{n} \in A} log (P_{X_{k_{1}}, \dots, X_{k_{d}} | {\tilde{X}}^{(k_{1}, \dots, k_{d})}} (x_{k_{1}}, \dots, x_{k_{d}} | {\tilde{x}}^{(k_{1}, \dots, k_{d})})), \end{array}

where equality (90c) holds by (86).

For

x^{n} \in {- 1, 1}^{n}

,

d \in [n]

, and integers

k_{1}, \dots, k_{d}

such that

1 \leq k_{1} < \dots < k_{d} \leq n

, let

\begin{matrix} {\bar{x}}^{(k_{1}, \dots, k_{d})} ≜ (x_{1}, \dots, x_{k_{1} - 1}, - x_{k_{1}}, x_{k_{1} + 1}, \dots, x_{k_{d} - 1}, - x_{k_{d}}, x_{k_{d} + 1}, \dots, x_{n}), \end{matrix}

(91)

where the bits of

x^{n}

in position

k_{1}, \dots, k_{d}

are flipped (in contrast to

{\tilde{x}}^{(k_{1}, \dots, k_{d})}

where the bits of

x^{n}

in these positions are dropped), so

{\bar{x}}^{(k_{1}, \dots, k_{d})} \in {- 1, 1}^{n}

and

d_{H} (x^{n}, {\bar{x}}^{(k_{1}, \dots, k_{d})}) = d

. Likewise, if

x^{n}, y^{n} \in {- 1, 1}^{n}

satisfy

d_{H} (x^{n}, y^{n}) = d

, then there exist integers

k_{1}, \dots, k_{d}

such that

1 \leq k_{1} < \dots < k_{d} \leq n

where

y^{n} = {\bar{x}}^{(k_{1}, \dots, k_{d})}

(i.e., the integers

k_{1}, \dots, k_{d}

are the positions (in increasing order) where the vectors

x^{n}

and

y^{n}

differ).

Let us characterize the set

A

by its cardinality, and the following two natural numbers:

(a): If $x^{n} \in A$ and ${\bar{x}}^{(k_{1}, \dots, k_{d})} \in A$ for any $(k_{1}, \dots, k_{d})$ such that $1 \leq k_{1} < \dots < k_{d} \leq n$ , then there are at least $m_{d} ≜ m_{d} (A)$ vectors $y \in A$ whose subvectors ${\tilde{y}}^{(k_{1}, \dots, k_{d})}$ coincide with ${\tilde{x}}^{(k_{1}, \dots, k_{d})}$ , i.e., the integer $m_{d} \geq 2$ satisfies

$\begin{matrix} m_{d} \leq min_{\begin{matrix} x^{n} \in A, \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} |\{y^{n} \in A : {\tilde{y}}^{(k_{1}, \dots, k_{d})} = {\tilde{x}}^{(k_{1}, \dots, k_{d})}, {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A\}| . \end{matrix}$

(92)

By definition, the integer $m_{d}$ always exists, and

$\begin{matrix} 2 \leq m_{d} \leq min {2^{d}, | A |} . \end{matrix}$

(93)

If no information is available about the value of $m_{d}$ , then it can be taken by default to be equal to 2 (since by assumption the two vectors $x^{n} \in A$ and $y^{n} ≜ {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A$ satisfy the equality ${\tilde{y}}^{(k_{1}, \dots, k_{d})} = {\tilde{x}}^{(k_{1}, \dots, k_{d})}$ ).
(b): If $x^{n} \in A$ and ${\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A$ for any $(k_{1}, \dots, k_{d})$ such that $1 \leq k_{1} < \dots < k_{d} \leq n$ , then there are at least $ℓ_{d} ≜ ℓ_{d} (A)$ vectors $y^{n} \in A$ whose subvectors ${\tilde{y}}^{(k_{1}, \dots, k_{d})}$ coincide with ${\tilde{x}}^{(k_{1}, \dots, k_{d})}$ , i.e., the integer $ℓ_{d} \geq 1$ satisfies

$\begin{matrix} ℓ_{d} \leq min_{\begin{matrix} x^{n} \in A, \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} |\{y^{n} \in A : {\tilde{y}}^{(k_{1}, \dots, k_{d})} = {\tilde{x}}^{(k_{1}, \dots, k_{d})}, {\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A\}| . \end{matrix}$

(94)

By definition, the integer $ℓ_{d}$ always exists, and

$\begin{matrix} 1 \leq ℓ_{d} \leq min {2^{d} - 1, | A | - 1} . \end{matrix}$

(95)

Likewise, if no information is available about the value of $ℓ_{d}$ , then it can be taken by default to be equal to 1 (since $x^{n} \in A$ satisfies the requirement about its subvector ${\tilde{x}}^{(k_{1}, \dots, k_{d})}$ in (94)).

In general, it would be preferable to have the largest possible values of

m_{d}

and

ℓ_{d}

(i.e., those satisfying inequalities (92) and (94) with equalities, for obtaining a better upper bound on the size of G (this point will be clarified in the sequel). If

d = 1

, then

m_{d} = 2

and

ℓ_{d} = 1

are the best possible constants (this holds by the definitions in (92) and (94), which can be also verified by the coincidence of the upper and lower bounds in (93) for

d = 1

, as well as those in (95)).

If

x^{n} \in A

, then we distinguish between the following two cases:

If ${\bar{x}}^{(k_{1}, \dots, k_{d})} \in A$ , then

$\begin{matrix} P_{X_{k_{1}}, \dots, X_{k_{d}} | {\tilde{X}}^{(k_{1}, \dots, k_{d})}} (x_{k_{1}}, \dots, x_{k_{d}} | {\tilde{x}}^{(k_{1}, \dots, k_{d})}) \leq \frac{1}{m_{d}}, \end{matrix}$

(96)

which holds by the way that $m_{d}$ is defined in (92), and since $X^{n}$ is randomly selected to be equiprobable in the set $A$ .
If ${\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A$ , then

$\begin{matrix} P_{X_{k_{1}}, \dots, X_{k_{d}} | {\tilde{X}}^{(k_{1}, \dots, k_{d})}} (x_{k_{1}}, \dots, x_{k_{d}} | {\tilde{x}}^{(k_{1}, \dots, k_{d})}) \leq \frac{1}{ℓ_{d}}, \end{matrix}$

(97)

which holds by the way that $ℓ_{d}$ is defined in (94), and since $X^{n}$ is equiprobable on $A$ .

For

d \in [τ]

and

1 \leq k_{1} < \dots < k_{d} \leq n

, it follows from (90), (96) and (97) that

\begin{matrix} H (X^{n}) - H ({\tilde{X}}^{(k_{1}, \dots, k_{d})}) & \geq \frac{log m_{d}}{| A |} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A} \\ + \frac{log ℓ_{d}}{| A |} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A}, \end{matrix}

(98)

which, by summing on both sides of inequality (98) over all integers

k_{1}, \dots, k_{d}

such that

1 \leq k_{1} < \dots < k_{d} \leq n

, yields

\begin{matrix} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} (H (X^{n}) - H ({\tilde{X}}^{(k_{1}, \dots, k_{d})})) \\ \geq \frac{log m_{d}}{| A |} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A} \\ + \frac{log ℓ_{d}}{| A |} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A} . \end{matrix}

(99)

Equality holds in (99) if the minima on the RHS of (92) and (94) are attained by any element in these sets, and if (92) and (94) are satisfied with equalities (i.e.,

m_{d}

and

ℓ_{d}

are the maximal integers to satisfy inequalities (92) and (94) for the given set

A

). Hence, this equality holds in particular for

d = 1

, with the constants

m_{d} = 2

and

ℓ_{d} = 1

.

The double sum in the first term on the RHS of (99) is equal to

\begin{matrix} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A} = 2 |E_{d} (G)|, \end{matrix}

(100)

since every pair of adjacent vertices in

G

that refer to vectors in

A

whose Hamming distance is equal to d is of the form

x^{n} \in A

and

{\bar{x}}^{(k_{1}, \dots, k_{d})} \in A

, and vice versa, and every edge

{x^{n}, {\bar{x}}^{(k_{1}, \dots, k_{d})}} \in E_{d} (G)

is counted twice in the double summation on the LHS of (100). For calculating the double sum in the second term on the RHS of (99), we first calculate the sum of these two double summations:

\begin{matrix} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A} + \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A} \end{matrix}

\begin{array}{l} (101a) & = \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} \{1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \in A} + 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A}\} \\ (101b) & = \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A} \\ (101c) & = \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} | A | \\ (101d) & = (\binom{n}{d}) | A |, \end{array}

so, subtracting (100) from (101d) gives that

\begin{matrix} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} \sum_{x^{n}} 1 {x^{n} \in A, {\bar{x}}^{(k_{1}, \dots, k_{d})} \notin A} = (\binom{n}{d}) | A | - 2 |E_{d} (G)| . \end{matrix}

(102)

Substituting (100) and (102) into the RHS of (99) gives that, for all

d \in [τ]

,

\begin{array}{l} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} (H (X^{n}) - H ({\tilde{X}}^{(k_{1}, \dots, k_{d})})) \\ (103a) & \geq \frac{2 | E_{d} (G) | log m_{d}}{| A |} + \frac{log ℓ_{d}}{| A |} [(\binom{n}{d}) | A | - 2 |E_{d} (G)|] \\ (103b) & = (\binom{n}{d}) log ℓ_{d} + \frac{2 | E_{d} (G) |}{| A |} log \frac{m_{d}}{ℓ_{d}}, \end{array}

with the same necessary and sufficient condition for equality in (103a) as in (99). (Recall that it is in particular an equality for

d = 1

, where in this case

m_{1} = 2

and

ℓ_{1} = 1

.)

By the generalized Han’s inequality in (28),

\begin{array}{l} (104a) & \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} (H (X^{n}) - H ({\tilde{X}}^{(k_{1}, \dots, k_{d})})) & \leq (\binom{n - 1}{d - 1}) H (X^{n}) \\ (104b) & = (\binom{n - 1}{d - 1}) log | A |, \end{array}

where equality (104b) holds by (87). Combining (103) and (104) yields

\begin{matrix} (\binom{n - 1}{d - 1}) log | A | \geq (\binom{n}{d}) log ℓ_{d} + \frac{2 | E_{d} (G) |}{| A |} log \frac{m_{d}}{ℓ_{d}}, \end{matrix}

(105)

and, by the identity

(\binom{n}{d}) = \frac{n}{d} (\binom{n - 1}{d - 1})

, we get

\begin{matrix} | E_{d} (G) | \leq \frac{(\binom{n - 1}{d - 1}) | A | (log | A | - \frac{n}{d} log ℓ_{d})}{2 log \frac{m_{d}}{ℓ_{d}}} . \end{matrix}

(106)

This upper bound is specialized, for

d = 1

, to Theorem 4.2 of [6] (where, by definition,

m_{1} = 2

and

ℓ_{1} = 1

). This gives that the number of edges in G, connecting pairs of vertices which refer to binary vectors in

A

whose Hamming distance is 1 from each other, satisfies

\begin{matrix} | E_{1} (G) | \leq \frac{1}{2} | A | {log}_{2} | A | . \end{matrix}

(107)

It is possible to select, by default, the values of the integers

m_{d}

and

ℓ_{d}

to be equal to 2 and 1, respectively, independently of the value of

d \in [τ]

. It therefore follows that the upper bound in (106) can be loosened to

\begin{matrix} | E_{d} (G) | \leq \frac{1}{2} (\binom{n - 1}{d - 1}) | A | {log}_{2} | A | . \end{matrix}

(108)

This shows that the bound in (108) generalizes the result in Theorem 4.2 of [6], based only on the knowledge of the cardinality of

A

. Furthermore, the bound (108) can be tightened by the refined bound (106) if the characterization of the set

A

allows one to assert values for

m_{d}

and

ℓ_{d}

that are larger than the trivial values of 2 and 1, respectively.

In light of (88) and (108), the number of edges in the graph G satisfies

\begin{matrix} | E (G) | \leq \frac{1}{2} \sum_{d = 1}^{τ} (\binom{n - 1}{d - 1}) | A | {log}_{2} | A |, \end{matrix}

(109)

and if

τ \leq \frac{n + 1}{2}

, then it follows that

\begin{matrix} | E (G) | \leq \frac{1}{2} exp ((n - 1) H_{b} (\frac{τ - 1}{n - 1})) | A | {log}_{2} | A | . \end{matrix}

(110)

Indeed, the transition from (109) to (110) holds by the inequality

\begin{matrix} \sum_{k = 0}^{n θ} (\binom{n}{k}) \leq exp (n H_{b} (θ)), θ \in [0, \frac{1}{2}], \end{matrix}

(111)

where the latter bound is asymptotically tight in the exponent of n (for sufficiently large values of n).

5.4. Comparison of Bounds

We next consider the tightness of the refined bound (106) and the loosened bound (108). Since

A

is a subset of the n-dimensional cube

{- 1, 1}^{n}

, every point in

A

has at most

(\binom{n}{d})

neighbors in

A

with Hamming distance d, so

\begin{matrix} | E_{d} (G) | \leq \frac{1}{2} (\binom{n}{d}) | A | . \end{matrix}

(112)

Comparing the bound on the RHS of (106) with the trivial bound in (112) shows that the former bound is useful if and only if

\begin{matrix} \frac{log | A | - \frac{n}{d} log ℓ_{d}}{log \frac{m_{d}}{ℓ_{d}}} \leq \frac{n}{d}, \end{matrix}

(113)

which is obtained by relying on the identity

(\binom{n}{d}) = \frac{n}{d} (\binom{n - 1}{d - 1})

. Rearranging terms in (113) gives the necessary and sufficient condition

\begin{matrix} | A | \leq {(m_{d})}^{\frac{n}{d}}, \end{matrix}

(114)

which is independent of the value of

ℓ_{d}

. Since, by definition,

m_{d} \geq 2

, inequality (114) is automatically satisfied if the stronger condition

\begin{matrix} | A | \leq 2^{\frac{n}{d}} \end{matrix}

(115)

is imposed. The latter also forms a necessary and sufficient condition for the usefulness of the looser bound on the RHS of (108) in comparison to (112).

Example 1.

Suppose that the set

A \subseteq {- 1, 1}^{n}

is characterized by the property that for all

d \in [τ]

, with a fixed integer

τ \in [n]

, if

x^{n} \in A

and

{\bar{x}}^{(k_{1}, \dots, k_{d})} \in A

then all vectors

y^{n} \in {- 1, 1}^{n}

which coincide with

x^{n}

and

{\bar{x}}^{(k_{1}, \dots, k_{d})}

in their

(n - d)

agreed positions are also included in the set

A

. Then, for all

d \in [τ]

, we get by definition that

m_{d} = 2^{d}

, which yields

τ \leq ⌊ {log}_{2} | A | ⌋

. Setting

m_{d} = 2^{d}

and the default value

ℓ_{d} = 1

on the RHS of (106) gives

\begin{array}{l} (116a) & | E_{d} (G) | & \leq \frac{(\binom{n - 1}{d - 1}) | A | (log | A | - \frac{n}{d} log ℓ_{d})}{2 log \frac{m_{d}}{ℓ_{d}}} \\ (116b) & = \frac{(\binom{n - 1}{d - 1}) | A | log | A |}{2 log (2^{d})} \\ (116c) & = \frac{1}{2 d} (\binom{n - 1}{d - 1}) | A | {log}_{2} | A | \\ (116d) & = \frac{1}{2} (\binom{n}{d}) | A | \cdot \frac{{log}_{2} | A |}{n} . \end{array}

Unless

A = {- 1, 1}^{n}

, the upper bound on the RHS of (116d) is strictly smaller than the trivial upper bound on the RHS of (112). This improvement is consistent with the satisfiability of the (necessary and sufficient) condition in (115), which is strictly satisfied since

\begin{matrix} | A | < 2^{n} = {(2^{d})}^{\frac{n}{d}} = {(m_{d})}^{\frac{n}{d}} . \end{matrix}

(117)

On the other hand, the looser upper bound on the RHS of (108) gives

\begin{matrix} | E_{d} (G) | \leq \frac{1}{2} (\binom{n}{d}) | A | \cdot \frac{d {log}_{2} | A |}{n}, \end{matrix}

(118)

which is d times larger than the refined bound on the RHS of (116d) (since it is based on the exact value of

m_{d}

for the set

A

, rather than taking the default value of 2), and it is worse than the trivial bound if and only if

| A | > 2^{\frac{n}{d}}

. The latter finding is consistent with (115).

This exemplifies the utility of the refined upper bound on the RHS of (106) in comparison to the bound on the RHS of (108), where the latter generalizes Theorem 4.2 of [6] from the case where

d = 1

to all

d \in [n]

. As it is explained above, this refinement is irrelevant in the special case where

d = 1

, though it proves to be useful in general for

d \in [2 : n]

(as it is exemplified here).

The following theorem introduces the results of our analysis (so far) in the present section.

Theorem 2.

Let

A \subseteq {- 1, 1}^{n}

, with

n \in N

, and let

τ \in [n]

. Let

G = (V (G), E (G))

be an un-directed, simple graph with vertex set

V (G) = A

, and edges connecting pairs of vertices in G which are represented by vectors in

A

whose Hamming distance is less than or equal to τ. For

d \in [τ]

, let

E_{d} (G)

be the set of edges in G which connect all pairs of vertices that are represented by vectors in

A

whose Hamming distance is equal to d (i.e.,

| E (G) | = \sum_{d = 1}^{τ} | E_{d} (G) |

).

(a): For $d \in [τ]$ , let the integers $m_{d} \in [2 : min {2^{d}, | A |}]$ and $ℓ_{d} \in [min {2^{d} - 1, | A | - 1}]$ (be, preferably, the maximal possible values to) satisfy the requirements in (92) and (94), respectively. Then,

$\begin{matrix} | E_{d} (G) | \leq \frac{(\binom{n - 1}{d - 1}) | A | (log | A | - \frac{n}{d} log ℓ_{d})}{2 log \frac{m_{d}}{ℓ_{d}}} . \end{matrix}$

(119)
(b): A loosened bound, which only depends on the cardinality of the set $A$ , is obtained by setting the default values $m_{d} = 2$ and $ℓ_{d} = 1$ . It is then given by

$\begin{matrix} | E_{d} (G) | \leq \frac{1}{2} (\binom{n - 1}{d - 1}) | A | {log}_{2} | A |, d \in [τ], \end{matrix}$

(120)

and, if $τ \leq \frac{n + 1}{2}$ , then the (overall) number of edges in G satisfies

$\begin{matrix} | E (G) | \leq \frac{1}{2} exp ((n - 1) H_{b} (\frac{τ - 1}{n - 1})) | A | {log}_{2} | A | . \end{matrix}$

(121)
(c): The refined upper bound on the RHS of (119) and the loosened upper bound on the RHS of (120) improve the trivial bound $\frac{1}{2} (\binom{n}{d}) | A |$ , if and only if $| A | < {(m_{d})}^{\frac{n}{d}}$ or $| A | < 2^{\frac{n}{d}}$ , respectively (see Example 1).

5.5. Influence of Fixed-Size Subsets of Bits

The result in Theorem 4.2 of [6], which is generalized and refined in Theorem 2 here, is turned to study the total influence of the n variables of an equiprobable random vector

X^{n} \in {- 1, 1}^{n}

on a subset

A \subset {- 1, 1}^{n}

. To this end, let

{\bar{X}}^{(i)}

denote the vector where the bit at the i-th position of

X^{n}

is flipped, so

{\bar{X}}^{(i)} ≜ (X_{1}, \dots, X_{i - 1}, - X_{i}, X_{i + 1}, \dots, X_{n})

for all

i \in [n]

. Then, the influence of the i-th variable is defined as

\begin{matrix} I_{i} (A) ≜ Pr [1 {X^{n} \in A} \neq 1 \{{\bar{X}}^{(i)} \in A\}], i \in [n], \end{matrix}

(122)

and their total influence is defined to be the sum

\begin{matrix} I (A) ≜ \sum_{i = 1}^{n} I_{i} (A) . \end{matrix}

(123)

As it is shown in Chapters 9 and 10 of [6], influences of subsets of the binary hypercube have far reaching consequences in the study of threshold phenomena, and many other areas. As a corollary of (107), it is obtained in Theorem 4.3 of [6] that, for every subset

A \subset {- 1, 1}^{n}

,

\begin{matrix} I (A) \geq 2 Pr (A) {log}_{2} \frac{1}{Pr (A)}, \end{matrix}

(124)

where

Pr (A) ≜ P [X^{n} \in A] = \frac{| A |}{2^{n}}

by the equiprobable distribution of

X^{n}

over

{- 1, 1}^{n}

.

In light of Theorem 2, the same approach which is used in Section 4.4 of [6] for the transition from (107) to (124) can be also used to obtain, as a corollary, a lower bound on the average total influence over all subsets of d variables. To this end, let

k_{1}, \dots, k_{d}

be integers such that

1 \leq k_{1} < \dots < k_{d} \leq n

, and the influence of the variables in positions

k_{1}, \dots, k_{d}

be given by

\begin{matrix} I_{(k_{1}, \dots, k_{d})} (A) ≜ Pr [1 {X^{n} \in A} \neq 1 \{{\bar{X}}^{(k_{1}, \dots, k_{d})} \in A\}] . \end{matrix}

(125)

Then, let the average influence of subsets of d variables be defined as

\begin{matrix} I^{(n, d)} (A) ≜ \frac{1}{(\binom{n}{d})} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} I_{(k_{1}, \dots, k_{d})} (A) . \end{matrix}

(126)

Hence, by (123) and (126),

I^{(n, 1)} (A) = \frac{1}{n} I (A)

for every subset

A \subset {- 1, 1}^{n}

. Let

\begin{matrix} B^{(n, d)} (A) ≜ \{(x^{n}, y^{n}) : x^{n} \in A, y^{n} \in {- 1, 1}^{n} \ A, d_{H} (x^{n}, y^{n}) = d\}, \end{matrix}

(127)

be the set of ordered pairs of sequences

(x^{n}, y^{n})

, where

x^{n}, y^{n} \in {- 1, 1}^{n}

are of Hamming distance d from each other, with

x^{n} \in A

and

y^{n} \notin A

. By the equiprobable distribution of

X^{n}

on

{- 1, 1}^{n}

, we get

\begin{array}{l} (128a) & I^{(n, d)} (A) & = \frac{1}{(\binom{n}{d})} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} Pr [1 {X^{n} \in A} \neq 1 \{{\bar{X}}^{(k_{1}, \dots, k_{d})} \in A\}] \\ (128b) & = \frac{2}{(\binom{n}{d})} \sum_{\begin{matrix} (k_{1}, \dots, k_{d}) : \\ 1 \leq k_{1} < \dots < k_{d} \leq n \end{matrix}} Pr [X^{n} \in A, {\bar{X}}^{(k_{1}, \dots, k_{d})} \notin A] \\ (128c) & = \frac{2}{(\binom{n}{d})} \cdot \frac{| B^{(n, d)} (A) |}{2^{n}} \\ (128d) & = \frac{| B^{(n, d)} (A) |}{2^{n - 1} (\binom{n}{d})} . \end{array}

Since every point in

A

has

(\binom{n}{d})

neighbors of Hamming distance d in the set

{- 1, 1}^{n}

, it follows that

\begin{matrix} (\binom{n}{d}) | A | = 2 |E_{d} (G)| + |B^{(n, d)} (A)|, \end{matrix}

(129)

where G is introduced in Theorem 2, and

E_{d} (G)

is the set of edges connecting pairs of vertices in G which are represented by vectors in

A

of Hamming distance d. The multiplication by 2 on the RHS of (129) is because every edge whose two endpoints are in the set

A

is counted twice. Hence, by (106) and (129),

\begin{array}{l} (130a) & |B^{(n, d)} (A)| & = (\binom{n}{d}) | A | - 2 |E_{d} (G)| \\ (130b) & \geq (\binom{n}{d}) | A | - \frac{(\binom{n - 1}{d - 1}) | A | (log | A | - \frac{n}{d} log ℓ_{d})}{log \frac{m_{d}}{ℓ_{d}}} \\ (130c) & = (\binom{n}{d}) | A | (1 - \frac{\frac{d}{n} log | A | - log ℓ_{d}}{log \frac{m_{d}}{ℓ_{d}}}) \\ (130d) & = (\binom{n}{d}) | A | (\frac{log m_{d} - \frac{d}{n} log | A |}{log \frac{m_{d}}{ℓ_{d}}}), \end{array}

and the lower bound on the RHS of (130d) is positive if and only if

| A | < {(m_{d})}^{\frac{n}{d}}

(see also (114)). This gives from (128) that the average influence of subsets of d variables satisfies

\begin{array}{l} (131a) & I^{(n, d)} (A) & \geq \frac{| A |}{2^{n - 1}} (\frac{log m_{d} - \frac{d}{n} log | A |}{log \frac{m_{d}}{ℓ_{d}}}) \\ (131b) & = 2 Pr (A) (\frac{log m_{d} - \frac{d}{n} log (2^{n} Pr (A))}{log \frac{m_{d}}{ℓ_{d}}}) \\ (131c) & = 2 Pr (A) (\frac{\frac{d}{n} log \frac{1}{Pr (A)} - log \frac{2^{d}}{m_{d}}}{log \frac{m_{d}}{ℓ_{d}}}) . \end{array}

Note that by setting

d = 1

, and the default values

m_{d} = 2

and

ℓ_{d} = 1

on the RHS of (131c) gives the total influence of the n variables satisfies, for all

A \subseteq {- 1, 1}^{n}

,

\begin{array}{l} (132a) & I (A) & = n I^{(n, 1)} (A) \\ (132b) & \geq 2 Pr (A) {log}_{2} \frac{1}{Pr (A)}, \end{array}

which is then specialized to the result in (Theorem 4.3 of [6], see (124)). This gives the following result.

Theorem 3.

Let

X^{n}

be an equiprobable random vector over the set

{- 1, 1}^{n}

, let

d \in [n]

and

A \subset {- 1, 1}^{n}

. Then, the average influence of subsets of d variables of

X^{n}

, as it is defined in (126), is lower bounded as follows:

\begin{matrix} I^{(n, d)} (A) & \geq 2 Pr (A) (\frac{\frac{d}{n} log \frac{1}{Pr (A)} - log \frac{2^{d}}{m_{d}}}{log \frac{m_{d}}{ℓ_{d}}}), \end{matrix}

(133)

where

Pr (A) ≜ P [X^{n} \in A] = \frac{| A |}{2^{n}}

, and the integers

m_{d}

and

ℓ_{d}

are introduced in Theorem 2. Similarly to the refined upper bound in Theorem 2, the lower bound on the RHS of (133) is informative (i.e., positive) if and only if

| A | < {(m_{d})}^{\frac{n}{d}}

. The lower bound on the RHS of (133) can be loosened (by setting the default values

m_{d} = 2

and

ℓ_{d} = 1

) to

\begin{matrix} I^{(n, d)} (A) & \geq 2 Pr (A) (\frac{d}{n} {log}_{2} \frac{1}{Pr (A)} + 1 - d) . \end{matrix}

(134)

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proof of Proposition 1

For completeness, we prove Proposition 1 which introduces results from [25,28,37].

Let

Ω

be a non-empty finite set, and let

{X_{ω}}_{ω \in Ω}

be a collection of discrete random variables. We first prove Item (a), showing that the entropy set function

f : 2^{Ω} \to R

in (15) is a rank function.

$f (Ø) = 0$ .
Submodularity: If $S, T \subseteq Ω$ , then

$\begin{array}{l} f (T \cup S) + f (T \cap S) \\ (A1a) & = H (X_{T \cup S}) + H (X_{T \cap S}) \\ (A1b) & = H (X_{T \ S}, X_{T \cap S}, X_{S \ T}) + H (X_{T \cap S}) \\ (A1c) & = H (X_{T \ S}, X_{S \ T} | X_{T \cap S}) + 2 H (X_{T \cap S}) \\ = [H (X_{T \ S} | X_{T \cap S}) + H (X_{S \ T} | X_{T \cap S}) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S})] \\ (A1d) & + 2 H (X_{T \cap S}) \\ = [H (X_{T \ S} | X_{T \cap S}) + H (X_{T \cap S})] + [H (X_{S \ T} | X_{T \cap S}) + H (X_{T \cap S})] \\ (A1e) & - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \\ (A1f) & = H (X_{T \ S}, X_{T \cap S}) + H (X_{S \ T}, X_{T \cap S}) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \\ (A1g) & = H (X_{T}) + H (X_{S}) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \\ (A1h) & = f (T) + f (S) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}), \end{array}$

which gives

$\begin{matrix} f (T) + f (S) - [f (T \cup S) + f (T \cap S)] = I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \geq 0 . \end{matrix}$

(A2)

This proves the submodularity of f, while also showing that

$\begin{matrix} f (T) + f (S) = f (T \cup S) + f (T \cap S) ⟺ X_{T \ S} ⊥ ⊥ X_{S \ T} | X_{T \cap S}, \end{matrix}$

(A3)

i.e., the rightmost side of (A2) holds with equality if and only if $X_{T \ S}$ and $X_{S \ T}$ are conditionally independent given $X_{T \cap S}$ .
Monotonicity: If $S \subseteq T \subseteq Ω$ , then

$\begin{array}{l} (A4a) & f (S) & = H (X_{S}) \\ (A4b) & \leq H (X_{S}) + H (X_{T} | X_{S}) \\ (A4c) & = H (X_{T}) \\ (A4d) & = f (T), \end{array}$

so f is monotonically increasing.

We next prove Item (b). Consider the set function f in (16).

$f (Ø) = 0$ , and $f (Ω) = H (X_{Ω})$ .
Supermodularity: If $S, T \subseteq Ω$ , then

$\begin{array}{l} (A5a) & f (T \cup S) + f (T \cap S) & = H (X_{T \cup S} | X_{T^{c} \cap S^{c}}) + H (X_{T \cap S} | X_{T^{c} \cup S^{c}}) \\ (A5b) & = [H (X_{Ω}) - H (X_{T^{c} \cap S^{c}})] + [H (X_{Ω}) - H (X_{T^{c} \cup S^{c}})] \\ (A5c) & = 2 H (X_{Ω}) - [H (X_{T^{c} \cup S^{c}}) + H (X_{T^{c} \cap S^{c}})] \\ (A5d) & \geq 2 H (X_{Ω}) - [H (X_{T^{c}}) + H (X_{S^{c}})] \\ (A5e) & = [H (X_{Ω}) - H (X_{T^{c}})] + [H (X_{Ω}) - H (X_{S^{c}})] \\ (A5f) & = H (X_{T} | X_{T^{c}}) + H (X_{S} | X_{S^{c}}) \\ (A5g) & = f (T) + f (S), \end{array}$

where inequality (A5d) holds since the entropy function in (15) is submodular (by Item (a)).
Monotonicity: If $S \subseteq T \subseteq Ω$ , then

$\begin{array}{l} (A6a) & f (S) & = H (X_{S} | X_{S^{c}}) \\ (A6b) & \leq H (X_{S} | X_{T^{c}}) (T^{c} \subseteq S^{c}) \\ (A6c) & \leq H (X_{T} | X_{T^{c}}) \\ (A6d) & = f (T), \end{array}$

so f is monotonically increasing.

Item (c) follows easily from Items (a) and (b). Consider the set function

f : 2^{Ω} \to R

in (17). Then, for all

T \in Ω

,

f (T) = I (X_{T}; X_{T^{c}}) = H (X_{T}) - H (X_{T} | X_{T^{c}})

, so f is expressed as a difference of a submodular function and a supermodular function, which gives a submodular function. Furthermore,

f (Ø) = 0

; by the symmetry of the mutual information,

f (T) = f (T^{c})

for all

T \subseteq Ω

, so f is not monotonic.

We next prove Item (d). Consider the set function

f : 2^{V} \to R

in (18), and we need to prove that f is submodular under the conditions in Item (d) where

U, V \subseteq Ω

are disjoint subsets, and the entries of the random vector

X_{V}

are conditionally independent given

X_{U}

.

$f (Ø) = I (X_{U}; X_{Ø}) = 0$ .
Submodularity: If $S, T \subseteq V$ , then

$\begin{array}{l} f (T \cup S) + f (T \cap S) \\ (A7a) & = I (X_{U}; X_{T \cup S}) + I (X_{U}; X_{T \cap S}) \\ (A7b) & = [H (X_{T \cup S}) - H (X_{T \cup S} | X_{U})] + [H (X_{T \cap S}) - H (X_{T \cap S} | X_{U})] \\ (A7c) & = [H (X_{T \cup S}) + H (X_{T \cap S})] - [H (X_{T \cup S} | X_{U}) + H (X_{T \cap S} | X_{U})] \\ = [H (X_{T}) + H (X_{S}) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S})] \\ (A7d) & - [H (X_{T \cup S} | X_{U}) + H (X_{T \cap S} | X_{U})], \end{array}$

where equality (A7d) holds by the proof of Item (a) (see (A2)). By the assumption on the conditional independence of the random variables ${X_{v}}_{v \in V}$ given $X_{U}$ , we get

$\begin{array}{l} (A8a) & H (X_{T \cup S} | X_{U}) + H (X_{T \cap S} | X_{U}) & = \sum_{ω \in T \cup S} H (X_{ω} | X_{U}) + \sum_{ω \in T \cap S} H (X_{ω} | X_{U}) \\ (A8c) & = \sum_{ω \in T} H (X_{ω} | X_{U}) + \sum_{ω \in S} H (X_{ω} | X_{U}) \\ (A8c) & = H (X_{T} | X_{U}) + H (X_{S} | X_{U}) . \end{array}$

Consequently, combining (A7) and (A8) gives

$\begin{array}{l} f (T \cup S) + f (T \cap S) & = [H (X_{T}) + H (X_{S}) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S})] \\ (A9a) & - [H (X_{T} | X_{U}) + H (X_{S} | X_{U})] \\ = [H (X_{T}) - H (X_{T} | X_{U})] + [H (X_{S}) - H (X_{S} | X_{U})] \\ (A9b) & - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \\ (A9c) & = I (X_{T}; X_{U}) + I (X_{S}; X_{U}) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \\ (A9d) & = f (T) + f (S) - I (X_{T \ S}; X_{S \ T} | X_{T \cap S}) \\ (A9e) & \leq f (T) + f (S), \end{array}$

where the inequality (A9e) holds with equality if and only if $X_{T \ S}$ and $X_{S \ T}$ are conditionally independent given $X_{T \cap S}$ .
Monotonicity: If $S \subseteq T \subseteq V$ , then

$\begin{matrix} f (S) = I (X_{U}; X_{S}) \leq I (X_{U}; X_{T}) = f (T), \end{matrix}$

(A10)

so f is monotonically increasing.

We finally prove Item (e), where it is needed to show that the entropy of a sum of independent random variables is a rank function. Let

f : 2^{Ω} \to R

be the set function as given in (19).

$f (Ø) = 0$ .
Submodularity: Let $S, T \subseteq Ω$ . Define

$\begin{matrix} U ≜ \sum_{ω \in T \cap S} X_{ω}, V ≜ \sum_{ω \in S \ T} X_{ω}, W ≜ \sum_{ω \in T \ S} X_{ω} . \end{matrix}$

(A11)

From the independence of the random variables ${X_{ω}}_{ω \in Ω}$ , it follows that $U, V$ and W are independent. Hence, we get

$\begin{array}{l} [f (T) + f (S)] - [f (T \cup S) + f (T \cap S)] \\ (A12a) & = [f (T) - f (T \cap S)] - [f (T \cup S) - f (S)] \\ (A12b) & = [H (U + W) - H (U)] - [H (U + V + W) - H (U + V)] \\ (A12c) & = [H (U + W) - H (U + W | W)] - [H (U + V + W) - H (U + V)] \\ (A12d) & = [H (U + W) - H (U + W | W)] - [H (U + V + W) - H (U + V + W | W)] \\ (A12e) & = I (U + W; W) - I (U + V + W; W) \\ (A12f) & \geq I (U + W; W) - I (U + W, V; W), \end{array}$

and

$\begin{array}{l} (A13a) & I (U + W, V; W) & = I (V; W) + I (U + W; W | V) \\ (A13b) & = I (U + W; W | V) \\ (A13c) & = I (U + W; W) . \end{array}$

Combining (A12) and (A13) gives (11).
Monotonicity: If $S \subseteq T \subseteq Ω$ , then since ${X_{ω}}_{ω \in Ω}$ are independent random variables, (A11) implies that U and W are independent and $V = 0$ . Hence,

$\begin{array}{l} (A14a) & f (T) - f (S) & = H (U + W) - H (U) \\ (A14b) & = H (U + W) - H (U + W | W) \\ (A14c) & = I (U + W; W) \geq 0 . \end{array}$

This completes the proof of Proposition 1.

Appendix B. Proof of Proposition 4

Lemma A1.

Let

{B_{j}}_{j = 1}^{ℓ}

(with

ℓ \geq 2

) be a sequence of sets that is not a chain (i.e., there is no permutation

π : [ℓ] \to [ℓ]

such that

B_{π (1)} \subseteq B_{π (2)} \subseteq \dots \subseteq B_{π (ℓ)}

). Consider a recursive process where, at each step, a pair of sets that are not related by inclusion is replaced with their intersection and union. Then, there exists such a recursive process that leads to a chain in a finite number of steps.

Proof.

The lemma is proved by mathematical induction on ℓ. It holds for

ℓ = 2

since

B_{1} \cap B_{2} \subseteq B_{1} \cup B_{2}

, and the process halts in a single step. Suppose that the lemma holds with a fixed

ℓ \geq 2

, and for an arbitrary sequence of ℓ sets which is not a chain. We aim to show that it also holds for every sequence of

ℓ + 1

sets which is not a chain. Let

{B_{j}}_{j = 1}^{ℓ + 1}

be such an arbitrary sequence of sets, and consider the subsequence of the first ℓ sets

B_{1}, \dots, B_{ℓ}

. If it is not a chain, then (by the induction hypothesis) there exists a recursive process as above which enables to transform it into a chain in a finite number of steps, i.e., we get a chain

B_{1}^{'} \subseteq B_{2}^{'} \subseteq \dots \subseteq B_{ℓ}^{'}

. If

B_{ℓ}^{'} \subseteq B_{ℓ + 1}

or

B_{ℓ + 1} \subseteq B_{1}^{'}

, then we get a chain of

ℓ + 1

sets. Otherwise, by proceeding with the recursive process where

B_{ℓ}^{'}

and

B_{ℓ + 1}

are replaced with their intersection and union, consider the sequence

\begin{matrix} B_{1}^{'}, \dots, B_{ℓ - 1}^{'}, B_{ℓ}^{'} \cap B_{ℓ + 1}, B_{ℓ}^{'} \cup B_{ℓ + 1} . \end{matrix}

(A15)

By the induction hypothesis, the first ℓ sets in this sequence can be transformed into a chain (in a finite number of steps) by a recursive process as above; this gives a chain of the form

B_{1}^{″} \subseteq B_{2}^{″} \dots \subseteq B_{ℓ - 1}^{″} \subseteq B_{ℓ}^{″}

. The first ℓ sets in (A15) are all included in

B_{ℓ}^{'}

, so every combination of unions and intersections of these ℓ sets is also included in

B_{ℓ}^{'}

. Hence, the considered recursive process leads to a chain of the form

\begin{matrix} B_{1}^{″} \subseteq B_{2}^{″} \dots \subseteq B_{ℓ - 1}^{″} \subseteq B_{ℓ}^{″} \subseteq B_{ℓ}^{'} \cup B_{ℓ + 1}, \end{matrix}

(A16)

where the last inclusion in (A16) holds since

B_{ℓ}^{″} \subseteq B_{ℓ}^{'}

. The claim thus holds for

ℓ + 1

if it holds for a given ℓ, and it holds for

ℓ = 2

, it therefore holds by mathematical induction for all integers

ℓ \geq 2

. □

We first prove Proposition 4a. Suppose that there is a permutation

π : [M] \to [M]

such that

S_{π (1)} \subseteq S_{π (2)} \subseteq \dots \subseteq S_{π (M)}

is a chain. Since every element in

Ω

is included in at least d of these subsets, then it should be included in (at least) the d largest sets of this chain, so

S_{π (j)} = Ω

for every

j \in [M - d + 1 : M]

. Due to the non-negativity of f, it follows that

\begin{array}{l} (A17a) & \sum_{j = 1}^{M} f (S_{j}) & \geq \sum_{j = M - d + 1}^{M} f (S_{π (j)}) \\ (A17b) & = d f (Ω) . \end{array}

Otherwise, if we cannot get a chain by possibly permuting the subsets in the sequence

{S_{j}}_{j = 1}^{M}

, consider a pair of subsets

S_{n}

and

S_{m}

that are not related by inclusion, and replace them with their intersection and union. By the submodularity of f,

\begin{array}{l} (A18a) & \sum_{j = 1}^{M} f (S_{j}) & = \sum_{j \neq n, m} f (S_{j}) + f (S_{n}) + f (S_{m}) \\ (A18b) & \geq \sum_{j \neq n, m} f (S_{j}) + f (S_{n} \cap S_{m}) + f (S_{n} \cup S_{m}) . \end{array}

For all

ω \in Ω

, let

deg (ω)

be the number of indices

j \in [M]

such that

ω \in S_{j}

. By replacing

S_{n}

and

S_{m}

with

S_{n} \cap S_{m}

and

S_{n} \cup S_{m}

, the set of values

{deg (ω)}_{ω \in Ω}

stays unaffected (indeed, if

ω \in S_{n}

and

ω \in S_{m}

, then it belongs to their intersection and union; if

ω

belongs to only one of the sets

S_{n}

and

S_{m}

, then

ω \notin S_{n} \cap S_{m}

and

ω \in S_{n} \cup S_{m}

; finally, if

ω \notin S_{n}

and

ω \notin S_{m}

, then it does not belong to their intersection and union). Now, consider the recursive process in Lemma A1. Since the profile of the number of inclusions of the elements in

Ω

is preserved in each step of the recursive process in Lemma A1, it follows that every element in

Ω

stays to belong to at least d sets in the chain which is obtained at the end of this recursive process. Moreover, in light of (A18), in every step of the recursive process in Lemma A1, the sum in the LHS of (A18) cannot increase. Inequality (45) therefore finally follows from the earlier part of the proof for a chain (see (A17)).

We next prove Proposition 4b. Let

A \subset Ω

, and suppose that every element in

A

is included in at least

d \geq 1

of the subsets

{S_{j}}_{j = 1}^{M}

. For all

j \in [M]

, define

S_{j}^{'} ≜ S_{j} \cap A

, and consider the sequence

{\{S_{j}^{'}\}}_{j = 1}^{M}

of subsets of

A

. If f is a rank function, then it is monotonically increasing, which yields

\begin{matrix} f (S_{j}^{'}) \leq f (S_{j}), j \in [M] . \end{matrix}

(A19)

Each element of

A

is also included in at least d of the subsets

{\{S_{j}^{'}\}}_{j = 1}^{M}

(by construction, and since (by assumption) each element in

A

is included in at least d of the subsets

{S_{j}}_{j = 1}^{M}

). By the non-negativity and submodularity of f, Proposition 4a gives

\begin{matrix} \sum_{j = 1}^{M} f (S_{j}^{'}) \geq d f (A) . \end{matrix}

(A20)

Combining (A19) and (A20) yields (46). This completes the proof of Proposition 4.

Remark A1.

Lemma A1 is weaker than a claim that, in every recursive process as in Lemma A1, the number of pairs of sets that are not related by inclusion is strictly decreasing at each step. Lemma A1 is, however, sufficient for our proof of Proposition 4a.

References

Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518. [Google Scholar] [CrossRef] [Green Version]
Chan, T. Recent progresses in characterising information inequalities. Entropy 2011, 13, 379–401. [Google Scholar] [CrossRef]
Martin, S.; Padró, C.; Yang, A. Secret sharing, rank inequalities, and information inequalities. IEEE Trans. Inf. Theory 2016, 62, 599–610. [Google Scholar] [CrossRef]
Babu, S.A.; Radhakrishnan, J. An entropy-based proof for the Moore bound for irregular graphs. In Perspectives on Computational Complexity; Agrawal, M., Arvind, V., Eds.; Birkhäuser: Cham, Switzerland, 2014; pp. 173–182. [Google Scholar]
Boucheron, S.; Lugosi, G.; Massart, P. Concentration Inequalities - A Nonasymptotic Theory of Independence; Oxford University Press: Oxford, UK, 2013. [Google Scholar]
Chung, F.R.K.; Graham, L.R.; Frankl, P.; Shearer, J.B. Some intersection theorems for ordered sets and graphs. J. Comb. Theory Ser. A 1986, 43, 23–37. [Google Scholar] [CrossRef] [Green Version]
Erdos, P.; Rényi, A. On two problems of information theory. Publ. Math. Inst. Hung. Acad. Sci. 1963, 8, 241–254. [Google Scholar]
Friedgut, E. Hypergraphs, entropy and inequalities. Am. Math. Mon. 2004, 111, 749–760. [Google Scholar] [CrossRef] [Green Version]
Jukna, S. Extremal Combinatorics with Applications in Computer Science, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
Kaced, T.; Romashchenko, A.; Vereshchagin, N. A conditional information inequality and its combinatorial applications. IEEE Trans. Inf. Theory 2018, 64, 3610–3615. [Google Scholar] [CrossRef]
Kahn, J. An entropy approach to the hard-core model on bipartite graphs. Comb. Comput. 2001, 10, 219–237. [Google Scholar] [CrossRef]
Kahn, J. Entropy, independent sets and antichains: A new approach to Dedekind’s problem. Proc. Am. Math. Soc. 2001, 130, 371–378. [Google Scholar] [CrossRef]
Madiman, M.; Marcus, A.W.; Tetali, P. Entropy and set cardinality inequalities for partition-determined functions. Random Struct. Algorithms 2012, 40, 399–424. [Google Scholar] [CrossRef] [Green Version]
Madiman, M.; Marcus, A.W.; Tetali, P. Information–theoretic inequalities in additive combinatorics. In Proceedings of the 2010 IEEE Information Theory Workshop, Cairo, Egypt, 6–8 January 2010. [Google Scholar]
Pippenger, N. An information–theoretic method in combinatorial theory. J. Comb. Ser. A 1977, 23, 99–104. [Google Scholar] [CrossRef] [Green Version]
Pippenger, N. Entropy and enumeration of boolean functions. IEEE Trans. Inf. Theory 1999, 45, 2096–2100. [Google Scholar] [CrossRef]
Radhakrishnan, J. An entropy proof of Bregman’s theorem. J. Comb. Theory Ser. A 1997, 77, 161–164. [Google Scholar] [CrossRef] [Green Version]
Radhakrishnan, J. Entropy and counting. In Computational Mathematics, Modelling and Algorithms; Narosa Publishers: New Delhi, India, 2001; pp. 1–25. [Google Scholar]
Sason, I. A generalized information–theoretic approach for bounding the number of independent sets in bipartite graphs. Entropy 2021, 23, 270. [Google Scholar] [CrossRef]
Sason, I. Entropy-based proofs of combinatorial results on bipartite graphs. In Proceedings of the 2021 IEEE International Symposium on Information Theory, Melbourne, Australia, 12–20 July 2021; pp. 3225–3230. [Google Scholar]
Madiman, M.; Tetali, P. Information inequalities for joint distributions, interpretations and applications. IEEE Trans. Inf. Theory 2010, 56, 2699–2713. [Google Scholar] [CrossRef]
Bach, F. Learning with submodular functions: A convex optimization perspective. Found. Trends Mach. Learn. 2013, 6, 145–373. [Google Scholar] [CrossRef] [Green Version]
Chen, Q.; Cheng, M.; Bai, B. Matroidal entropy functions: A quartet of theories of information, matroid, design and coding. Entropy 2021, 23, 323. [Google Scholar] [CrossRef]
Fujishige, S. Polymatroidal dependence structure of a set of random variables. Inf. Control. 1978, 39, 55–72. [Google Scholar] [CrossRef] [Green Version]
Fujishige, S. Submodular Functions and Optimization, 2nd ed.; Annals of Discrete Mathematics Series; Elsevier: Amsterdam, The Netherlands, 2005; Volume 58. [Google Scholar]
Iyer, R.; Khargonkar, N.; Bilems, J.; Asnani, H. Generalized submodular information measures: Theoretical properties, examples, optimization algorithms, and applications. IEEE Trans. Inf. Theory 2022, 68, 752–781. [Google Scholar] [CrossRef]
Krause, A.; Guestrin, C. Near-optimal nonmyopic value of information in graphical models. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, UK, 26–29 July 2005; pp. 324–331. [Google Scholar]
Lovász, L. Submodular functions and convexity. In Mathematical Programming The State of the Art; Bachem, A., Korte, B., Grotschel, M., Eds.; Springer: Berlin/Heidelberg, Germany, 1983; pp. 235–257. [Google Scholar]
Tian, C. Inequalities for entropies of sets of subsets of random variables. In Proceedings of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, 31 July–5 August 2011; pp. 1950–1954. [Google Scholar]
Kishi, Y.; Ochiumi, N.; Yanagida, M. Entropy inequalities for sums over several subsets and their applications to average entropy. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT 2014), Honolulu, HI, USA, 30 June–4 July 2014; pp. 2824–2828. [Google Scholar]
Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef] [Green Version]
Madiman, M.; Mellbourne, J.; Xeng, P. Forward and reverse entropy power inequalities in convex geometry. In Convexity and Concentration; Carlen, E., Madiman, M., Werner, E.M., Eds.; IMA Volumes in Mathematics and Its Applications; Springer: Berlin/Heidelberg, Germany, 2017; Volume 161, pp. 427–485. [Google Scholar]
Han, T.S. Nonnegative entropy measures of multivariate symmetric correlations. Inf. Control. 1978, 36, 133–156. [Google Scholar] [CrossRef] [Green Version]
Polyanskiy, Y.; Wu, Y. Lecture Notes on Information Theory, version 5. Available online: http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf (accessed on 15 May 2019).
Bollobás, B. Extremal Graph Theory; Academic Press: Cambridge, MA, USA, 1978. [Google Scholar]
Madiman, M. On the entropy of sums. In Proceedings of the 2008 IEEE Information Theory Workshop, Porto, Portugal, 5–9 May 2008. [Google Scholar]
Tao, T. Sumset and inverse sumset theory for Shannon entropy. Comb. Comput. 2010, 19, 603–639. [Google Scholar] [CrossRef] [Green Version]
Kontoyiannis, I.; Madiman, M. Sumset and inverse sumset inequalities for differential entropy and mutual information. IEEE Trans. Inf. Theory 2014, 60, 4503–4514. [Google Scholar] [CrossRef] [Green Version]
Artstein, S.; Ball, K.M.; Barthe, F.; Naor, A. Solution of Shannon’s problem on the monotonicity of entropy. J. Am. Soc. 2004, 17, 975–982. [Google Scholar] [CrossRef]
Madiman, M.; Barron, A. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inf. Theory 2007, 53, 2317–2329. [Google Scholar] [CrossRef] [Green Version]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.