**1. Introduction**

Information measures and information inequalities are of fundamental importance and wide applicability in the study of feasibility and infeasibility results in information theory, while also offering useful tools for interesting problems in various fields of mathematics [1,2]. The characterization of information inequalities has been of interest for decades (see, e.g., [3,4] and references therein), mainly triggered by their indispensable role in proving direct and converse results for channel coding and data compression in single- and multi-user information systems. Information inequalities, which apply to classical and generalized information measures, have also demonstrated far-reaching consequences beyond the study of the coding theorems and fundamental limits of communication systems. One remarkable example (among many) is the usefulness of information measures and information inequalities in providing information–theoretic proofs in combinatorics and graph theory (see, e.g., [5–22]).

A basic property that is commonly used for the characterization of information inequalities relies on the nonnegativity of the (conditional and unconditional) Shannon entropy of discrete random variables, the nonnegativity of the (conditional and unconditional) relative entropy and the Shannon mutual information of general random variables, and the chain rules which hold for these classical information measures. A byproduct of these properties is the sub/supermodularity of some classical information measures, which also proves useful in light of the vast literature on sub/supermodular functions and polymatroids [22–31]. Another instrumental information inequality is the entropy power inequality, which dates back to Shannon [32]. It has been extensively generalized for different types of random variables and generalized entropies, studied in regard to its geometrical relations [33], and it has also been ubiquitously used in the analysis of various information–theoretic problems.

**Citation:** Sason, I. Information Inequalities via Submodularity and a Problem in Extremal Graph Theory. *Entropy* **2022**, *24*, 597. https://doi.org/10.3390/e24050597

Academic Editors: Karagrigoriou Alexandros and Makrides Andreas

Received: 30 March 2022 Accepted: 21 April 2022 Published: 25 April 2022


**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Among the most useful information inequalities are Han's inequality [34], its generalized versions (e.g., [15,25,30,31]), and Shearer's lemma [7] with its generalizations and refinements (e.g., [15,31,35]). In spite of their simplicity, these inequalities prove to be useful in information theory and in other diverse fields of mathematics and engineering (see, e.g., [6,35]). More specifically in regard to these inequalities, in Proposition 1 of [22], Madiman and Tetali introduced an information inequality which can be specialized to Han's inequality, and which also refines Shearer's lemma while providing a counterpart result. In [30], Tian generalized Han's inequality by relying on the sub/supermodularity of the unconditional/conditional Shannon entropy. Likewise, the work in [31] by Kishi et al. relies on the sub/supermodularity properties of Shannon information measures, and it provides refinements of Shearer's lemma and Han's inequality. Apart from the refinements of these classical and widely-used inequalities in [31], the suggested approach in the present work can be viewed in a sense as a (nontrivial) generalization and extension of a result in [31] (to be explained in Section 3.2).

This work is focused on the derivation of information inequalities via submodularity and nonnegativity properties, and on a problem in extremal graph theory whose analysis relies on an information inequality. The field of extremal graph theory, which is a subfield of extremal combinatorics, was among the early and fast-developing branches of graph theory during the 20th century. Extremal graph theory explores the relations between properties of a graph, such as its order, size, chromatic number, or maximum and minimum degrees, under some constraints on the graph (e.g., by considering graphs of a fixed order, or by forbidding a given type of subgraph). The interested reader is referred to the comprehensive textbooks [10,36] on the vast field of extremal combinatorics and extremal graph theory.

This paper suggests an approach for the derivation of families of inequalities for set functions, and applies it to obtain information inequalities with Shannon information measures that satisfy sub/supermodularity and monotonicity properties. Some of the derived information inequalities are new, while some known results (such as the generalized version of Han's inequality [25]) are reproduced as corollaries in a simple and unified way. This paper also applies the generalized Han's inequality to analyze a problem in extremal graph theory, with an information–theoretic proof and interpretation. The analysis leads to some generalized and refined bounds in comparison to the insightful results in Theorems 4.2 and 4.3 of [6]. For the purpose of the suggested problem and analysis, the presentation here is self-contained.

The paper is structured as follows: Section 2 provides essential notation and preliminary material for this paper. Section 3 presents a new methodology for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties (Theorem 1). The suggested methodology is then applied in Section 3 for the derivation of information inequalities by relying on sub/supermodularity properties of Shannon information measures. Section 3 also considers connections of the suggested approach to a generalized version of Shearer's lemma, and to other results in the literature. Most of the results in Section 3 are proved in Section 4. Section 5 applies the generalized Han's inequality to a problem in extremal graph theory (Theorem 2). A byproduct of Theorem 2, which is of interest in its own right, is also analyzed in Section 5 (Theorem 3). The presentation and analysis in Section 5 are accessible to the reader, independently of the earlier material on information inequalities in Sections 3 and 4. Some additional proofs, mostly for making the paper self-contained or for suggesting an alternative proof, are relegated to the appendices (Appendices A and B).

#### **2. Preliminaries and Notation**

The present section provides essential notation and preliminary material for this paper.


• The *Shannon entropy* of a discrete random variable *X*, taking values in a set $\mathcal{X}$ according to a probability mass function (PMF) $\mathrm{P}_X$, is defined as

$$\mathrm{H}(X) \stackrel{\triangle}{=} -\sum\_{x \in \mathcal{X}} \mathrm{P}\_{X}(x) \log \mathrm{P}\_{X}(x),\tag{1}$$

where throughout this paper, we take all logarithms to base 2.

• The *binary entropy function* $\mathrm{H}_{\mathrm{b}} \colon [0,1] \to [0, \log 2]$ is given by

$$\mathrm{H}\_{\mathrm{b}}(p) \stackrel{\triangle}{=} -p\log p - (1-p)\log(1-p), \quad p \in [0,1], \tag{2}$$

where, by continuous extension, the convention 0 log 0 = 0 is used.

• Let *X* and *Y* be discrete random variables with a joint PMF $\mathrm{P}_{XY}$, and a conditional PMF of *X* given *Y* denoted by $\mathrm{P}_{X|Y}$. The *conditional entropy* of *X* given *Y* is defined as

$$\mathbb{H}(X|Y) \stackrel{\triangle}{=} -\sum\_{(x,y) \in \mathcal{X} \times \mathcal{Y}} \mathbb{P}\_{XY}(x,y) \log \mathbb{P}\_{X|Y}(x|y) \tag{3a}$$

$$=\sum\_{y\in\mathcal{Y}}\mathrm{P}\_{Y}(y)\,\mathrm{H}(X|Y=y),\tag{3b}$$

and

$$\mathbb{H}(X|Y) = \mathbb{H}(X,Y) - \mathbb{H}(Y). \tag{4}$$

• The *mutual information* between *X* and *Y* is symmetric in *X* and *Y*, and it is given by

$$\mathrm{I}(X;Y) = \mathrm{H}(X) + \mathrm{H}(Y) - \mathrm{H}(X,Y) \tag{5a}$$

$$=\mathrm{H}(X) - \mathrm{H}(X|Y) \tag{5b}$$

$$=\mathrm{H}(Y) - \mathrm{H}(Y|X). \tag{5c}$$

• The *conditional mutual information* between two random variables *X* and *Y*, given a third random variable *Z*, is symmetric in *X* and *Y* and it is given by

$$\mathrm{I}(X;Y|Z) = \mathrm{H}(X|Z) - \mathrm{H}(X|Y,Z) \tag{6a}$$

$$= \mathrm{H}(X,Z) + \mathrm{H}(Y,Z) - \mathrm{H}(Z) - \mathrm{H}(X,Y,Z).\tag{6b}$$


• The *entropy power* of a random vector $X^n = (X_1, \ldots, X_n)$ is defined as

$$\mathrm{N}(X^n) \stackrel{\triangle}{=} \exp\left(\frac{2}{n}\,\mathrm{H}(X^n)\right),\tag{7}$$

where the base of the exponent is identical to the base of the logarithm in (1).

We rely on the following basic properties of the Shannon information measures:

• Conditioning cannot increase the entropy, i.e.,

$$\mathcal{H}(X|Y) \le \mathcal{H}(X),\tag{8}$$

with equality in (8) if and only if *X* and *Y* are independent.

• Generalizing (4) to *n*-dimensional random vectors gives the chain rule

$$\mathrm{H}(X^{n}) = \sum\_{i=1}^{n} \mathrm{H}(X\_i | X^{i-1}). \tag{9}$$

• The *subadditivity property of the entropy* is implied by (8) and (9):

$$\mathbb{H}(X^n) \le \sum\_{i=1}^n \mathbb{H}(X\_i),\tag{10}$$

with equality in (10) if and only if $X_1, \ldots, X_n$ are independent random variables.

• *Nonnegativity of the (conditional) mutual information*: In light of (5) and (8), I(*X*;*Y*) ≥ 0 with equality if and only if *X* and *Y* are independent. More generally, I(*X*;*Y*|*Z*) ≥ 0 with equality if and only if *X* and *Y* are conditionally independent given *Z*.
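As a quick numerical sanity check, the identities and inequalities above can be verified on a small joint PMF. The sketch below is illustrative only (the PMF is an arbitrary choice), with entropies computed in bits:

```python
from math import log2

def H(pmf):
    """Shannon entropy (in bits) of a PMF given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

# An arbitrary joint PMF of (X, Y) on {0,1}^2 with dependent coordinates.
pXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pX = {x: sum(p for (a, b), p in pXY.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (a, b), p in pXY.items() if b == y) for y in (0, 1)}

H_XY, H_X, H_Y = H(pXY), H(pX), H(pY)
H_X_given_Y = H_XY - H_Y          # chain rule (4)
I_XY = H_X + H_Y - H_XY           # mutual information (5a)

assert H_X_given_Y <= H_X + 1e-12   # conditioning cannot increase entropy (8)
assert H_XY <= H_X + H_Y + 1e-12    # subadditivity (10)
assert I_XY > 0                     # X and Y are dependent, so I(X;Y) > 0
```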

Let $\Omega$ be a finite and non-empty set, and let $f \colon 2^{\Omega} \to \mathbb{R}$ be a real-valued set function (i.e., *f* is defined for all subsets of $\Omega$). The following definitions are used.

**Definition 1** (Sub/Supermodular function)**.** *The set function $f \colon 2^{\Omega} \to \mathbb{R}$ is* submodular *if*

$$f(\mathcal{T}) + f(\mathcal{S}) \ge f(\mathcal{T} \cup \mathcal{S}) + f(\mathcal{T} \cap \mathcal{S}), \qquad \forall\, \mathcal{S}, \mathcal{T} \subseteq \Omega.\tag{11}$$

*Likewise, f is* supermodular *if* −*f is submodular.*

An equivalent characterization of submodularity is the diminishing-returns property (see, e.g., Proposition 2.2 in [23]), where a set function $f \colon 2^{\Omega} \to \mathbb{R}$ is submodular if and only if

$$\mathcal{S} \subset \mathcal{T} \subset \Omega,\ \omega \in \mathcal{T}^{\mathbb{C}} \implies f(\mathcal{S} \cup \{\omega\}) - f(\mathcal{S}) \ge f(\mathcal{T} \cup \{\omega\}) - f(\mathcal{T}).\tag{12}$$

This means that the larger the set, the smaller the increase in *f* when a new element is added.
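Definition 1 and the equivalent diminishing-returns characterization can be tested mechanically on small ground sets. The sketch below (the function names and the coverage example are illustrative, not from the paper) brute-forces the inequality in Definition 1 over all pairs of subsets:

```python
from itertools import combinations

def is_submodular(f, ground):
    """Exhaustively test f(S) + f(T) >= f(S | T) + f(S & T) over all pairs."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(sorted(ground), r)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T) - 1e-12
               for S in subsets for T in subsets)

# Coverage function: f(S) = |union of the sets indexed by S| (a standard
# example of a submodular set function).
cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c', 'd'}}
f_cover = lambda S: len(set().union(*[cover[i] for i in S])) if S else 0

assert is_submodular(f_cover, {1, 2, 3})          # coverage is submodular
assert not is_submodular(lambda S: len(S) ** 2,   # |S|^2 is supermodular,
                         {1, 2, 3})               # hence not submodular here
```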

**Definition 2** (Monotonic function)**.** *The set function $f \colon 2^{\Omega} \to \mathbb{R}$ is* monotonically increasing *if*

$$\mathcal{S} \subseteq \mathcal{T} \subseteq \Omega \implies f(\mathcal{S}) \le f(\mathcal{T}).\tag{13}$$

*Likewise, f is* monotonically decreasing *if* −*f is monotonically increasing.*

**Definition 3** (Polymatroid, ground set and rank function)**.** *Let $f \colon 2^{\Omega} \to \mathbb{R}$ be a submodular and monotonically increasing set function with $f(\varnothing) = 0$. The pair $(\Omega, f)$ is called a* polymatroid*,* $\Omega$ *is called a* ground set*, and f is called a* rank function*.*

**Definition 4** (Subadditive function)**.** *The set function $f \colon 2^{\Omega} \to \mathbb{R}$ is* subadditive *if, for all $\mathcal{S}, \mathcal{T} \subseteq \Omega$,*

$$f(\mathcal{S}\cup\mathcal{T}) \le f(\mathcal{S}) + f(\mathcal{T}).\tag{14}$$

A nonnegative and submodular set function is subadditive (this readily follows from (11) and (14)). The next proposition introduces results from [25,28,37]. For the sake of completeness, we provide a proof in Appendix A.

**Proposition 1.** *Let $\Omega$ be a finite and non-empty set, and let $\{X_\omega\}_{\omega\in\Omega}$ be a collection of discrete random variables. Then, the following holds:*

*(a) The set function $f \colon 2^{\Omega} \to \mathbb{R}$, given by*

$$f(\mathcal{T}) \triangleq \mathcal{H}(\mathcal{X}\_{\mathcal{T}}), \quad \mathcal{T} \subseteq \Omega,\tag{15}$$

*is a rank function.*

*(b) The set function $f \colon 2^{\Omega} \to \mathbb{R}$, given by*

$$f(\mathcal{T}) \triangleq \mathcal{H}(X\_{\mathcal{T}} | X\_{\mathcal{T}^c}), \quad \mathcal{T} \subseteq \Omega,\tag{16}$$

*is supermodular, monotonically increasing, and $f(\varnothing) = 0$.*

*(c) The set function $f \colon 2^{\Omega} \to \mathbb{R}$, given by*

$$f(\mathcal{T}) \triangleq \mathcal{I}(X\_{\mathcal{T}}; X\_{\mathcal{T}^c}), \quad \mathcal{T} \subseteq \Omega,\tag{17}$$

*is submodular and $f(\varnothing) = 0$, but f is not a rank function. The latter holds since the equality $f(\mathcal{T}) = f(\mathcal{T}^{\mathrm{c}})$, for all $\mathcal{T} \subseteq \Omega$, implies that f is not a monotonic function.*

*(d) Let $\mathcal{U}, \mathcal{V} \subseteq \Omega$ be disjoint subsets, and let the entries of the random vector $X_{\mathcal{V}}$ be conditionally independent given $X_{\mathcal{U}}$. Then, the set function $f \colon 2^{\mathcal{V}} \to \mathbb{R}$ given by*

$$f(\mathcal{T}) \triangleq \mathrm{I}(X\_{\mathcal{U}}; X\_{\mathcal{T}}), \quad \mathcal{T} \subseteq \mathcal{V}, \tag{18}$$

*is a rank function.*

*(e) Let $X_{\Omega} = \{X_\omega\}_{\omega\in\Omega}$ be independent random variables, and let $f \colon 2^{\Omega} \to \mathbb{R}$ be given by*

$$f(\mathcal{T}) \triangleq \mathcal{H}\left(\sum\_{\omega \in \mathcal{T}} \mathcal{X}\_{\omega}\right), \quad \mathcal{T} \subseteq \Omega. \tag{19}$$

*Then, f is a rank function.*
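Proposition 1a can be illustrated numerically. The following sketch (an arbitrary example: X1, X2 fair independent bits and X3 = X1 ⊕ X2) verifies that f(T) = H(X_T) vanishes at the empty set, is monotonically increasing, and is submodular:

```python
from itertools import combinations, product
from math import log2

# Joint PMF: X1, X2 independent fair bits, X3 = X1 XOR X2.
pmf = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

def f(T):
    """f(T) = H(X_T): entropy (bits) of the sub-vector indexed by T."""
    marg = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in sorted(T))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

subsets = [frozenset(c) for r in range(4) for c in combinations(range(3), r)]
assert f(frozenset()) == 0.0                                  # f(empty) = 0
for S in subsets:
    for T in subsets:
        if S <= T:
            assert f(S) <= f(T) + 1e-12                       # monotonicity
        assert f(S) + f(T) >= f(S | T) + f(S & T) - 1e-12     # submodularity
```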

The following proposition addresses the setting of general alphabets.

**Proposition 2.** *For general alphabets, the set functions f in* (15) *and* (17)*–*(19) *are submodular, and the set function f in* (16) *is supermodular with $f(\varnothing) = 0$. Moreover, the function in* (18) *remains a rank function, and the function in* (19) *remains monotonically increasing.*

**Proof.** The sub/supermodularity properties in Proposition 1 are preserved due to the nonnegativity of the (conditional) mutual information. The monotonicity property of the functions in (18) and (19) is preserved also in the general alphabet setting due to (A10) and (A14c), and the mutual information in (18) is nonnegative.

**Remark 1.** *In contrast to the entropy of discrete random variables, the differential entropy of continuous random variables is* not *functionally submodular in the sense of Lemma A.2 in [38]. This refers to a different form of submodularity, which was needed by Tao [38] to prove sumset inequalities for the entropy of discrete random variables. A follow-up study in [39] by Kontoyiannis and Madiman required substantially new proof strategies for the derivation of sumset inequalities with the differential entropy of continuous random variables. The basic property which replaces the discrete functional submodularity is the data-processing property of mutual information [39]. In the context of the present work, where the commonly used definition of submodularity is used (see Definition 1), the Shannon entropy of discrete random variables and the differential entropy of continuous random variables are both submodular set functions.*

We rely, in this paper, on the following standard terminology for graphs. An undirected graph *G* is an ordered pair *G* = (*V*, *E*), where *V* = V(*G*) is a set of elements, and *E* = E(*G*) is a set of 2-element subsets (pairs) of *V*. The elements of *V* are called the vertices of *G*, and the elements of *E* are called the edges of *G*. The number of vertices in a finite graph *G* is called the order of *G*, and the number of edges is called the size of *G*. Throughout this paper, we assume that the graph *G* is undirected and finite; it is also assumed to be a simple graph, i.e., it has no loops (no edge connects a vertex in *G* to itself) and there are no multiple edges connecting a pair of vertices in *G*. If *e* = {*u*, *v*} ∈ E(*G*), then the vertices *u* and *v* are the two ends of the edge *e*, and they are said to be adjacent (neighbors).

#### **3. Inequalities via Submodularity**

#### *3.1. A New Methodology*

The present subsection presents a new methodology for the derivation of families of inequalities for set functions, and in particular inequalities with information measures. The suggested methodology relies, to a large extent, on the notion of submodularity of set functions, and it is presented in the next theorem.

**Theorem 1.** *Let $\Omega$ be a finite set with $|\Omega| = n$. Let $f \colon 2^{\Omega} \to \mathbb{R}$ with $f(\varnothing) = 0$, and $g \colon \mathbb{R} \to \mathbb{R}$. Let the sequence $\{t_k^{(n)}\}_{k=1}^{n}$ be given by*

$$t\_k^{(n)} \triangleq \frac{1}{\binom{n}{k}} \sum\_{\mathcal{T} \subseteq \Omega \colon |\mathcal{T}| = k} g\left(\frac{f(\mathcal{T})}{k}\right), \qquad k \in [n]. \tag{20}$$

*(a) If f is submodular, and g is monotonically increasing and convex, then the sequence $\{t_k^{(n)}\}_{k=1}^{n}$ is monotonically decreasing, i.e.,*

$$t\_1^{(n)} \ge t\_2^{(n)} \ge \dots \ge t\_n^{(n)} = \operatorname{g} \left( \frac{f(\Omega)}{n} \right). \tag{21}$$

*In particular,*

$$\sum\_{\mathcal{T} \subseteq \Omega \colon |\mathcal{T}| = k} g\left(\frac{f(\mathcal{T})}{k}\right) \ge \binom{n}{k}\, g\left(\frac{f(\Omega)}{n}\right), \qquad k \in [n]. \tag{22}$$


**Proof.** See Section 4.1.
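Before the formal proof, Theorem 1a can be probed numerically. The sketch below is an illustrative check (not part of the proof): it evaluates the sequence of (20) for a submodular coverage function and the convex, monotonically increasing choice g(x) = x² on the nonnegative reals, and confirms the monotonicity in (21):

```python
from itertools import combinations
from math import comb

# Submodular f: coverage function on the ground set {0, 1, 2, 3}.
cover = {0: {'a', 'b'}, 1: {'b', 'c'}, 2: {'c', 'd'}, 3: {'d', 'a'}}
f = lambda T: len(set().union(*[cover[i] for i in T])) if T else 0
g = lambda x: x ** 2      # increasing and convex on [0, inf); f/k >= 0 here

n = 4
# t_k^{(n)} of (20): average of g(f(T)/k) over all k-subsets T.
t = [sum(g(f(T) / k) for T in combinations(range(n), k)) / comb(n, k)
     for k in range(1, n + 1)]
assert all(t[i] >= t[i + 1] - 1e-12 for i in range(n - 1))   # (21)
```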

**Corollary 1.** *Let $\Omega$ be a finite set with $|\Omega| = n$, $f \colon 2^{\Omega} \to \mathbb{R}$, and $g \colon \mathbb{R} \to \mathbb{R}$ be convex and monotonically increasing. If*


*then*

$$\lim\_{n \to \infty} \left\{ \frac{1}{n} \log \left( \sum\_{\substack{\mathcal{T} \subseteq \Omega \colon |\mathcal{T}| = k\_n}} g\left( \frac{f(\mathcal{T})}{k\_n} \right) \right) - \mathsf{H}\_\mathbf{b} \left( \frac{k\_n}{n} \right) \right\} = 0,\tag{23}$$

*and if $\lim_{n\to\infty} \frac{k_n}{n} = \beta \in [0,1]$, then*

$$\lim\_{n \to \infty} \frac{1}{n} \log \left( \sum\_{\substack{\mathcal{T} \subseteq \Omega \colon |\mathcal{T}| = k\_n}} \mathcal{g} \left( \frac{f(\mathcal{T})}{k\_n} \right) \right) = \mathsf{H}\_{\mathsf{b}}(\boldsymbol{\beta}).\tag{24}$$

**Proof.** See Section 4.2.

**Corollary 2.** *Let $\Omega$ be a finite set with $|\Omega| = n$, and $f \colon 2^{\Omega} \to \mathbb{R}$ be submodular and nonnegative with $f(\varnothing) = 0$. Then, the following holds:*

*(a) For $\alpha \ge 1$ and $k \in [n-1]$,*

$$\sum\_{\mathcal{T}\subseteq\Omega\colon|\mathcal{T}|=k} \left( f^{\alpha}(\Omega) - f^{\alpha}(\mathcal{T}) \right) \le c\_{\alpha}(n,k)\, f^{\alpha}(\Omega),\tag{25}$$

*with*

$$c\_{\alpha}(n,k) \triangleq \left(1 - \frac{k^{\alpha}}{n^{\alpha}}\right) \binom{n}{k}.\tag{26}$$

*For $\alpha = 1$,* (25) *holds with $c_1(n,k) = \binom{n-1}{k}$, regardless of the nonnegativity of f.*

*(b) If f is also monotonically increasing (i.e., f is a rank function), then for $\alpha \ge 1$,*

$$\left(\frac{k}{n}\right)^{\alpha-1}\binom{n-1}{k-1} f^{\alpha}(\Omega) \le \sum\_{\mathcal{T}\subseteq\Omega\colon |\mathcal{T}|=k} f^{\alpha}(\mathcal{T}) \le \binom{n}{k} f^{\alpha}(\Omega), \qquad k \in [n].\tag{27}$$

**Proof.** See Section 4.3.
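As a numerical illustration, consider three dependent bits (X1, X2 fair and independent, X3 = X1 ⊕ X2; an arbitrary example) together with the rank function f(T) = H(X_T) of Proposition 1a. Both bounds in (27), and inequality (25), can then be evaluated directly:

```python
from itertools import combinations, product
from math import comb, log2

pmf = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

def f(T):
    """f(T) = H(X_T) in bits, by marginalizing the joint PMF."""
    marg = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in sorted(T))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

n, alpha = 3, 2.0
fO = f(range(n))                     # f(Omega) = H(X1, X2, X3) = 2 bits
for k in range(1, n):
    S = sum(f(T) ** alpha for T in combinations(range(n), k))
    # (25): sum of (f^a(Omega) - f^a(T)) <= c_a(n,k) f^a(Omega)
    c = (1 - (k / n) ** alpha) * comb(n, k)
    assert comb(n, k) * fO ** alpha - S <= c * fO ** alpha + 1e-9
    # (27): (k/n)^{a-1} C(n-1,k-1) f^a(Omega) <= sum <= C(n,k) f^a(Omega)
    assert (k / n) ** (alpha - 1) * comb(n - 1, k - 1) * fO ** alpha <= S + 1e-9
    assert S <= comb(n, k) * fO ** alpha + 1e-9
```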

Corollary 2 is next specialized to reproduce Han's inequality [34], and a generalized version of Han's inequality (Section 4 of [25]).

Let $X^n = (X_1, \ldots, X_n)$ be a random vector with finite entropies $\mathrm{H}(X_i)$ for all $i \in [n]$. The set function $f \colon 2^{[n]} \to [0, \infty)$, given by $f(\mathcal{T}) = \mathrm{H}(X_{\mathcal{T}})$ for all $\mathcal{T} \subseteq [n]$, is submodular [25] (see Propositions 1a and 2). From (25), the following holds:

(a) Setting *α* = 1 in (25) implies that, for all *k* ∈ [*n* − 1],

$$\sum\_{1 \le i\_1 < \dots < i\_k \le n} \left( \mathrm{H}(X^n) - \mathrm{H}(X\_{i\_1}, \dots, X\_{i\_k}) \right) \le \left( 1 - \frac{k}{n} \right) \binom{n}{k} \mathrm{H}(X^n) \tag{28a}$$

$$= \binom{n-1}{k}\,\mathrm{H}(X^n).\tag{28b}$$

(b) Consequently, setting *k* = *n* − 1 in (28) gives

$$\sum\_{i=1}^{n} \left( \mathcal{H}(X^n) - \mathcal{H}(X\_1, \dots, X\_{i-1}, X\_{i+1}, \dots, X\_n) \right) \le \mathcal{H}(X^n),\tag{29}$$

which gives Han's inequality.
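Since Han's inequality (29) holds for every joint distribution, it can be checked on a randomly generated (seeded, hence reproducible) joint PMF of three binary random variables:

```python
import random
from itertools import product
from math import log2

random.seed(7)
outcomes = list(product((0, 1), repeat=3))
weights = [random.random() for _ in outcomes]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}

def H_sub(T):
    """Entropy (bits) of the sub-vector of (X1, X2, X3) indexed by T."""
    marg = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in T)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

n = 3
H_full = H_sub(range(n))
# LHS of (29): sum over i of H(X^n) - H(X_1,...,X_{i-1},X_{i+1},...,X_n).
lhs = sum(H_full - H_sub([j for j in range(n) if j != i]) for i in range(n))
assert lhs <= H_full + 1e-12          # Han's inequality (29)
```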

Further applications of Theorem 1 lead to the next corollary, which partially introduces some known results that have been proved on a case-by-case basis in Theorems 17.6.1–17.6.3 of [1] and Section 2 of [2]. In particular, the monotonicity properties of the sequences in (30) and (32)–(34) were proved in Theorems 1 and 2, and Corollaries 1 and 2 of [2]. Both known and new results are readily obtained here, in a unified way, from Theorem 1. The utility of one of these inequalities in extremal combinatorics is discussed later in this subsection (see Proposition 3), providing a natural generalization of a beautiful combinatorial result in Section 3.2 of [19].

**Corollary 3.** *Let $\{X_i\}_{i=1}^{n}$ be random variables with finite entropies. Then, the following holds:*

*(a) The sequences*

$$h\_k^{(n)} \stackrel{\Delta}{=} \frac{1}{\binom{n}{k}} \sum\_{\substack{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k}} \frac{\mathcal{H}(X\_{\mathcal{T}})}{k}, \qquad k \in [n], \tag{30}$$

$$\ell\_k^{(n)} \triangleq \frac{1}{\binom{n}{k}} \sum\_{\substack{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k}} \frac{\mathcal{I}(X\_{\mathcal{T}}; X\_{\mathcal{T}^c})}{k}, \quad k \in [n] \tag{31}$$

*are monotonically decreasing in k. If $\{X_i\}_{i=1}^{n}$ are independent, then also the sequence*

$$m\_k^{(n)} \triangleq \frac{1}{\binom{n-1}{k-1}} \sum\_{\substack{\mathcal{T} \subseteq [n] : |\mathcal{T}| = k}} \mathcal{H} \left( \sum\_{\omega \in \mathcal{T}} X\_{\omega} \right), \quad k \in [n] \tag{32}$$

*is monotonically decreasing in k.*

*(b) The sequence*

$$r\_k^{(n)} \triangleq \frac{1}{\binom{n}{k}} \sum\_{\substack{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k}} \frac{\mathcal{H}(X\_{\mathcal{T}} | X\_{\mathcal{T}^c})}{k}, \quad k \in [n] \tag{33}$$

*is monotonically increasing in k.*

*(c) For every r* > 0*, the sequences*

$$s\_k^{(n)}(r) \stackrel{\triangle}{=} \frac{1}{\binom{n}{k}} \sum\_{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k} \mathrm{N}^{r}(X\_{\mathcal{T}}), \qquad k \in [n], \tag{34}$$

$$\mu\_k^{(n)}(r) \triangleq \frac{1}{\binom{n}{k}} \sum\_{\substack{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k}} \exp\left(-\frac{r \operatorname{H}(X\_{\mathcal{T}} | X\_{\mathcal{T}^c})}{k}\right), \quad k \in [n], \tag{35}$$

$$w\_k^{(n)}(r) \triangleq \frac{1}{\binom{n}{k}} \sum\_{\substack{\mathcal{T} \subseteq [n] : |\mathcal{T}| = k}} \exp\left(\frac{r \operatorname{ I}(X\_{\mathcal{T}}; X\_{\mathcal{T}^c})}{k}\right), \quad k \in [n] \tag{36}$$

*are monotonically decreasing in k. If $\{X_i\}_{i=1}^{n}$ are independent, then also the sequence*

$$v\_k^{(n)}(r) \triangleq \frac{1}{\binom{n}{k}} \sum\_{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k} \mathrm{N}^{r}\left( \sum\_{\omega \in \mathcal{T}} X\_{\omega} \right), \quad k \in [n] \tag{37}$$

*is monotonically decreasing in k.*

**Proof.** The finite entropies of $\{X_i\}_{i=1}^{n}$ assure that the entropies involved in the sequences (30)–(37) are finite. Item (a) follows from Theorem 1a, where the submodular set functions *f* which correspond to (30)–(32) are given in (15), (17) and (19), respectively, and *g* is the identity function on the real line. The identity $k\binom{n}{k} = n\binom{n-1}{k-1}$ is used for (32). Item (b) follows from Theorem 1c, where *f* is the supermodular function in (16) and *g* is the identity function on the real line. We next prove Item (c). The sequence (34) is monotonically decreasing by Theorem 1a, where *f* is the submodular function in (15), and $g \colon \mathbb{R} \to \mathbb{R}$ is the monotonically increasing and convex function defined as $g(x) = \exp(2rx)$ for $x \in \mathbb{R}$ (with $r > 0$). The sequence (35) is monotonically decreasing by Theorem 1d, where *f* is the supermodular function in (16), and $g \colon \mathbb{R} \to \mathbb{R}$ is the monotonically decreasing and convex function defined as $g(x) = \exp(-rx)$ for $x \in \mathbb{R}$. The sequence (36) is monotonically decreasing by Theorem 1a, where *f* is the submodular function in (17) and *g* is the monotonically increasing and convex function defined as $g(x) = \exp(rx)$ for $x \in \mathbb{R}$. Finally, the sequence (37) is monotonically decreasing by Theorem 1a, where *f* is the submodular function in (19) and *g* is the monotonically increasing and convex function defined as $g(x) = \exp(2rx)$ for $x \in \mathbb{R}$.

**Remark 2.** *From Proposition 2, since the proof of Corollary 3 only relies on the sub/supermodularity property of f, the random variables $\{X_i\}_{i=1}^{n}$ do not need to be discrete in Corollary 3. In the reproduction of Han's inequality as an application of Corollary 2, the random variables $\{X_i\}_{i=1}^{n}$ do not need to be discrete either, since f is not required to be nonnegative if $\alpha = 1$ (only the submodularity of f in* (15) *is required, which holds due to Proposition 2).*
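The monotonicity claims of Corollary 3 for the sequences (30) and (33) can be checked numerically; the sketch below uses a seeded random joint PMF of four binary random variables (an arbitrary choice):

```python
import random
from itertools import combinations, product
from math import comb, log2

random.seed(3)
outcomes = list(product((0, 1), repeat=4))
weights = [random.random() for _ in outcomes]
total = sum(weights)
pmf = {x: w / total for x, w in zip(outcomes, weights)}

def H_sub(T):
    """Entropy (bits) of the sub-vector indexed by T."""
    marg = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in sorted(T))
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

n = 4
full = set(range(n))
# h_k^{(n)} of (30) and r_k^{(n)} of (33); H(X_T | X_{T^c}) = H(X) - H(X_{T^c}).
h = [sum(H_sub(T) for T in combinations(range(n), k)) / (comb(n, k) * k)
     for k in range(1, n + 1)]
r = [sum(H_sub(full) - H_sub(full - set(T))
         for T in combinations(range(n), k)) / (comb(n, k) * k)
     for k in range(1, n + 1)]
assert all(h[i] >= h[i + 1] - 1e-12 for i in range(n - 1))  # (30) decreasing
assert all(r[i] <= r[i + 1] + 1e-12 for i in range(n - 1))  # (33) increasing
```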

The following result exemplifies the utility of the monotonicity result of the sequence (30) in extremal combinatorics. It also generalizes the result in Section 3.2 of [19] for an achievable upper bound on the cardinality of a finite set in the three-dimensional Euclidean space, expressed as a function of its number of projections on each of the planes *XY*, *XZ* and *YZ*. The next result provides an achievable upper bound on the cardinality of a finite set of points in an *n*-dimensional Euclidean space, expressed as a function of its number of projections on each of the *k*-dimensional Euclidean subspaces with an arbitrary *k* < *n*.

**Proposition 3.** *Let $\mathcal{P} \subseteq \mathbb{R}^n$ be a finite set of points in the n-dimensional Euclidean space with $|\mathcal{P}| = M$. Let $k \in [n-1]$, and $\ell \triangleq \binom{n}{k}$. Let $\mathcal{R}_1, \ldots, \mathcal{R}_\ell$ be the projections of $\mathcal{P}$ on each of the k-dimensional subspaces of $\mathbb{R}^n$, and let $|\mathcal{R}_j| = M_j$ for all $j \in [\ell]$. Then,*

$$|\mathcal{P}| \le \left(\prod\_{j=1}^{\binom{n}{k}} M\_j\right)^{\frac{1}{\binom{n-1}{k-1}}}.\tag{38}$$

*Let $R \triangleq \frac{\log M}{n}$, and $R_j \triangleq \frac{\log M_j}{k}$ for all $j \in [\ell]$. An equivalent form of* (38) *is given by the inequality*

$$R \le \frac{1}{\ell} \sum\_{j=1}^{\ell} R\_j. \tag{39}$$

*Moreover, if $M_1 = \ldots = M_\ell$ and $\sqrt[k]{M_1} \in \mathbb{N}$, then* (38) *and* (39) *are satisfied with equality if $\mathcal{P}$ is a grid of points in $\mathbb{R}^n$ with $\sqrt[k]{M_1}$ points on each dimension (so, $M = M_1^{n/k}$).*

**Proof.** Pick uniformly at random a point $X^n = (X_1, \ldots, X_n) \in \mathcal{P}$. Then,

$$\mathbb{H}(X^n) = \log |\mathcal{P}|.\tag{40}$$

The sequence in (30) is monotonically decreasing, so $h_k^{(n)} \ge h_n^{(n)}$, which is equivalent to

$$\binom{n-1}{k-1} \operatorname{H}(X^n) \le \sum\_{\substack{\mathcal{T} \subseteq [n] \colon |\mathcal{T}| = k}} \operatorname{H}(X\_{\mathcal{T}}).\tag{41}$$

Let $\mathcal{S}_1, \ldots, \mathcal{S}_\ell$ be the *k*-element subsets of the set $[n]$, ordered in a way such that $M_j$ is the cardinality of the projection of the set $\mathcal{P}$ on the *k*-dimensional subspace whose coordinates are the elements of the subset $\mathcal{S}_j$. Then, (41) can be expressed in the form

$$\binom{n-1}{k-1}\,\mathrm{H}(X^n) \le \sum\_{j=1}^{\ell} \mathrm{H}(X\_{\mathcal{S}\_j}),\tag{42}$$

and also

$$\mathrm{H}(X\_{\mathcal{S}\_j}) \le \log M\_j, \quad j \in [\ell], \tag{43}$$

since the entropy of a random variable is upper bounded by the logarithm of the number of its possible values. Combining (40), (42) and (43) gives

$$\binom{n-1}{k-1} \log |\mathcal{P}| \le \sum\_{j=1}^{\ell} \log M\_j. \tag{44}$$

Exponentiating both sides of (44) gives (38). In addition, using the identity $\binom{n}{k} = \frac{n}{k}\binom{n-1}{k-1}$ gives (39) from (44). Finally, the sufficiency condition for equality in (38) and (39) can be easily verified: equality holds if $\mathcal{P}$ is a grid of points in $\mathbb{R}^n$ with the same finite number of points on each dimension.
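Proposition 3 can also be verified on a concrete point set. For n = 3 and k = 2, inequality (38) reads |P|² ≤ M₁M₂M₃ (since C(2,1) = 2); the point set below is an arbitrary example:

```python
from itertools import combinations
from math import comb

# A finite point set in R^3 (n = 3), projected onto the k = 2 subspaces.
P = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1), (2, 1, 0), (2, 2, 2)}
n, k = 3, 2

prod_M = 1
for axes in combinations(range(n), k):              # the k-dim coordinate subspaces
    proj = {tuple(p[i] for i in axes) for p in P}   # projection of P
    prod_M *= len(proj)                             # M_j = |projection|

# Inequality (38), in the form |P|^{C(n-1,k-1)} <= product of the M_j.
assert len(P) ** comb(n - 1, k - 1) <= prod_M
```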

#### *3.2. Connections to a Generalized Version of Shearer's Lemma and Other Results in the Literature*

The next proposition is a known generalized version of Shearer's lemma.

**Proposition 4.** *Let $\Omega$ be a finite set, let $\{\mathcal{S}_j\}_{j=1}^{M}$ be a finite collection of subsets of $\Omega$ (with $M \in \mathbb{N}$), and let $f \colon 2^{\Omega} \to \mathbb{R}$ be a set function.*

*(a) If f is non-negative and submodular, and every element in $\Omega$ is included in at least $d \ge 1$ of the subsets $\{\mathcal{S}_j\}_{j=1}^{M}$, then*

$$\sum\_{j=1}^{M} f(\mathcal{S}\_j) \ge df(\Omega). \tag{45}$$

*(b) If f is a rank function, $\mathcal{A} \subset \Omega$, and every element in $\mathcal{A}$ is included in at least $d \ge 1$ of the subsets $\{\mathcal{S}_j\}_{j=1}^{M}$, then*

$$\sum\_{j=1}^{M} f(\mathcal{S}\_{j}) \ge df(\mathcal{A}).\tag{46}$$

The first part of Proposition 4 was pointed out in Section 1.5 of [35], and the second part of Proposition 4 is a generalization of Remark 1 and inequality (47) in [20]. We provide a (somewhat different) proof of Proposition 4a, as well as a self-contained proof of Proposition 4b in Appendix B.

Let $\{X_i\}_{i=1}^{n}$ be discrete random variables, and consider the set function $f \colon 2^{[n]} \to \mathbb{R}^{+}$ defined as $f(\mathcal{A}) = \mathrm{H}(X_{\mathcal{A}})$ for all $\mathcal{A} \subseteq [n]$. Since *f* is a rank function [25], Proposition 4 then specializes to Shearer's lemma [7] and a modified version of this lemma (see Remark 1 of [20]).

In light of Proposition 1e and Proposition 4b, Corollaries 4 and 5 are obtained as follows.

**Corollary 4.** *Let $\{X_i\}_{i=1}^{n}$ be independent discrete random variables, $\{\mathcal{S}_j\}_{j=1}^{M}$ be subsets of $[n]$, and $\mathcal{A} \subseteq [n]$. If each element in $\mathcal{A}$ belongs to at least $d \ge 1$ of the sets $\{\mathcal{S}_j\}_{j=1}^{M}$, then*

$$d\,H\left(\sum_{i\in\mathcal{A}} X_i\right) \le \sum_{j=1}^{M} H\left(\sum_{i\in\mathcal{S}_j} X_i\right). \tag{47}$$

*In particular, if every* $i \in [n]$ *is included in at least* $d \ge 1$ *of the subsets* $\{\mathcal{S}_j\}_{j=1}^{M}$*, then*

$$d\,H\left(\sum_{i=1}^{n} X_i\right) \le \sum_{j=1}^{M} H\left(\sum_{i \in \mathcal{S}_j} X_i\right). \tag{48}$$

**Remark 3.** *Inequality* (48) *is also a special case of [37] (Theorem 2), and they coincide if every element* $i \in [n]$ *is included in a fixed number* ($d$) *of the subsets* $\{\mathcal{S}_j\}_{j=1}^{M}$*.*
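Inequality (48) can be illustrated numerically (a sketch, not part of the paper): take three hypothetical i.i.d. Bernoulli(1/2) variables and the three pairwise subsets of $[3]$, so each element is covered $d = 2$ times, and compute the entropies of the sums exactly by convolution.

```python
import math

def entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as {value: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def sum_pmf(pmfs):
    """Exact pmf of a sum of independent random variables, by convolution."""
    out = {0: 1.0}
    for pmf in pmfs:
        new = {}
        for s, p in out.items():
            for x, q in pmf.items():
                new[s + x] = new.get(s + x, 0.0) + p * q
        out = new
    return out

# Illustrative choice: X1, X2, X3 ~ i.i.d. Bernoulli(1/2), covered by the
# three 2-element subsets of [3], so every element appears d = 2 times.
bern = {0: 0.5, 1: 0.5}
lhs = 2 * entropy(sum_pmf([bern] * 3))  # d * H(X1 + X2 + X3)
rhs = 3 * entropy(sum_pmf([bern] * 2))  # sum over the three pairwise sums
assert lhs <= rhs
```

With these numbers, the left side is $2\,H(\mathrm{Binomial}(3, 1/2)) \approx 3.62$ bits and the right side is $3\,H(\mathrm{Binomial}(2, 1/2)) = 4.5$ bits, consistent with (48).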

A specialization of Corollary 4 gives the next result.

**Corollary 5.** *Let* $\{X_i\}_{i=1}^{n}$ *be independent discrete random variables with finite variances. Then, the following holds:*

*(a) For every* $k \in [n-1]$*,*

$$H\left(\sum_{i=1}^{n}X_{i}\right) \le \frac{1}{\binom{n-1}{k-1}} \sum_{\mathcal{T} \subseteq [n]:\,|\mathcal{T}|=k} H\left(\sum_{\omega \in \mathcal{T}} X_{\omega}\right),\tag{49}$$

*and equivalently,*

$$\mathrm{N}\left(\sum_{i=1}^{n}X_{i}\right)\leq\left\{\prod_{\mathcal{T}\subseteq[n]:\,|\mathcal{T}|=k}\mathrm{N}\left(\sum_{\omega\in\mathcal{T}}X_{\omega}\right)\right\}^{\frac{1}{\binom{n-1}{k-1}}}.\tag{50}$$

*(b) For every* $k \in [n-1]$*,*

$$\mathrm{N}\left(\sum_{i=1}^{n} X_i\right) \le \frac{1}{\binom{n}{k}} \sum_{\mathcal{T} \subseteq [n]:\,|\mathcal{T}| = k} \mathrm{N}^{\frac{n}{k}}\left(\sum_{\omega \in \mathcal{T}} X_{\omega}\right),\tag{51}$$

*where* (51) *is in general looser than* (50)*, with equivalence if* $\{X_i\}_{i=1}^{n}$ *are i.i.d.; in particular,*

$$\mathrm{N}\left(\sum_{i=1}^{n}X_{i}\right)\leq\left\{\prod_{j=1}^{n}\mathrm{N}\left(\sum_{i\neq j}X_{i}\right)\right\}^{\frac{1}{n-1}}\tag{52a}$$

$$\leq \frac{1}{n} \sum_{j=1}^{n} \left\{ \mathrm{N}\left( \sum_{i \neq j} X_i \right) \right\}^{\frac{n}{n-1}}.\tag{52b}$$

**Proof.** Let $\{\mathcal{S}_j\}_{j=1}^{M}$ be all the $k$-element subsets of $\Omega = [n]$ (with $M = \binom{n}{k}$). Then, every element $i \in [n]$ belongs to $d = \frac{kM}{n} = \binom{n-1}{k-1}$ such subsets, which then gives (49) as a special case of (48). Alternatively, (49) follows from Corollary 3b, which yields $m_k^{(n)} \ge m_n^{(n)}$ for all $k \in [n-1]$. Exponentiating both sides of (49) gives (50). Inequality (51) is a loosened version of (50), which follows by invoking the AM-GM inequality (i.e., the geometric mean of nonnegative real numbers is less than or equal to their arithmetic mean, with equality between these two means if and only if these numbers are all equal), in conjunction with the identity $\frac{k}{n}\binom{n}{k} = \binom{n-1}{k-1}$. Inequalities (50) and (51) are consequently equivalent if $\{X_i\}_{i=1}^{n}$ are i.i.d. random variables, and (52) is a specialized version of (50) and the loosened inequality (51) by setting $k = n-1$.
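The counting identity and the AM-GM step used in this proof can be checked numerically; the following sketch (with the hypothetical values $n = 6$, $k = 3$, chosen only for illustration) verifies both.

```python
import math
from itertools import combinations

n, k = 6, 3  # illustrative values
subsets = list(combinations(range(n), k))  # all k-element subsets of [n]
M = len(subsets)
assert M == math.comb(n, k)

# Each element of [n] lies in exactly d = kM/n = binom(n-1, k-1) subsets.
for i in range(n):
    d = sum(1 for S in subsets if i in S)
    assert d == k * M // n == math.comb(n - 1, k - 1)

# The identity (k/n) * binom(n, k) = binom(n-1, k-1) used to pass from (50) to (51).
assert k * math.comb(n, k) == n * math.comb(n - 1, k - 1)

# AM-GM: the geometric mean of nonnegative reals never exceeds their arithmetic mean.
a = [1.0, 2.5, 4.0, 0.5]
gm = math.prod(a) ** (1 / len(a))
am = sum(a) / len(a)
assert gm <= am
```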

The next remarks consider the information inequalities in Corollaries 3–5 in light of Theorem 1 here and some known results in the literature.

**Remark 4.** *Inequality* (49) *was derived by Madiman as a special case of Theorem 2 in [37]. The proof of Corollary 5a shows that* (49) *can also be derived in two different ways, as a special case of either Theorem 1a or Proposition 4a.*

**Remark 5.** *Inequality* (51) *can also be derived as a special case of Theorem 1a, where f is the rank function in* (19)*, and* $g \colon \mathbb{R} \to \mathbb{R}$ *is given by* $g(x) = \exp(2nx)$ *for all* $x \in \mathbb{R}$*. It also follows from the monotonicity property in Corollary 3c, which yields* $w_k^{(n)}(n) \ge w_n^{(n)}(n)$ *for all* $k \in [n-1]$*.*

**Remark 6.** *The result in Theorem 8 of [31] is a special case of Theorem 1a here, which follows by taking the function g in Theorem 1a to be the identity function. The flexibility in selecting the function g in Theorem 1 enables one to obtain a larger collection of information inequalities. This is in part reflected by a comparison of Corollary 3 here with Corollary 9 of [31]. More specifically, the monotonicity properties in* (30)*,* (31) *and* (33) *were obtained in Corollary 9 of [31] by relying on Theorem 8 of [31] and the sub/supermodularity properties of the considered Shannon information measures. It is noted, however, that the monotonicity results for the sequences* (34)*–*(37) *(Corollary 3c) are not implied by Theorem 8 of [31].*

**Remark 7.** *Inequality* (52) *forms a counterpart of an entropy power inequality by Artstein et al. (Theorem 3 of [40]), which states that, for independent random variables* $\{X_i\}_{i=1}^{n}$ *with finite variances,*

$$\mathrm{N}\left(\sum_{i=1}^{n}X_{i}\right)\geq\frac{1}{n-1}\sum_{j=1}^{n}\mathrm{N}\left(\sum_{i\neq j}X_{i}\right).\tag{53}$$

*Inequality* (50)*, and also its looser version in* (51)*, form counterparts of the generalized inequality by Madiman and Barron, which reads (see inequality (4) in [41]):*

$$\mathrm{N}\left(\sum_{i=1}^{n}X_{i}\right) \ge \frac{1}{\binom{n-1}{k-1}} \sum_{\mathcal{T}\subseteq[n]:\,|\mathcal{T}|=k} \mathrm{N}\left(\sum_{\omega\in\mathcal{T}} X_{\omega}\right), \quad k \in [n-1].\tag{54}$$
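Since the entropy power of a Gaussian equals its variance, and sums of independent Gaussians are Gaussian with summed variances, inequalities (53) and (54) hold with equality in the Gaussian case. The following sketch checks this in closed form; the variances below are an arbitrary illustrative choice.

```python
import math
from itertools import combinations

# For independent Gaussian X_i, N(X) = Var(X), and the sum of independent
# Gaussians is Gaussian, so both sides of (53) and (54) reduce to variances.
var = [1.0, 2.0, 0.5, 3.0, 1.5]  # Var(X_i), an illustrative choice
n = len(var)
total = sum(var)                 # N(X_1 + ... + X_n) in the Gaussian case

# Inequality (53): equality for Gaussians (each leave-one-out sum has
# variance total - var[j], and they add up to (n-1) * total).
rhs53 = sum(total - v for v in var) / (n - 1)
assert abs(total - rhs53) < 1e-12

# Inequality (54): equality for Gaussians, for every k in [n-1] (each index
# appears in binom(n-1, k-1) of the k-element subsets).
for k in range(1, n):
    rhs54 = sum(sum(var[i] for i in T)
                for T in combinations(range(n), k)) / math.comb(n - 1, k - 1)
    assert abs(total - rhs54) < 1e-12
```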
