Abstract
The first part of the present paper offers a unified approach for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties, and it applies this approach to derive information inequalities with Shannon information measures. Connections of the considered approach to a generalized version of Shearer's lemma, and to other related results in the literature, are considered. Some of the derived information inequalities are new, while known results (such as a generalized version of Han's inequality) are reproduced in a simple and unified way. The second part of this paper applies the generalized Han's inequality to analyze a problem in extremal graph theory. This problem is motivated and analyzed from the perspective of information theory, and the analysis leads to generalized and refined bounds. The two parts of this paper are meant to be independently accessible to the reader.
1. Introduction
Information measures and information inequalities are of fundamental importance and wide applicability in the study of feasibility and infeasibility results in information theory, while also offering very useful tools which serve to deal with interesting problems in various fields of mathematics [1,2]. The characterization of information inequalities has been of interest for decades (see, e.g., [3,4] and references therein), mainly triggered by their indispensable role in proving direct and converse results for channel coding and data compression for single and multi-user information systems. Information inequalities, which apply to classical and generalized information measures, have also demonstrated far-reaching consequences beyond the study of the coding theorems and fundamental limits of communication systems. One such remarkable example (among many) is the usefulness of information measures and information inequalities in providing information–theoretic proofs in the field of combinatorics and graph theory (see, e.g., [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]).
A basic property that is commonly used for the characterization of information inequalities relies on the nonnegativity of the (conditional and unconditional) Shannon entropy of discrete random variables, the nonnegativity of the (conditional and unconditional) relative entropy and the Shannon mutual information of general random variables, and the chain rules which hold for these classical information measures. A byproduct of these properties is the sub/supermodularity of some classical information measures, which also proves to be useful by taking advantage of the vast literature on sub/supermodular functions and polymatroids [22,23,24,25,26,27,28,29,30,31]. Another instrumental information inequality is the entropy power inequality, which dates back to Shannon [32]. It has been extensively generalized for different types of random variables and generalized entropies, studied in regard to its geometrical relations [33], and it has also been ubiquitously used for the analysis of various information–theoretic problems.
Among the most useful information inequalities are Han's inequality [34], its generalized versions (e.g., [15,25,30,31]), and Shearer's lemma [7] with its generalizations and refinements (e.g., [15,31,35]). In spite of their simplicity, these inequalities prove to be useful in information theory, and in other diverse fields of mathematics and engineering (see, e.g., [6,35]). More specifically in regard to these inequalities, in Proposition 1 of [22], Madiman and Tetali introduced an information inequality which can be specialized to Han's inequality, and which also refines Shearer's lemma while providing a counterpart result. In [30], Tian generalized Han's inequality by relying on the sub/supermodularity of the unconditional/conditional Shannon entropy. Likewise, the work in [31] by Kishi et al. relies on the sub/supermodularity properties of Shannon information measures, and it provides refinements of Shearer's lemma and Han's inequality. Apart from the refinements of these classical and widely used inequalities in [31], the suggested approach in the present work can be viewed in a sense as a (nontrivial) generalization and extension of a result in [31] (to be explained in Section 3.2).
This work is focused on the derivation of information inequalities via submodularity and nonnegativity properties, and on a problem in extremal graph theory whose analysis relies on an information inequality. The field of extremal graph theory, which is a subfield of extremal combinatorics, was among the early and fast developing branches of graph theory during the 20th century. Extremal graph theory explores the relations between properties of a graph, such as its order, size, chromatic number, or maximal and minimal degrees, under some constraints on the graph (e.g., by considering graphs of a fixed order, or by forbidding a given type of subgraph). The interested reader is referred to the comprehensive textbooks [10,36] on the vast field of extremal combinatorics and extremal graph theory.
This paper suggests an approach for the derivation of families of inequalities for set functions, and it applies it to obtain information inequalities with Shannon information measures that satisfy sub/supermodularity and monotonicity properties. Some of the derived information inequalities are new, while some known results (such as the generalized version of Han’s inequality [25]) are reproduced as corollaries in a simple and unified way. This paper also applies the generalized Han’s inequality to analyze a problem in extremal graph theory, with an information–theoretic proof and interpretation. The analysis leads to some generalized and refined bounds in comparison to the insightful results in Theorems 4.2 and 4.3 of [6]. For the purpose of the suggested problem and analysis, the presentation here is self-contained.
The paper is structured as follows: Section 2 provides essential notation and preliminary material for this paper. Section 3 presents a new methodology for the derivation of families of inequalities for set functions which satisfy sub/supermodularity properties (Theorem 1). The suggested methodology is then applied in Section 3 for the derivation of information inequalities by relying on sub/supermodularity properties of Shannon information measures. Section 3 also considers connections of the suggested approach to a generalized version of Shearer's lemma, and to other results in the literature. Most of the results in Section 3 are proved in Section 4. Section 5 applies the generalized Han's inequality to a problem in extremal graph theory (Theorem 2). A byproduct of Theorem 2, which is of interest in its own right, is also analyzed in Section 5 (Theorem 3). The presentation and analysis in Section 5 are accessible to the reader, independently of the earlier material on information inequalities in Section 3 and Section 4. Some additional proofs, mostly for making the paper self-contained or for suggesting an alternative proof, are relegated to the appendices (Appendix A and Appendix B).
2. Preliminaries and Notation
The present section provides essential notation and preliminary material for this paper.
- ℕ denotes the set of natural numbers.
- ℝ denotes the set of real numbers, and ℝ≥0 denotes the set of nonnegative real numbers.
- Ø denotes the empty set.
- 2^Ω denotes the power set of a set Ω (i.e., the set of all subsets of Ω).
- S^c := Ω \ S denotes the complement of a subset S in Ω.
- 1{E} is an indicator of E; it is 1 if event E is satisfied, and zero otherwise.
- [n] := {1, …, n} for every n ∈ ℕ;
- X^n := (X_1, …, X_n) denotes an n-dimensional random vector;
- X_S := (X_i)_{i∈S} is a random vector for a nonempty subset S ⊆ [n]; if S = Ø, then X_S is an empty set of random variables, and conditioning on X_Ø is void.
- Let X be a discrete random variable that takes its values on a set 𝒳, and let P_X be the probability mass function (PMF) of X. The Shannon entropy of X is given by
  H(X) := −∑_{x∈𝒳} P_X(x) log P_X(x),   (1)
  where, throughout this paper, we take all logarithms to base 2.
- The binary entropy function is given by
  h_b(p) := −p log p − (1−p) log(1−p), p ∈ [0,1],   (2)
  where, by continuous extension, the convention 0 log 0 := 0 is used.
- Let X and Y be discrete random variables with a joint PMF P_{X,Y}, and a conditional PMF of X given Y denoted by P_{X|Y}. The conditional entropy of X given Y is defined as
  H(X|Y) := −∑_{x,y} P_{X,Y}(x,y) log P_{X|Y}(x|y),   (3)
  and
  H(X,Y) = H(Y) + H(X|Y).   (4)
- The mutual information between X and Y is symmetric in X and Y, and it is given by
  I(X;Y) := H(X) − H(X|Y) = H(Y) − H(Y|X).   (5)
- The conditional mutual information between two random variables X and Y, given a third random variable Z, is symmetric in X and Y, and it is given by
  I(X;Y|Z) := H(X|Z) − H(X|Y,Z).   (6)
- For continuous random variables, the sums in (1) and (3) are replaced with integrals, and the PMFs are replaced with probability densities. The entropy of a continuous random variable is named differential entropy.
- For an n-dimensional random vector X^n, the entropy power of X^n is given by
  𝒩(X^n) := (1/(2πe)) 2^{(2/n) h(X^n)},   (7)
  where the base of the exponent is identical to the base of the logarithm in (1).
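As a concrete illustration of the definitions above, the following minimal Python sketch computes the entropy, conditional entropy, and mutual information of a pair of discrete random variables, and checks the chain rule numerically. The joint PMF used here is an arbitrary assumption for illustration only.

```python
import math

def entropy(pmf):
    """Shannon entropy (base 2) of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Joint PMF of (X, Y) as a dict {(x, y): probability} -- an arbitrary example.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

p_x, p_y = {}, {}  # marginals of X and Y
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

h_xy = entropy(list(joint.values()))
h_x = entropy(list(p_x.values()))
h_y = entropy(list(p_y.values()))

h_x_given_y = h_xy - h_y   # chain rule: H(X,Y) = H(Y) + H(X|Y)
mi = h_x - h_x_given_y     # I(X;Y) = H(X) - H(X|Y)

assert h_x_given_y <= h_x + 1e-12  # conditioning cannot increase entropy
assert mi >= -1e-12                # nonnegativity of mutual information
```

The two assertions at the end numerically reflect the basic properties listed next.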
We rely on the following basic properties of the Shannon information measures:
- Conditioning cannot increase the entropy, i.e.,
  H(X|Y) ≤ H(X),   (8)
  with equality in (8) if and only if X and Y are independent.
- Nonnegativity of the (conditional) mutual information: In light of (5) and (8),
  I(X;Y) ≥ 0,   (9)
  with equality if and only if X and Y are independent. More generally,
  I(X;Y|Z) ≥ 0,   (10)
  with equality if and only if X and Y are conditionally independent given Z.
Let Ω be a finite and non-empty set, and let f: 2^Ω → ℝ be a real-valued set function (i.e., f is defined for all subsets of Ω). The following definitions are used.
Definition 1.
(Sub/Supermodular function). The set function f: 2^Ω → ℝ is submodular if
f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T), ∀ S, T ⊆ Ω.   (11)
Likewise, f is supermodular if −f is submodular.
An identical characterization of submodularity is the diminishing return property (see, e.g., Proposition 2.2 in [23]), where a set function f: 2^Ω → ℝ is submodular if and only if
f(S ∪ {ω}) − f(S) ≥ f(T ∪ {ω}) − f(T), ∀ S ⊆ T ⊆ Ω, ω ∈ Ω \ T.   (12)
This means that the larger the set is, the smaller is the increase in f when a new element is added.
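The equivalence between Definition 1 and the diminishing-return property can be checked by brute force on small ground sets. The sketch below does so in Python for a coverage function, a standard example of a submodular set function; the blocks chosen here are an arbitrary illustrative assumption.

```python
from itertools import chain, combinations

def subsets(ground):
    """All subsets of the ground set, as tuples."""
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

# Coverage function f(S) = |union of blocks indexed by S| -- submodular.
blocks = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
ground = frozenset(blocks)

def f(s):
    return len(set().union(*(blocks[i] for i in s))) if s else 0

def is_submodular(g, ground):
    """Check g(S) + g(T) >= g(S|T) + g(S&T) for all S, T (Definition 1)."""
    subs = [frozenset(s) for s in subsets(ground)]
    return all(g(s) + g(t) >= g(s | t) + g(s & t) for s in subs for t in subs)

def has_diminishing_returns(g, ground):
    """Check g(S+w) - g(S) >= g(T+w) - g(T) for all S <= T and w outside T."""
    subs = [frozenset(s) for s in subsets(ground)]
    return all(
        g(s | {w}) - g(s) >= g(t | {w}) - g(t)
        for t in subs for s in subs if s <= t
        for w in ground - t
    )

assert is_submodular(f, ground) and has_diminishing_returns(f, ground)
```

For this example, both characterizations agree, as the equivalence predicts; a supermodular function such as S ↦ |S|² fails both checks.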
Definition 2.
(Monotonic function). The set function f: 2^Ω → ℝ is monotonically increasing if
S ⊆ T ⊆ Ω ⟹ f(S) ≤ f(T).   (13)
Likewise, f is monotonically decreasing if −f is monotonically increasing.
Definition 3.
(Polymatroid, ground set and rank function). Let f: 2^Ω → ℝ be a submodular and monotonically increasing set function with f(Ø) = 0. The pair (Ω, f) is called a polymatroid, Ω is called a ground set, and f is called a rank function.
Definition 4.
(Subadditive function). The set function f: 2^Ω → ℝ is subadditive if, for all S, T ⊆ Ω,
f(S ∪ T) ≤ f(S) + f(T).   (14)
A nonnegative and submodular set function is subadditive (this readily follows from (11) and (14)). The next proposition introduces results from [25,28,37]. For the sake of completeness, we provide a proof in Appendix A.
Proposition 1.
Let Ω be a finite and non-empty set, and let X_Ω := (X_ω)_{ω∈Ω} be a collection of discrete random variables. Then, the following holds:
- (a) The set function f: 2^Ω → ℝ, given by
  f(S) := H(X_S), ∀ S ⊆ Ω,   (15)
  is a rank function.
- (b) The set function f: 2^Ω → ℝ, given by
  f(S) := H(X_S | X_{S^c}), ∀ S ⊆ Ω,   (16)
  is supermodular, monotonically increasing, and f(Ø) = 0.
- (c) The set function f: 2^Ω → ℝ, given by
  f(S) := I(X_S; X_{S^c}), ∀ S ⊆ Ω,   (17)
  is submodular, f(Ø) = 0, but f is not a rank function. The latter holds since the equality f(S) = f(S^c), for all S ⊆ Ω, implies that f is not a monotonic function.
- (d) Let U, V ⊆ Ω be disjoint subsets, and let the entries of the random vector X_U be conditionally independent given X_V. Then, the set function f: 2^U → ℝ, given by
  f(S) := I(X_S; X_V), ∀ S ⊆ U,   (18)
  is a rank function.
- (e) Let X_1, …, X_n be independent random variables, and let f: 2^{[n]} → ℝ be given by
  f(S) := H(∑_{i∈S} X_i), ∀ S ⊆ [n] (with f(Ø) := 0).   (19)
  Then, f is a rank function.
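The rank-function property in Item (a) can be verified numerically on a small example. The following Python sketch checks, for an arbitrary joint PMF of three discrete random variables (an illustrative assumption), that f(S) = H(X_S) vanishes at the empty set, and is monotone and submodular.

```python
import math
from itertools import combinations

# Arbitrary joint PMF of (X1, X2, X3), strictly positive on 8 outcomes.
raw = {(0,0,0): 4, (0,0,1): 1, (0,1,0): 2, (0,1,1): 1,
       (1,0,0): 1, (1,1,0): 3, (1,0,1): 2, (1,1,1): 2}
total = sum(raw.values())
probs = {k: v / total for k, v in raw.items()}

def H(indices):
    """Entropy of the sub-vector X_S, for S given as a tuple of coordinates."""
    if not indices:
        return 0.0
    marg = {}
    for outcome, p in probs.items():
        key = tuple(outcome[i] for i in indices)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

ground = (0, 1, 2)
subs = [s for r in range(4) for s in combinations(ground, r)]

assert H(()) == 0.0  # f vanishes at the empty set
# Monotonicity: H(X_S) <= H(X_T) whenever S is a subset of T.
assert all(H(s) <= H(t) + 1e-12 for s in subs for t in subs if set(s) <= set(t))
# Submodularity: H(X_S) + H(X_T) >= H(X_{S|T}) + H(X_{S&T}).
for s in subs:
    for t in subs:
        u = tuple(sorted(set(s) | set(t)))
        i = tuple(sorted(set(s) & set(t)))
        assert H(s) + H(t) >= H(u) + H(i) - 1e-12
```

The same brute-force loop, with H replaced by the conditional entropy or mutual information, can be used to spot-check Items (b) and (c).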
The following proposition addresses the setting of general alphabets.
Proposition 2.
The sub/supermodularity and monotonicity properties in Proposition 1 continue to hold when the random variables in the collection X_Ω take values in general alphabets, with the entropies replaced by differential entropies (provided that all the involved information measures are finite).
Proof.
The sub/supermodularity properties in Proposition 1 are preserved due to the nonnegativity of the (conditional) mutual information. The monotonicity property of the functions in (18) and (19) is preserved also in the general alphabet setting due to (A10) and (A14c), and the mutual information in (18) is nonnegative. □
Remark 1.
In contrast to the entropy of discrete random variables, the differential entropy of continuous random variables is not functionally submodular in the sense of Lemma A.2 in [38]. This refers to a different form of submodularity, which was needed by Tao [38] to prove sumset inequalities for the entropy of discrete random variables. A follow-up study in [39] by Kontoyiannis and Madiman required substantially new proof strategies for the derivation of sumset inequalities with the differential entropy of continuous random variables. The basic property which replaces the discrete functional submodularity is the data-processing property of mutual information [39]. In the context of the present work, where the commonly used definition of submodularity is used (see Definition 1), the Shannon entropy of discrete random variables and the differential entropy of continuous random variables are both submodular set functions.
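In the standard sense of Definition 1, the submodularity of the differential entropy can be illustrated for Gaussian vectors, where it reduces to the submodularity of the log-determinant over principal submatrices of the covariance matrix. The Python sketch below checks this on an arbitrary positive-definite covariance matrix (an illustrative assumption).

```python
import math
from itertools import combinations

# Covariance matrix of a 3-dimensional Gaussian vector (arbitrary, positive
# definite); h(X_S) = 0.5 * log2((2*pi*e)^|S| * det(Sigma_S)).
SIGMA = [[2.0, 0.5, 0.3],
         [0.5, 1.5, 0.4],
         [0.3, 0.4, 1.0]]

def det(m):
    """Determinant by Laplace expansion (fine for tiny matrices)."""
    n = len(m)
    if n == 0:
        return 1.0
    if n == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j+1:] for row in m[1:]])
               for j in range(n))

def h(S):
    """Differential entropy (in bits) of the Gaussian sub-vector X_S."""
    sub = [[SIGMA[i][j] for j in S] for i in S]
    return 0.5 * math.log2((2 * math.pi * math.e) ** len(S) * det(sub)) if S else 0.0

subs = [s for r in range(4) for s in combinations(range(3), r)]
for s in subs:
    for t in subs:
        u = tuple(sorted(set(s) | set(t)))
        i = tuple(sorted(set(s) & set(t)))
        assert h(s) + h(t) >= h(u) + h(i) - 1e-9  # submodularity of diff. entropy
```

The modular term (|S|/2)·log(2πe) cancels on both sides of the submodular inequality, so the check is equivalent to the Hadamard–Fischer-type submodularity of log-determinants of principal submatrices.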
We rely, in this paper, on the following standard terminology for graphs. An undirected graph G is an ordered pair G = (V, E), where V is a set of elements, and E is a set of 2-element subsets (pairs) of V. The elements of V are called the vertices of G, and the elements of E are called the edges of G. We use the notation V(G) and E(G) for the sets of vertices and edges, respectively, in the graph G. The number of vertices in a finite graph G is called the order of G, and the number of edges is called the size of G. Throughout this paper, we assume that the graph G is undirected and finite; it is also assumed to be a simple graph, i.e., it has no loops (no edge connects a vertex in G to itself) and there are no multiple edges which connect a pair of vertices in G. If e = {u, v} ∈ E, then the vertices u and v are the two ends of the edge e. The elements u and v are adjacent vertices (neighbors) if they are connected by an edge in G, i.e., if {u, v} ∈ E.
3. Inequalities via Submodularity
3.1. A New Methodology
The present subsection presents a new methodology for the derivation of families of inequalities for set functions, and in particular inequalities with information measures. The suggested methodology relies, to a large extent, on the notion of submodularity of set functions, and it is presented in the next theorem.
Theorem 1.
Let Ω be a finite set with |Ω| = n. Let f: 2^Ω → ℝ with f(Ø) = 0, and let g: ℝ → ℝ. Let the sequence {c_k}_{k=1}^{n} be given by
c_k := (n choose k)^{−1} ∑_{S⊆Ω: |S|=k} g( f(S)/k ), k ∈ [n].   (20)
- (a) If f is submodular, and g is monotonically increasing and convex, then the sequence {c_k} is monotonically decreasing, i.e.,
  c_1 ≥ c_2 ≥ ⋯ ≥ c_n.   (21)
  In particular,
  g( f(Ω)/n ) ≤ (n choose k)^{−1} ∑_{S⊆Ω: |S|=k} g( f(S)/k ), k ∈ [n].   (22)
- (b) If f is submodular, and g is monotonically decreasing and concave, then the sequence {c_k} is monotonically increasing.
- (c) If f is supermodular, and g is monotonically increasing and concave, then the sequence {c_k} is monotonically increasing.
- (d) If f is supermodular, and g is monotonically decreasing and convex, then the sequence {c_k} is monotonically decreasing.
Proof.
See Section 4.1. □
Corollary 1.
Let Ω be a finite set with |Ω| = n, let f: 2^Ω → ℝ, and let g be convex and monotonically increasing. If
- f is a rank function,
- or there is such that with ,
- is a sequence such that for all with ,
then
and if , then
Proof.
See Section 4.2. □
Corollary 2.
Let Ω be a finite set with |Ω| = n ≥ 2, and let f: 2^Ω → ℝ be submodular and nonnegative with f(Ø) = 0. Then,
- (a) For k ∈ [n−1],
  f(Ω) ≤ θ_k ∑_{S⊆Ω: |S|=k} f(S),   (25)
  with
  θ_k := n / (k (n choose k)).   (26)
  For k = n−1, (25) holds with θ_{n−1} = 1/(n−1), regardless of the nonnegativity of f.
- (b) If f is also monotonically increasing (i.e., f is a rank function), then for k ∈ [n−1],
  f(Ω) ≤ θ_k ∑_{S⊆Ω: |S|=k} f(S) ≤ (n/k) f(Ω).   (27)
Proof.
See Section 4.3. □
Corollary 2 is next specialized to reproduce Han’s inequality [34], and a generalized version of Han’s inequality (Section 4 of [25]).
Let X^n := (X_1, …, X_n) be a random vector with finite entropies H(X_S) for all S ⊆ [n]. The set function f: 2^{[n]} → ℝ, given by f(S) := H(X_S) for all S ⊆ [n], is submodular [25] (see Proposition 1a and Proposition 2). From (25), the following holds:
- (a) Setting f(S) := H(X_S) in (25) implies that, for all k ∈ [n−1],
  H(X_1, …, X_n) ≤ n/(k (n choose k)) ∑_{S⊆[n]: |S|=k} H(X_S).   (28)
- (b) Consequently, setting k = n−1 in (28) gives
  H(X_1, …, X_n) ≤ 1/(n−1) ∑_{i=1}^{n} H(X_1, …, X_{i−1}, X_{i+1}, …, X_n),   (29)
  which gives Han's inequality.
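Han's inequality, and the monotonicity behind it, can be verified numerically. The following Python sketch checks, for an arbitrary joint PMF of three random variables (an illustrative assumption), that the normalized entropy sums over k-element subsets decrease in k, and that the final comparison recovers Han's inequality.

```python
import math
from itertools import combinations
from math import comb

# Arbitrary joint PMF of (X1, X2, X3) used to test the inequality numerically.
raw = {(0,0,0): 5, (0,0,1): 1, (0,1,1): 2, (1,0,0): 1, (1,1,0): 2, (1,1,1): 1}
total = sum(raw.values())
joint = {k: v / total for k, v in raw.items()}

def H(S):
    """Entropy of the sub-vector X_S for S given as a tuple of coordinates."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in S)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

n = 3
# Normalized sums (1/(k*C(n,k))) * sum_{|S|=k} H(X_S): expected decreasing in k.
h = [sum(H(S) for S in combinations(range(n), k)) / (k * comb(n, k))
     for k in range(1, n + 1)]
assert all(h[k] + 1e-12 >= h[k + 1] for k in range(n - 1))

# Comparing k = n-1 with k = n recovers Han's inequality:
# H(X^n) <= (1/(n-1)) * sum_i H(X_{[n] minus i}).
assert H((0, 1, 2)) <= sum(H(S) for S in combinations(range(3), 2)) / 2 + 1e-12
```

The monotone chain, rather than only its two endpoints, is what the general result above provides.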
Further applications of Theorem 1 lead to the next corollary, which partially introduces some known results that have been proved on a case-by-case basis in Theorems 17.6.1–17.6.3 of [1] and Section 2 of [2]. In particular, the monotonicity properties of the sequences in (30) and (32)–(34) were proved in Theorems 1 and 2, and Corollaries 1 and 2 of [2]. Both known and new results are readily obtained here, in a unified way, from Theorem 1. The utility of one of these inequalities in extremal combinatorics is discussed in the continuation of this subsection (see Proposition 3), providing a natural generalization of a beautiful combinatorial result in Section 3.2 of [19].
Corollary 3.
Let X_1, …, X_n be random variables with finite entropies. Then, the following holds:
- (a) The sequences
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} H(X_S)/k }_{k=1}^{n},   (30)
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} I(X_S; X_{S^c})/k }_{k=1}^{n}   (31)
  are monotonically decreasing in k. If X_1, …, X_n are independent, then also the sequence
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} H(∑_{i∈S} X_i)/k }_{k=1}^{n}   (32)
  is monotonically decreasing in k.
- (b) The sequence
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} H(X_S | X_{S^c})/k }_{k=1}^{n}   (33)
  is monotonically increasing in k.
- (c) For every r > 0, the sequences
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} 2^{r H(X_S)/k} }_{k=1}^{n},   (34)
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} 2^{−r H(X_S | X_{S^c})/k} }_{k=1}^{n},   (35)
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} 2^{r I(X_S; X_{S^c})/k} }_{k=1}^{n}   (36)
  are monotonically decreasing in k. If X_1, …, X_n are independent, then also the sequence
  { (n choose k)^{−1} ∑_{S⊆[n]: |S|=k} 2^{r H(∑_{i∈S} X_i)/k} }_{k=1}^{n}   (37)
  is monotonically decreasing in k.
Proof.
The finite entropies of X_1, …, X_n assure that the entropies involved in the sequences (30)–(37) are finite. Item (a) follows from Theorem 1a, where the submodular set functions f which correspond to (30)–(32) are given in (15), (17) and (19), respectively, and g is the identity function on the real line. The identity is used for (32). Item (b) follows from Theorem 1c, where f is the supermodular function in (16) and g is the identity function on the real line. We next prove Item (c). The sequence (34) is monotonically decreasing by Theorem 1a, where f is the submodular function in (15), and g is the monotonically increasing and convex function defined as g(x) := 2^{rx} for x ∈ ℝ (with r > 0). The sequence (35) is monotonically decreasing by Theorem 1d, where f is the supermodular function in (16), and g is the monotonically decreasing and convex function defined as g(x) := 2^{−rx} for x ∈ ℝ. The sequence (36) is monotonically decreasing by Theorem 1a, where f is the submodular function in (17) and g is the monotonically increasing and convex function defined as g(x) := 2^{rx} for x ∈ ℝ. Finally, the sequence (37) is monotonically decreasing by Theorem 1a, where f is the submodular function in (19) and g is the monotonically increasing and convex function defined as g(x) := 2^{rx} for x ∈ ℝ. □
Remark 2.
From Proposition 2, since the proof of Corollary 3 only relies on the sub/supermodularity property of f, the random variables do not need to be discrete in Corollary 3. In the reproduction of Han's inequality as an application of Corollary 2, the random variables do not need to be discrete either, since f is not required to be nonnegative when k = n−1 (only the submodularity of f in (15) is required, which holds due to Proposition 2).
The following result exemplifies the utility of the monotonicity result of the sequence (30) in extremal combinatorics. It also generalizes the result in Section 3.2 of [19] for an achievable upper bound on the cardinality of a finite set in the three-dimensional Euclidean space, expressed as a function of its numbers of projections on each of the three coordinate planes. The next result provides an achievable upper bound on the cardinality of a finite set of points in an n-dimensional Euclidean space, expressed as a function of its numbers of projections on each of the k-dimensional coordinate subspaces, with an arbitrary k ∈ [n−1].
Proposition 3.
Let A be a finite set of points in the n-dimensional Euclidean space ℝ^n with n ≥ 2. Let k ∈ [n−1], and m := (n choose k). Let A_1, …, A_m be the projections of A on each of the m k-dimensional coordinate subspaces of ℝ^n, and let N_j := |A_j| for all j ∈ [m]. Then,
|A| ≤ ( ∏_{j=1}^{m} N_j )^{n/(km)}.   (38)
Let , and for all . An equivalent form of (38) is given by the inequality
Moreover, if n ≥ 2 and k ∈ [n−1], then (38) and (39) are satisfied with equality if A is a grid of M^n points in ℝ^n with M ∈ ℕ points on each dimension (so, |A| = M^n and N_j = M^k for all j ∈ [m]).
Proof.
Pick uniformly at random a point X^n = (X_1, …, X_n) ∈ A. Then,
H(X^n) = log |A|.   (40)
The sequence in (30) is monotonically decreasing, so its value at k is larger than or equal to its value at n, which is equivalent to
H(X^n) ≤ n/(k (n choose k)) ∑_{S⊆[n]: |S|=k} H(X_S).   (41)
Let S_1, …, S_m be the k-subsets of the set [n], ordered in a way such that N_j is the cardinality of the projection of the set A on the k-dimensional subspace whose coordinates are the elements of the subset S_j. Then, (41) can be expressed in the form
H(X^n) ≤ n/(km) ∑_{j=1}^{m} H(X_{S_j}),   (42)
and also
H(X_{S_j}) ≤ log N_j, ∀ j ∈ [m],   (43)
since the entropy of a random variable is upper bounded by the logarithm of the number of its possible values. Combining (40), (42) and (43) gives
log |A| ≤ n/(km) ∑_{j=1}^{m} log N_j.   (44)
Exponentiating both sides of (44) gives (38). In addition, using the identity gives (39) from (44). Finally, the sufficiency condition for equalities in (38) or (39) can be easily verified; it is obtained if A is a grid of points in ℝ^n with the same finite number of projections on each dimension. □
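The three-dimensional case of this bound (n = 3, k = 2, the Loomis–Whitney inequality |A|² ≤ N₁N₂N₃) is easy to check numerically, together with the equality condition on a grid. The point sets in the sketch below are arbitrary illustrative assumptions.

```python
# Finite point set in R^3 (integer coordinates, arbitrary example).
A = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1), (2, 1, 0), (2, 2, 2)}

subspaces = [(0, 1), (0, 2), (1, 2)]  # the 2-element coordinate subsets
N = [len({tuple(p[i] for i in S) for p in A}) for S in subspaces]

# Loomis-Whitney: |A|^2 <= N_1 * N_2 * N_3, i.e. the exponent n/(k*m) = 1/2.
assert len(A) ** 2 <= N[0] * N[1] * N[2]

# Equality for a grid: M points per dimension gives |A| = M^3 and N_j = M^2.
M = 2
grid = {(x, y, z) for x in range(M) for y in range(M) for z in range(M)}
Ng = [len({tuple(p[i] for i in S) for p in grid}) for S in subspaces]
assert len(grid) ** 2 == Ng[0] * Ng[1] * Ng[2]
```

Replacing the three coordinate pairs by all k-element coordinate subsets of [n] gives the corresponding check for the general bound.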
3.2. Connections to a Generalized Version of Shearer’s Lemma and Other Results in the Literature
The next proposition is a known generalized version of Shearer’s Lemma.
Proposition 4.
Let Ω be a finite set, let S_1, …, S_m be a finite collection of subsets of Ω (with m ∈ ℕ), and let f: 2^Ω → ℝ be a set function.
- (a) If f is non-negative and submodular, and every element in Ω is included in at least d ≥ 1 of the subsets S_1, …, S_m, then
  f(Ω) ≤ (1/d) ∑_{j=1}^{m} f(S_j).   (45)
- (b)
- If f is a rank function, , and every element in is included in at least of the subsets , then
The first part of Proposition 4 was pointed out in Section 1.5 of [35], and the second part of Proposition 4 is a generalization of Remark 1 and inequality (47) in [20]. We provide a (somewhat different) proof of Proposition 4a, as well as a self-contained proof of Proposition 4b in Appendix B.
Let X_1, …, X_n be discrete random variables, and consider the set function f: 2^{[n]} → ℝ which is defined as f(S) := H(X_S) for all S ⊆ [n]. Since f is a rank function [25], Proposition 4 then specializes to Shearer's Lemma [7] and a modified version of this lemma (see Remark 1 of [20]).
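The entropy specialization of Shearer's lemma can be tested directly: if every coordinate is covered at least d times by a collection of subsets, then H of the full vector is at most (1/d) times the sum of the subset entropies. In the Python sketch below, the joint PMF and the cover are arbitrary illustrative assumptions.

```python
import math

# Arbitrary joint PMF of (X1, X2, X3).
raw = {(0,0,0): 3, (0,1,0): 1, (0,1,1): 2, (1,0,1): 1, (1,1,0): 2, (1,1,1): 3}
total = sum(raw.values())
joint = {k: v / total for k, v in raw.items()}

def H(S):
    """Entropy of the sub-vector X_S for S given as a tuple of coordinates."""
    marg = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in S)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

# Each coordinate 0, 1, 2 appears in exactly d = 2 of the covering subsets,
# so Shearer's lemma gives H(X1,X2,X3) <= (1/2) * (sum of pair entropies).
cover, d = [(0, 1), (1, 2), (0, 2)], 2
assert H((0, 1, 2)) <= sum(H(S) for S in cover) / d + 1e-12
```

The same check with an arbitrary cover and the appropriate d illustrates the general statement of Proposition 4a.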
In light of Proposition 1e and Proposition 4b, Corollaries 4 and 5 are obtained as follows.
Corollary 4.
Let X_1, …, X_n be independent discrete random variables, let S_1, …, S_m be subsets of [n], and let d ∈ ℕ. If each element in [n] belongs to at least d of the sets S_1, …, S_m, then
In particular, if every element i ∈ [n] is included in at least d of the subsets S_1, …, S_m, then
Remark 3.
Inequality (48) is also a special case of Theorem 2 of [37], and they coincide if every element is included in a fixed number d of the subsets S_1, …, S_m.
A specialization of Corollary 4 gives the next result.
Corollary 5.
Let X_1, …, X_n be independent and discrete random variables with finite variances. Then, the following holds:
- (a)
- For every ,and equivalently,
- (b)
- For every ,where (51) is in general looser than (50), with equivalence if are i.i.d.; in particular,
Proof.
Let S_1, …, S_m be all the k-element subsets of [n] (with m = (n choose k)). Then, every element i ∈ [n] belongs to (n−1 choose k−1) such subsets, which then gives (49) as a special case of (48). Alternatively, (49) follows from Corollary 3b, which yields for all . Exponentiating both sides of (49) gives (50). Inequality (51) is a loosened version of (50), which follows by invoking the AM-GM inequality (i.e., the geometric mean of nonnegative real numbers is less than or equal to their arithmetic mean, with equality between these two means if and only if these numbers are all equal), in conjunction with the identity . Inequalities (50) and (51) are consequently equivalent if X_1, …, X_n are i.i.d. random variables, and (52) is a specialized version of (50) and the loosened inequality (51) by setting k = n−1. □
The next remarks consider information inequalities in Corollaries 3–5, in light of Theorem 1 here, and some known results in the literature.
Remark 4.
Inequality (49) was derived by Madiman as a special case of Theorem 2 in [37]. The proof of Corollary 5a shows that (49) can be also derived in two different ways as special cases of both Theorem 1a and Proposition 4a.
Remark 5.
Remark 6.
The result in Theorem 8 of [31] is a special case of Theorem 1a here, which follows by taking the function g in Theorem 1a to be the identity function. The flexibility in selecting the function g in Theorem 1 makes it possible to obtain a larger collection of information inequalities. This is in part reflected by a comparison of Corollary 3 here with Corollary 9 of [31]. More specifically, the findings about the monotonicity properties in (30), (31) and (33) were obtained in Corollary 9 of [31], while relying on Theorem 8 of [31] and the sub/supermodularity properties of the considered Shannon information measures. It is noted, however, that the monotonicity results of the sequences (34)–(37) (Corollary 3c) are not implied by Theorem 8 of [31].
Remark 7.
Inequality (52) forms a counterpart of an entropy power inequality by Artstein et al. (Theorem 3 of [40]), where, for independent random variables X_1, …, X_n with finite variances:
𝒩(X_1 + ⋯ + X_n) ≥ 1/(n−1) ∑_{i=1}^{n} 𝒩( ∑_{j∈[n]\{i}} X_j ).   (53)
Inequality (50), and also its looser version in (51), form counterparts of the generalized inequality by Madiman and Barron, which reads (see inequality (4) in [41]):
𝒩(X_1 + ⋯ + X_n) ≥ (1/r) ∑_{j=1}^{m} 𝒩( ∑_{i∈S_j} X_i ),   (54)
which holds for every collection of subsets S_1, …, S_m of [n] such that each element i ∈ [n] is included in at most r of these subsets.
4. Proofs
The present section provides proofs of (most of the) results in Section 3.
4.1. Proof of Theorem 1
We prove Item (a), and then readily prove Items (b)–(d). Define the auxiliary sequence
a_k := (n choose k)^{−1} ∑_{S⊆Ω: |S|=k} f(S), k ∈ [n],   (55)
averaging f over all k-element subsets of the n-element set Ω. Identifying the elements of Ω with [n], let the permutation π: [n] → [n] be arbitrary. For k ∈ [n−1], let
S_π := {π(1), …, π(k−1), π(k)},  T_π := {π(1), …, π(k−1), π(k+1)},   (56)
which are k-element subsets of Ω with k−1 elements in common. Then,
which holds by the submodularity of f (by assumption), i.e.,
Averaging the terms on both sides of (58) over all the permutations of gives
and similarly
with since by assumption . Combining (58)–(60) gives
which is rewritten as
Consequently, it follows that
where equality (63a) holds since , and inequality (63d) holds by (62). The sequence {a_k/k}_{k=1}^{n} is therefore monotonically decreasing, and in particular
Combining (65) and (66) gives
Since there are n subsets S ⊆ Ω with |S| = n−1, rearranging terms in (67) gives (25) for k = n−1; it should be noted that, for k = n−1, the set function f does not need to be nonnegative for the satisfiability of (25) (however, this will be required for k ≤ n−2).
We next prove Item (a). By (20), for k ∈ [n],
Fix a subset S ⊆ Ω with |S| = k, and let f_S be the restriction of the function f to the subsets of S. Then, f_S is a submodular set function with f_S(Ø) = 0; similarly to (55), (65) and (66), with f replaced by f_S and n replaced by k, the corresponding sequence is monotonically decreasing. Hence, for ,
where
Combining (69) and (70) gives
and, since by assumption g is monotonically increasing,
From (68) and (72), for all ,
and
where (74a) holds by applying Jensen's inequality to the convex function g; (74b) holds since the term of the inner summation in the right-hand side of (74a) does not depend on the added element, so for every (k−1)-element subset of Ω, there are n−k+1 possibilities to extend it by a single element into a k-element subset; (74e) is straightforward, and (74f) holds by the definition in (20). This proves Item (a).
Item (b) follows from Item (a), and similarly Item (d) follows from Item (c), by replacing g with −g. Item (c) is next verified. If f is a supermodular set function with f(Ø) = 0, then (57), (58) and (61)–(63) hold with flipped inequality signs. Hence, if g is monotonically decreasing, then inequalities (72) and (73) are reversed; finally, if g is also concave, then (by Jensen's inequality) (74) holds with a flipped inequality sign, which proves Item (c).
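As a numerical sanity check of the Han-type monotonicity used in this proof (with g taken as the identity function), the following Python sketch verifies, for a small coverage function (an arbitrary rank-function example), that the averages of f over k-element subsets, normalized by k, decrease in k.

```python
from itertools import combinations
from math import comb

# Coverage function: a standard rank function (nonnegative, monotone, submodular).
blocks = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d", "e", "a"}}
n = len(blocks)

def f(S):
    return len(set().union(*(blocks[i] for i in S))) if S else 0

# The normalized averages a_k / k, with a_k the average of f over all
# k-element subsets, should be monotonically decreasing in k for a
# submodular f with f(empty) = 0.
seq = [sum(f(S) for S in combinations(range(n), k)) / (k * comb(n, k))
       for k in range(1, n + 1)]
assert all(seq[k] + 1e-12 >= seq[k + 1] for k in range(n - 1))
```

Composing the same averages with a convex increasing g, as in the theorem, can be checked by the identical loop with f(S)/k replaced by g(f(S)/k).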
4.2. Proof of Corollary 1
By assumption, f is a rank function, which implies that f(S) ≥ 0 for every S ⊆ Ω. Since (by definition) f is submodular with f(Ø) = 0, and (by assumption) the function g is convex and monotonically increasing, then (from (22), while replacing k with )
By the second assumption in Corollary 1, for positive values of x that are sufficiently close to zero, we have
- if ;
- scales like if with for some .
In both cases, it follows that
In light of (75) and (76), and since (by assumption) , it follows that
By the following upper and lower bounds on the binomial coefficient:
(n/k)^k ≤ (n choose k) ≤ (en/k)^k, k ∈ [n],   (78)
the combination of equalities (77) and (78) gives equality (23). Equality (24) holds as a special case of (23), under the assumption that .
4.3. Proof of Corollary 2
For k = n−1, Corollary 2 is proved in (67). Fix k ≤ n−2, and let g: ℝ → ℝ be
which is monotonically increasing and convex on the real line. By Theorem 1a,
Since by assumption f is nonnegative, it follows from (20) and (79) that
Combining (80)–(81) and rearranging terms gives, for all ,
where equality (82b) holds by the identity . This further gives
where equality (83c) holds by the definition in (26). This proves (25) for .
We next prove Item (b). The function f is (by assumption) a rank function, which yields its nonnegativity. Hence, the leftmost inequality in (27) holds by (82). The rightmost inequality in (27) also holds since f is monotonically increasing, which yields f(S) ≤ f(Ω) for all S ⊆ Ω. For k ∈ [n−1] and S ⊆ Ω with |S| = k,
where (84) holds since there are (n choose k) k-element subsets of the n-element set Ω, and every summand f(S) (with |S| = k) is upper bounded by f(Ω).
5. A Problem in Extremal Graph Theory
This section applies the generalization of Han’s inequality in (28) to the following problem.
5.1. Problem Formulation
Let n, t ∈ ℕ with t ≤ n, and let A ⊆ {0,1}^n. Let G be an undirected simple graph with vertex set V(G) = A, where pairs of vertices in G are adjacent (i.e., connected by an edge) if and only if they are represented by vectors in A whose Hamming distance is less than or equal to t:
E(G) = { {x^n, y^n} : x^n, y^n ∈ A, 1 ≤ d_H(x^n, y^n) ≤ t },   (85)
where d_H(·,·) denotes the Hamming distance between two binary vectors.
The question is how large the size of G can be (i.e., how many edges it may have) as a function of the cardinality of the set A, and possibly based also on some basic properties of the set A?
This problem and its related analysis generalize and refine, in a nontrivial way, the bound in Theorem 4.2 of [6], which applies to the special case where the Hamming-distance threshold is equal to 1. The motivation for this extension is next considered.
5.2. Problem Motivation
Constraint coding is common in many data recording systems and data communication systems, where some sequences are more prone to error than others, and a constraint on the sequences that are allowed to be recorded or transmitted is imposed in order to reduce the likelihood of error. Given such a constraint, it is then necessary to encode arbitrary user sequences into sequences that obey the constraint.
From an information–theoretic perspective, this problem can be interpreted as follows. Consider a communication channel W with input alphabet and output alphabet , and suppose that a constraint is imposed on the sequences that are allowed to be transmitted over the channel. As a result of such a constraint, the information sequences are first encoded into codewords by an error-correction encoder, followed by a constrained encoder that maps these codewords into constrained sequences. Let them be binary n-length sequences from the set . A channel modulator then modulates these sequences into symbols from , and the received sequences at the channel output, with alphabet , are first demodulated, and then decoded (in a reverse order of the encoding process) by the constrained decoder and error-correction decoder.
Consider a channel model where pairs of binary n-length sequences from the set A whose Hamming distance is less than or equal to a fixed number share a common output sequence with positive probability, whereas this ceases to be the case if the Hamming distance is larger than that number. In other words, we assume that, by design, pairs of sequences in A whose Hamming distance is larger than the given threshold cannot be confused, in the sense that there does not exist a common output sequence which may be possibly received (with positive probability) at the channel output.
The confusion graph G that is associated with this setup is an undirected simple graph whose vertices represent the n-length binary sequences in A, and pairs of vertices are adjacent if and only if the Hamming distance between the sequences that they represent is not larger than the given threshold. The size of G (i.e., its number of edges) is equal to the number of pairs of sequences in A which may not be distinguishable by the decoder.
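The confusion graph is straightforward to construct explicitly. The following Python sketch builds its edge set for a small code; the code A and the threshold value t = 2 are arbitrary assumptions used only for illustration.

```python
from itertools import combinations

def hamming(u, v):
    """Hamming distance between two equal-length binary tuples."""
    return sum(a != b for a, b in zip(u, v))

def confusion_graph_edges(A, t):
    """Edges of the confusion graph: pairs of sequences in A at Hamming
    distance at most t (t is the assumed confusability threshold)."""
    return [(u, v) for u, v in combinations(sorted(A), 2) if hamming(u, v) <= t]

# Arbitrary example: a small set of length-4 binary sequences.
A = {(0,0,0,0), (0,0,1,1), (0,1,0,1), (1,0,0,1), (1,1,1,1)}
edges = confusion_graph_edges(A, t=2)

# Each edge is a pair of codewords the decoder may fail to distinguish.
assert all(1 <= hamming(u, v) <= 2 for u, v in edges)
```

The number of edges returned by this construction is exactly the quantity bounded in the analysis that follows.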
Further motivation for studying this problem is considered in the continuation (see Section 5.5).
5.3. Analysis
We next derive an upper bound on the size of the graph G. Let X^n = (X_1, …, X_n) be chosen uniformly at random from the set A, and let P_{X^n} be the PMF of X^n. Then,
P_{X^n}(x^n) = 1/|A|, ∀ x^n ∈ A,   (86)
which implies that
H(X^n) = log |A|.   (87)
The graph G is an undirected and simple graph with vertex set V(G) = A (i.e., the vertices of G are in one-to-one correspondence with the binary vectors in the set A). Its set of edges E(G) consists of the edges which connect all pairs of vertices in G whose Hamming distance is less than or equal to the given threshold. For every admissible value of d, let E_d be the set of edges in G which connect all pairs of vertices in G whose Hamming distance is equal to d, so E(G) is the disjoint union of the sets E_d.   (88)
For an admissible value of d, a vector x^n ∈ {0,1}^n, and integers 1 ≤ i_1 < ⋯ < i_d ≤ n, let
x̄^n(i_1, …, i_d)   (89)
be a subvector of x^n of length n − d, obtained by dropping the bits of x^n in positions i_1, …, i_d; if d = n, then all bits are dropped, and x̄^n(i_1, …, i_d) is an empty vector. By the chain rule for the Shannon entropy,
where equality (90c) holds by (86).
For an admissible value of d, x^n ∈ {0,1}^n, and integers 1 ≤ i_1 < ⋯ < i_d ≤ n, let
x̃^n(i_1, …, i_d)   (91)
be the vector obtained from x^n by flipping its bits in positions i_1, …, i_d (in contrast to the subvector where the bits of x^n in these positions are dropped). Likewise, if x^n, y^n ∈ {0,1}^n satisfy d_H(x^n, y^n) = d, then there exist integers i_1 < ⋯ < i_d such that y^n = x̃^n(i_1, …, i_d) (i.e., the integers i_1, …, i_d are the positions (in increasing order) where the vectors x^n and y^n differ).
Let us characterize the set by its cardinality, and the following two natural numbers:
- (a)
- If and for any such that , then there are at least vectors whose subvectors coincide with , i.e., the integer satisfies

  By definition, the integer always exists, and

  If no information is available about the value of , then it can be taken by default to be equal to 2 (since, by assumption, the two vectors and satisfy the equality ).
- (b)
- If and for any such that , then there are at least vectors whose subvectors coincide with , i.e., the integer satisfies

  By definition, the integer always exists, and

  Likewise, if no information is available about the value of , then it can be taken by default to be equal to 1 (since satisfies the requirement about its subvector in (94)).
In general, it would be preferable to have the largest possible values of and (i.e., those satisfying inequalities (92) and (94) with equality), for obtaining a better upper bound on the size of G (this point will be clarified in the sequel). If , then and are the best possible constants; this holds by the definitions in (92) and (94), and can also be verified by the coincidence of the upper and lower bounds in (93) for , as well as those in (95).
If , then we distinguish between the following two cases:
- If , then

  which holds by the way that is defined in (92), and since is selected uniformly at random from the set .
- If , then

  which holds by the way that is defined in (94), and since is equiprobable on .
For and , it follows from (90), (96) and (97) that
which, by summing both sides of inequality (98) over all integers such that , yields
Equality holds in (99) if the minima on the RHS of (92) and (94) are attained by every element of these sets, and if (92) and (94) are satisfied with equality (i.e., and are the maximal integers satisfying inequalities (92) and (94) for the given set ). Hence, this equality holds in particular for , with the constants and .
The double sum in the first term on the RHS of (99) is equal to
since every pair of adjacent vertices in that refer to vectors in whose Hamming distance is equal to d is of the form and , and vice versa, and every edge is counted twice in the double summation on the LHS of (100). For calculating the double sum in the second term on the RHS of (99), we first calculate the sum of these two double summations:
so, subtracting (100) from (101d) gives that
Substituting (100) and (102) into the RHS of (99) gives that, for all ,
with the same necessary and sufficient condition for equality in (103a) as in (99). (Recall that it is in particular an equality for , where in this case and .)
By the generalized Han’s inequality in (28),
where equality (104b) holds by (87). Combining (103) and (104) yields
and, by the identity , we get
This upper bound is specialized, for , to Theorem 4.2 of [6] (where, by definition, and ). This gives that the number of edges in G, connecting pairs of vertices which refer to binary vectors in whose Hamming distance is 1 from each other, satisfies
It is possible to select, by default, the values of the integers and to be equal to 2 and 1, respectively, independently of the value of . It therefore follows that the upper bound in (106) can be loosened to
This shows that the bound in (108) generalizes the result in Theorem 4.2 of [6], based only on the knowledge of the cardinality of . Furthermore, the bound (108) can be tightened by the refined bound (106) if the characterization of the set allows one to assert values for and that are larger than the trivial values of 2 and 1, respectively.
In light of (88) and (108), the number of edges in the graph G satisfies
and if , then it follows that
Indeed, the transition from (109) to (110) holds by the inequality
where the latter bound is asymptotically tight in the exponent of n (for sufficiently large values of n).
5.4. Comparison of Bounds
We next consider the tightness of the refined bound (106) and the loosened bound (108). Since is a subset of the n-dimensional cube , every point in has at most neighbors in with Hamming distance d, so
Comparing the bound on the RHS of (106) with the trivial bound in (112) shows that the former bound is useful if and only if
which is obtained by relying on the identity . Rearranging terms in (113) gives the necessary and sufficient condition
which is independent of the value of . Since, by definition, , inequality (114) is automatically satisfied if the stronger condition
is imposed. The latter also forms a necessary and sufficient condition for the usefulness of the looser bound on the RHS of (108) in comparison to (112).
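Before turning to Example 1, the trivial bound in (112) can be checked numerically: every point of a subset of the n-cube has at most C(n, d) neighbors at Hamming distance d, and each edge is counted twice. The following Python sketch (with hypothetical names, and an arbitrary randomly chosen subset) verifies this, together with the fact that equality holds for the full cube.

```python
from itertools import product, combinations
from math import comb
import random

def hamming(x, y):
    """Hamming distance between two equal-length binary tuples."""
    return sum(a != b for a, b in zip(x, y))

random.seed(0)
n, d = 6, 2
cube = list(product((0, 1), repeat=n))
A = random.sample(cube, 20)   # an arbitrary 20-element subset of {0,1}^6

# Number of (unordered) pairs in A at Hamming distance exactly d.
E_d = sum(1 for x, y in combinations(A, 2) if hamming(x, y) == d)

# Trivial bound: each point has at most C(n, d) distance-d neighbors,
# and each edge is counted twice, so |E_d| <= |A| * C(n, d) / 2.
assert E_d <= len(A) * comb(n, d) / 2

# Equality for the full cube: |E_d| = 2^n * C(n, d) / 2.
E_full = sum(1 for x, y in combinations(cube, 2) if hamming(x, y) == d)
assert E_full == len(cube) * comb(n, d) // 2
```

The comparison in (113)–(115) asks exactly when the refined bound (106) improves on this trivial count.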
Example 1.
Suppose that the set is characterized by the property that for all , with a fixed integer , if and then all vectors which coincide with and in their agreed positions are also included in the set . Then, for all , we get by definition that , which yields . Setting and the default value on the RHS of (106) gives
Unless , the upper bound on the RHS of (116d) is strictly smaller than the trivial upper bound on the RHS of (112). This improvement is consistent with the satisfiability of the (necessary and sufficient) condition in (115), which is strictly satisfied since
On the other hand, the looser upper bound on the RHS of (108) gives
which is d times larger than the refined bound on the RHS of (116d) (since it is based on the exact value of for the set , rather than taking the default value of 2), and it is worse than the trivial bound if and only if . The latter finding is consistent with (115).
This exemplifies the utility of the refined upper bound on the RHS of (106) in comparison to the bound on the RHS of (108), where the latter generalizes Theorem 4.2 of [6] from the case where to all . As explained above, this refinement is irrelevant in the special case where , though it proves useful in general for (as exemplified here).
The following theorem summarizes the results of the analysis in the present section so far.
Theorem 2.
Let , with , and let . Let be an undirected, simple graph with vertex set , whose edges connect pairs of vertices in G that are represented by vectors in of Hamming distance less than or equal to τ. For , let be the set of edges in G which connect all pairs of vertices that are represented by vectors in whose Hamming distance is equal to d (i.e., ).
- (a)
- For , let the integers and (preferably, the maximal possible values) satisfy the requirements in (92) and (94), respectively. Then,
- (b)
- A loosened bound, which only depends on the cardinality of the set , is obtained by setting the default values and . It is then given by

  and, if , then the (overall) number of edges in G satisfies
- (c)
- The refined upper bound on the RHS of (119) and the loosened upper bound on the RHS of (120) improve the trivial bound , if and only if or , respectively (see Example 1).
5.5. Influence of Fixed-Size Subsets of Bits
The result in Theorem 4.2 of [6], which is generalized and refined in Theorem 2 here, is now applied to study the total influence of the n variables of an equiprobable random vector on a subset . To this end, let denote the vector where the bit at the i-th position of is flipped, so for all . Then, the influence of the i-th variable is defined as
and their total influence is defined to be the sum
As shown in Chapters 9 and 10 of [6], influences of subsets of the binary hypercube have far-reaching consequences in the study of threshold phenomena, and many other areas. As a corollary of (107), it is obtained in Theorem 4.3 of [6] that, for every subset ,
where by the equiprobable distribution of over .
In light of Theorem 2, the same approach that is used in Section 4.4 of [6] for the transition from (107) to (124) can also be used to obtain, as a corollary, a lower bound on the average total influence over all subsets of d variables. To this end, let be integers such that , and let the influence of the variables in positions be given by
Then, let the average influence of subsets of d variables be defined as
Hence, by (123) and (126), for every subset . Let
be the set of ordered pairs of sequences , where are of Hamming distance d from each other, with and . By the equiprobable distribution of on , we get
Since every point in has neighbors of Hamming distance d in the set , it follows that
where G is introduced in Theorem 2, and is the set of edges connecting pairs of vertices in G which are represented by vectors in of Hamming distance d. The multiplication by 2 on the RHS of (129) is because every edge whose two endpoints are in the set is counted twice. Hence, by (106) and (129),
and the lower bound on the RHS of (130d) is positive if and only if (see also (114)). This gives from (128) that the average influence of subsets of d variables satisfies
Note that setting and the default values and on the RHS of (131c) shows that the total influence of the n variables satisfies, for all ,
which then specializes to the result in Theorem 4.3 of [6] (see (124)). This gives the following result.
Theorem 3.
Let be an equiprobable random vector over the set , let and . Then, the average influence of subsets of d variables of , as it is defined in (126), is lower bounded as follows:
where , and the integers and are introduced in Theorem 2. Similarly to the refined upper bound in Theorem 2, the lower bound on the RHS of (133) is informative (i.e., positive) if and only if . The lower bound on the RHS of (133) can be loosened (by setting the default values and ) to
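The pair-counting identity behind (128)–(129) can be checked numerically. The sketch below assumes the reading that the influence of a set S of positions is the probability that flipping the bits of a uniformly chosen vector of the set at the positions in S leaves the set; under this reading, the average influence over all d-subsets equals 1 − 2|E_d|/(|A|·C(n,d)), since each ordered pair of distance-d vectors in the set is counted once in the double sum. All names here are hypothetical illustration, not the paper's notation.

```python
from itertools import product, combinations
from math import comb
import random

def flip(x, S):
    """Copy of x with the bits in the position set S flipped."""
    return tuple(b ^ 1 if i in S else b for i, b in enumerate(x))

random.seed(1)
n, d = 5, 2
cube = list(product((0, 1), repeat=n))
A = set(random.sample(cube, 12))   # an arbitrary 12-element subset of {0,1}^5

def influence(A, S):
    """P[flip_S(X) not in A] for X uniform on A (assumed reading)."""
    return sum(flip(x, set(S)) not in A for x in A) / len(A)

subsets = list(combinations(range(n), d))
avg_inf = sum(influence(A, S) for S in subsets) / len(subsets)

# Each ordered pair (x, x') in A x A at Hamming distance d is counted once
# in the double sum over (x, S), so the sum of P[flip_S(X) in A] over all
# d-subsets S equals 2|E_d|/|A|, where |E_d| counts unordered pairs.
E_d = sum(1 for x, y in combinations(A, 2)
          if sum(a != b for a, b in zip(x, y)) == d)
assert abs(avg_inf - (1 - 2 * E_d / (len(A) * comb(n, d)))) < 1e-12
```

Combined with the upper bound on |E_d| from Theorem 2, this identity immediately yields the lower bound on the average influence in (133).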
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
Appendix A. Proof of Proposition 1
For completeness, we prove Proposition 1 which introduces results from [25,28,37].
Let be a non-empty finite set, and let be a collection of discrete random variables. We first prove Item (a), showing that the entropy set function in (15) is a rank function.
- .
- Submodularity: If , then

  which gives

  This proves the submodularity of f, while also showing that

  i.e., the rightmost side of (A2) holds with equality if and only if and are conditionally independent given .
- Monotonicity: If , then

  so f is monotonically increasing.
We next prove Item (b). Consider the set function f in (16).
- , and .
- Supermodularity: If , then

  where inequality (A5d) holds since the entropy function in (15) is submodular (by Item (a)).
- Monotonicity: If , then

  so f is monotonically increasing.
Item (c) follows easily from Items (a) and (b). Consider the set function in (17). Then, for all , , so f is expressed as a difference of a submodular function and a supermodular function, which gives a submodular function. Furthermore, ; by the symmetry of the mutual information, for all , so f is not monotonic.
We next prove Item (d). Consider the set function in (18); we need to prove that f is submodular under the conditions in Item (d), where are disjoint subsets and the entries of the random vector are conditionally independent given .
- .
- Submodularity: If , then

  where equality (A7d) holds by the proof of Item (a) (see (A2)). By the assumption on the conditional independence of the random variables given , we get

  Consequently, combining (A7) and (A8) gives

  where inequality (A9e) holds with equality if and only if and are conditionally independent given .
- Monotonicity: If , then

  so f is monotonically increasing.
We finally prove Item (e), where we need to show that the entropy of a sum of independent random variables is a rank function. Let the set function be as given in (19).
- .
- Submodularity: Let . Define

  From the independence of the random variables , it follows that and W are independent. Hence, we get

  and

  Combining (A12) and (A13) gives (11).
- Monotonicity: If , then since are independent random variables, (A11) implies that U and W are independent and . Hence,
This completes the proof of Proposition 1.
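The core of Item (a), that the joint-entropy set function is a rank function (normalized, submodular, and monotone), can be verified numerically on a randomly generated joint distribution. This Python sketch (with hypothetical names, not the paper's notation) checks all three properties exhaustively over subsets of three binary random variables.

```python
import itertools
import math
import random

random.seed(2)
n = 3
# A random joint PMF of n binary random variables.
outcomes = list(itertools.product((0, 1), repeat=n))
w = [random.random() for _ in outcomes]
total = sum(w)
p = {x: wi / total for x, wi in zip(outcomes, w)}

def H(S):
    """Joint Shannon entropy (in bits) of the variables indexed by S."""
    marg = {}
    for x, px in p.items():
        key = tuple(x[i] for i in sorted(S))
        marg[key] = marg.get(key, 0.0) + px
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)

subsets = [frozenset(s) for r in range(n + 1)
           for s in itertools.combinations(range(n), r)]
for S in subsets:
    for T in subsets:
        # Submodularity: H(S u T) + H(S n T) <= H(S) + H(T).
        assert H(S | T) + H(S & T) <= H(S) + H(T) + 1e-9
        if S <= T:
            # Monotonicity: H(S) <= H(T) whenever S is a subset of T.
            assert H(S) <= H(T) + 1e-9
assert abs(H(frozenset())) < 1e-9   # normalization: H(empty set) = 0
```

Of course, a numerical check on one distribution is no substitute for the proof above; it merely illustrates the three defining properties on a concrete instance.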
Appendix B. Proof of Proposition 4
Lemma A1.
Let (with ) be a sequence of sets that is not a chain (i.e., there is no permutation such that ). Consider a recursive process where, at each step, a pair of sets that are not related by inclusion is replaced with their intersection and union. Then, there exists such a recursive process that leads to a chain in a finite number of steps.
Proof.
The lemma is proved by mathematical induction on ℓ. It holds for since , and the process halts in a single step. Suppose that the lemma holds for a fixed and for an arbitrary sequence of ℓ sets which is not a chain. We aim to show that it also holds for every sequence of sets which is not a chain. Let be such an arbitrary sequence of sets, and consider the subsequence of the first ℓ sets . If it is not a chain, then (by the induction hypothesis) there exists a recursive process as above which makes it possible to transform it into a chain in a finite number of steps, i.e., we get a chain . If or , then we get a chain of sets. Otherwise, proceeding with the recursive process where and are replaced with their intersection and union, consider the sequence
By the induction hypothesis, the first ℓ sets in this sequence can be transformed into a chain (in a finite number of steps) by a recursive process as above; this gives a chain of the form . The first ℓ sets in (A15) are all included in , so every combination of unions and intersections of these ℓ sets is also included in . Hence, the considered recursive process leads to a chain of the form
where the last inclusion in (A16) holds since . The claim thus holds for if it holds for a given ℓ, and since it holds for , it therefore holds by mathematical induction for all integers . □
We first prove Proposition 4a. Suppose that there is a permutation such that is a chain. Since every element in is included in at least d of these subsets, it must be included in (at least) the d largest sets of this chain, so for every . Due to the non-negativity of f, it follows that
Otherwise, if we cannot get a chain by possibly permuting the subsets in the sequence , consider a pair of subsets and that are not related by inclusion, and replace them with their intersection and union. By the submodularity of f,
For all , let be the number of indices such that . Replacing and with and leaves the multiset of values unaffected (indeed, if and , then belongs to both their intersection and their union; if belongs to exactly one of the sets and , then and ; finally, if and , then it belongs to neither their intersection nor their union). Now, consider the recursive process in Lemma A1. Since the profile of the number of inclusions of the elements of is preserved at each step of this recursive process, every element of continues to belong to at least d sets in the chain which is obtained at the end of the process. Moreover, in light of (A18), the sum on the LHS of (A18) cannot increase at any step of the recursive process. Inequality (45) therefore follows from the earlier part of the proof for a chain (see (A17)).
We next prove Proposition 4b. Let , and suppose that every element in is included in at least of the subsets . For all , define , and consider the sequence of subsets of . If f is a rank function, then it is monotonically increasing, which yields
Each element of is also included in at least d of the subsets (by construction, and since (by assumption) each element in is included in at least d of the subsets ). By the non-negativity and submodularity of f, Proposition 4a gives
Combining (A19) and (A20) yields (46). This completes the proof of Proposition 4.
Remark A1.
Lemma A1 is weaker than the claim that, in every recursive process as in Lemma A1, the number of pairs of sets that are not related by inclusion strictly decreases at each step. Lemma A1 is, however, sufficient for our proof of Proposition 4a.
References
- Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
- Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518. [Google Scholar] [CrossRef] [Green Version]
- Chan, T. Recent progresses in characterising information inequalities. Entropy 2011, 13, 379–401. [Google Scholar] [CrossRef]
- Martin, S.; Padró, C.; Yang, A. Secret sharing, rank inequalities, and information inequalities. IEEE Trans. Inf. Theory 2016, 62, 599–610. [Google Scholar] [CrossRef]
- Babu, S.A.; Radhakrishnan, J. An entropy-based proof for the Moore bound for irregular graphs. In Perspectives on Computational Complexity; Agrawal, M., Arvind, V., Eds.; Birkhäuser: Cham, Switzerland, 2014; pp. 173–182. [Google Scholar]
- Boucheron, S.; Lugosi, G.; Massart, P. Concentration Inequalities - A Nonasymptotic Theory of Independence; Oxford University Press: Oxford, UK, 2013. [Google Scholar]
- Chung, F.R.K.; Graham, R.L.; Frankl, P.; Shearer, J.B. Some intersection theorems for ordered sets and graphs. J. Comb. Theory Ser. A 1986, 43, 23–37. [Google Scholar] [CrossRef] [Green Version]
- Erdős, P.; Rényi, A. On two problems of information theory. Publ. Math. Inst. Hung. Acad. Sci. 1963, 8, 241–254. [Google Scholar]
- Friedgut, E. Hypergraphs, entropy and inequalities. Am. Math. Mon. 2004, 111, 749–760. [Google Scholar] [CrossRef] [Green Version]
- Jukna, S. Extremal Combinatorics with Applications in Computer Science, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
- Kaced, T.; Romashchenko, A.; Vereshchagin, N. A conditional information inequality and its combinatorial applications. IEEE Trans. Inf. Theory 2018, 64, 3610–3615. [Google Scholar] [CrossRef]
- Kahn, J. An entropy approach to the hard-core model on bipartite graphs. Comb. Probab. Comput. 2001, 10, 219–237. [Google Scholar] [CrossRef]
- Kahn, J. Entropy, independent sets and antichains: A new approach to Dedekind’s problem. Proc. Am. Math. Soc. 2001, 130, 371–378. [Google Scholar] [CrossRef]
- Madiman, M.; Marcus, A.W.; Tetali, P. Entropy and set cardinality inequalities for partition-determined functions. Random Struct. Algorithms 2012, 40, 399–424. [Google Scholar] [CrossRef] [Green Version]
- Madiman, M.; Marcus, A.W.; Tetali, P. Information–theoretic inequalities in additive combinatorics. In Proceedings of the 2010 IEEE Information Theory Workshop, Cairo, Egypt, 6–8 January 2010. [Google Scholar]
- Pippenger, N. An information–theoretic method in combinatorial theory. J. Comb. Theory Ser. A 1977, 23, 99–104. [Google Scholar] [CrossRef] [Green Version]
- Pippenger, N. Entropy and enumeration of boolean functions. IEEE Trans. Inf. Theory 1999, 45, 2096–2100. [Google Scholar] [CrossRef]
- Radhakrishnan, J. An entropy proof of Bregman’s theorem. J. Comb. Theory Ser. A 1997, 77, 161–164. [Google Scholar] [CrossRef] [Green Version]
- Radhakrishnan, J. Entropy and counting. In Computational Mathematics, Modelling and Algorithms; Narosa Publishers: New Delhi, India, 2001; pp. 1–25. [Google Scholar]
- Sason, I. A generalized information–theoretic approach for bounding the number of independent sets in bipartite graphs. Entropy 2021, 23, 270. [Google Scholar] [CrossRef]
- Sason, I. Entropy-based proofs of combinatorial results on bipartite graphs. In Proceedings of the 2021 IEEE International Symposium on Information Theory, Melbourne, Australia, 12–20 July 2021; pp. 3225–3230. [Google Scholar]
- Madiman, M.; Tetali, P. Information inequalities for joint distributions, interpretations and applications. IEEE Trans. Inf. Theory 2010, 56, 2699–2713. [Google Scholar] [CrossRef]
- Bach, F. Learning with submodular functions: A convex optimization perspective. Found. Trends Mach. Learn. 2013, 6, 145–373. [Google Scholar] [CrossRef] [Green Version]
- Chen, Q.; Cheng, M.; Bai, B. Matroidal entropy functions: A quartet of theories of information, matroid, design and coding. Entropy 2021, 23, 323. [Google Scholar] [CrossRef]
- Fujishige, S. Polymatroidal dependence structure of a set of random variables. Inf. Control. 1978, 39, 55–72. [Google Scholar] [CrossRef] [Green Version]
- Fujishige, S. Submodular Functions and Optimization, 2nd ed.; Annals of Discrete Mathematics Series; Elsevier: Amsterdam, The Netherlands, 2005; Volume 58. [Google Scholar]
- Iyer, R.; Khargonkar, N.; Bilmes, J.; Asnani, H. Generalized submodular information measures: Theoretical properties, examples, optimization algorithms, and applications. IEEE Trans. Inf. Theory 2022, 68, 752–781. [Google Scholar] [CrossRef]
- Krause, A.; Guestrin, C. Near-optimal nonmyopic value of information in graphical models. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, UK, 26–29 July 2005; pp. 324–331. [Google Scholar]
- Lovász, L. Submodular functions and convexity. In Mathematical Programming The State of the Art; Bachem, A., Korte, B., Grotschel, M., Eds.; Springer: Berlin/Heidelberg, Germany, 1983; pp. 235–257. [Google Scholar]
- Tian, C. Inequalities for entropies of sets of subsets of random variables. In Proceedings of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, 31 July–5 August 2011; pp. 1950–1954. [Google Scholar]
- Kishi, Y.; Ochiumi, N.; Yanagida, M. Entropy inequalities for sums over several subsets and their applications to average entropy. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT 2014), Honolulu, HI, USA, 30 June–4 July 2014; pp. 2824–2828. [Google Scholar]
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656. [Google Scholar] [CrossRef] [Green Version]
- Madiman, M.; Melbourne, J.; Xu, P. Forward and reverse entropy power inequalities in convex geometry. In Convexity and Concentration; Carlen, E., Madiman, M., Werner, E.M., Eds.; IMA Volumes in Mathematics and Its Applications; Springer: Berlin/Heidelberg, Germany, 2017; Volume 161, pp. 427–485. [Google Scholar]
- Han, T.S. Nonnegative entropy measures of multivariate symmetric correlations. Inf. Control. 1978, 36, 133–156. [Google Scholar] [CrossRef] [Green Version]
- Polyanskiy, Y.; Wu, Y. Lecture Notes on Information Theory, version 5. Available online: http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf (accessed on 15 May 2019).
- Bollobás, B. Extremal Graph Theory; Academic Press: Cambridge, MA, USA, 1978. [Google Scholar]
- Madiman, M. On the entropy of sums. In Proceedings of the 2008 IEEE Information Theory Workshop, Porto, Portugal, 5–9 May 2008. [Google Scholar]
- Tao, T. Sumset and inverse sumset theory for Shannon entropy. Comb. Probab. Comput. 2010, 19, 603–639. [Google Scholar] [CrossRef] [Green Version]
- Kontoyiannis, I.; Madiman, M. Sumset and inverse sumset inequalities for differential entropy and mutual information. IEEE Trans. Inf. Theory 2014, 60, 4503–4514. [Google Scholar] [CrossRef] [Green Version]
- Artstein, S.; Ball, K.M.; Barthe, F.; Naor, A. Solution of Shannon’s problem on the monotonicity of entropy. J. Am. Math. Soc. 2004, 17, 975–982. [Google Scholar] [CrossRef]
- Madiman, M.; Barron, A. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inf. Theory 2007, 53, 2317–2329. [Google Scholar] [CrossRef] [Green Version]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).