1. Introduction
The entropy power inequality (EPI) dates back to Shannon’s seminal paper [1] and has a long history [2]. The link with the Rényi entropy was first made by Dembo, Cover and Thomas [3] in connection with Young’s convolutional inequality with sharp constants, where Shannon’s EPI is obtained by letting the Rényi entropy orders tend to one ([4] Theorem 17.8.3).
The Rényi entropy [5] was first defined as a generalization of Shannon’s entropy for discrete variables, when looking for the most general definition of information measures that would preserve additivity for independent events. It has found many applications such as source coding [6], hypothesis testing [7], channel coding [8] and guessing [9]. The (differential) Rényi entropy considered in this paper (Definition 2) generalizes the (differential) Shannon entropy for continuous variables. It was first considered in [3] to make the transition between the entropy-power and the Brunn–Minkowski inequalities. It has also been applied to deconvolution problems [10]. A definition of the Rényi entropy power itself appears in [11], which is essentially Definition 5 below.
Recently, there has been significant interest in Rényi entropy power inequalities for several independent variables (the survey [12] is recommended to the reader for recent developments on forward and reverse EPIs). Bobkov and Chistyakov [13] extended the classical Shannon EPI to the Rényi entropy by incorporating a multiplicative constant that depends on the order of the Rényi entropy. Ram and Sason [14] improved the value of the constant by making it depend also on the number of variables. Bobkov and Marsiglietti [15] proved another modification of the EPI for the Rényi entropy for two independent variables, with a power exponent parameter $\alpha$ whose value was further improved by Li [16]. All these EPIs were found for Rényi entropies of orders >1. The $\alpha$-modification of the Rényi EPI was extended to orders <1 for two independent variables having log-concave densities by Marsiglietti and Melbourne [17]. The starting point of all the above works was Young’s strengthened convolutional inequality.
Recently, Shannon’s original EPI was given a simple proof [18] using a transport argument from normal variables and a change of variable by rotation. In this paper, we exploit these ingredients, described in the following lemmas, to establish all the above-mentioned Rényi EPIs and derive new ones.
Notation 1. Throughout this article, the considered n-dimensional zero-mean random variables admit a density which is implicitly assumed continuous inside its support. We write $X \sim f$ if X has density f and write $X^* \sim \mathcal{N}(0, \mathbf{K})$ if $X^*$ is normally distributed with covariance matrix $\mathbf{K}$.
Lemma 1 (Normal Transport).
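Let $X^* \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$. There exists a diffeomorphism $T : \mathbb{R}^n \to \mathbb{R}^n$ such that $X = T(X^*) \sim f$. Moreover, T can be chosen such that its Jacobian matrix $T'$ is (lower) triangular with positive diagonal elements.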
Lemma 1 is known in optimal transport theory as an application of the Knothe–Rosenblatt map [20]. Two different proofs are given in [18]. The proof is very simple for one-dimensional variables [21], where T is just an increasing function with continuous derivative $T' > 0$.
Lemma 2 (Normal Rotation [
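If $X^*, Y^*$ are i.i.d. (independent and identically distributed) and normal, then for any $0 < \lambda < 1$, the rotation
$$\tilde{X} = \sqrt{\lambda}\, X^* + \sqrt{1-\lambda}\, Y^*, \qquad \tilde{Y} = -\sqrt{1-\lambda}\, X^* + \sqrt{\lambda}\, Y^* \tag{1}$$
yields i.i.d. normal variables $\tilde{X}, \tilde{Y}$. Notice that the starred variables can be expressed in terms of the tilde variables by the inverse rotation
$$X^* = \sqrt{\lambda}\, \tilde{X} - \sqrt{1-\lambda}\, \tilde{Y}, \qquad Y^* = \sqrt{1-\lambda}\, \tilde{X} + \sqrt{\lambda}\, \tilde{Y}. \tag{2}$$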
The proof of Lemma 2 is trivial considering covariance matrices. A deeper result states that this property of remaining i.i.d. by rotation characterizes the normal distribution; this is known as Bernstein’s lemma (see, e.g., ([21] [Lemma 4]) and ([22] [Chapter 5])). This explains why one obtains equality in the EPI only for normal variables (see [21] for more details).
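The covariance argument behind Lemma 2 can be checked numerically. The sketch below assumes scalar variables and a rotation of the form $\tilde{X} = \sqrt{\lambda} X^* + \sqrt{1-\lambda} Y^*$, $\tilde{Y} = -\sqrt{1-\lambda} X^* + \sqrt{\lambda} Y^*$ (our reading of the lemma); the helper names `rotation_matrix` and `covariance_after_rotation` are illustrative and do not come from the paper.

```python
import math

def rotation_matrix(lam):
    """Assumed rotation of Lemma 2: rows give (X~, Y~) in terms of (X*, Y*)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    c, s = math.sqrt(lam), math.sqrt(1.0 - lam)
    return [[c, s], [-s, c]]

def covariance_after_rotation(lam, sigma2=1.0):
    """Covariance R (sigma2 I) R^T of (X~, Y~) when X*, Y* are i.i.d. N(0, sigma2)."""
    R = rotation_matrix(lam)
    return [[sigma2 * sum(R[i][k] * R[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

cov = covariance_after_rotation(0.3, sigma2=2.0)
# The diagonal entries stay at sigma2 and the off-diagonal entries vanish, so the
# rotated pair is uncorrelated and, being jointly normal, independent.
```

The same computation goes through for n-dimensional variables with covariance $\sigma^2 \mathbf{I}$, since the rotation acts identically on each coordinate.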
This article is a revised, full version of what was presented in part in a previous conference communication [23]. It is organized as follows. Preliminary definitions and known properties are presented in Section 2. Section 3 derives a crucial “information inequality” for Rényi entropies that enjoys a transformational invariance. The central result is in Section 4, where the first version of the Rényi EPI by Dembo, Cover and Thomas is proven using the ingredients of Lemmas 1 and 2. All previously known Rényi EPIs for finite orders, as well as new ones, are then derived using a simple method in Section 5. Section 6 concludes.
2. Preliminary Definitions and Properties
Throughout this article, we consider exponents $p > 0$ with $p \ne 1$. The following definition is well known and used, e.g., in Hölder’s inequality.
Definition 1 (Conjugate Exponent).
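The conjugate exponent of p is $p' = \frac{p}{p-1}$, that is, the number $p'$ such that $\frac{1}{p} + \frac{1}{p'} = 1$.
Remark 1. There are two situations depending on whether $p'$ is positive or negative, as summarized in the following table:
| $p > 1$ | or | $0 < p < 1$ |
| $p' > 1$ | | $p' < 0$ |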
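As a small illustration of Definition 1 and Remark 1, the following sketch computes the conjugate exponent $p' = p/(p-1)$ and shows its sign in the two regimes. The function name `conjugate_exponent` is an illustrative choice.

```python
def conjugate_exponent(p):
    """Return p' = p / (p - 1), the number such that 1/p + 1/p' = 1."""
    if p <= 0.0 or p == 1.0:
        raise ValueError("p must be positive and different from 1")
    return p / (p - 1.0)

above_one = conjugate_exponent(3.0)    # p > 1 gives p' > 1
below_one = conjugate_exponent(0.5)    # 0 < p < 1 gives p' < 0
```

Here `conjugate_exponent(0.5)` evaluates to -1.0, the negative case of Remark 1.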
Definition 2 (Rényi Entropy).
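If X has density $f \in L^p(\mathbb{R}^n)$, its Rényi entropy of order p is defined by
$$h_p(X) = \frac{1}{1-p} \log \int f^p = -p' \log \|f\|_p,$$
where $\|f\|_p = \left(\int f^p\right)^{1/p}$ denotes the $L^p$ norm of f. It is known that the limit as $p \to 1$ is the Shannon entropy
$$h_1(X) = h(X) = -\int f \log f.$$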
The Rényi entropy enjoys well-known properties similar to those of the Shannon entropy, which are recalled here for completeness.
Lemma 3 (Scaling Property).
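For any $a > 0$, $h_p(aX) = h_p(X) + n \log a$.
Proof. Making a change of variables, $\int \big(a^{-n} f(x/a)\big)^p \,\mathrm{d}x = a^{n(1-p)} \int f^p$, so that $h_p(aX) = \frac{1}{1-p} \log\big(a^{n(1-p)} \int f^p\big) = h_p(X) + n \log a$. ☐
One recovers the usual scaling property for the Shannon entropy by letting $p \to 1$.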
Lemma 4 (Rényi Entropy of the Normal).
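If $X^* \sim \mathcal{N}(0, \mathbf{K})$ for some nonsingular covariance matrix $\mathbf{K}$, then
$$h_p(X^*) = \frac{1}{2} \log\big((2\pi)^n |\mathbf{K}|\big) + \frac{n}{2}\, p' \frac{\log p}{p},$$
where $|\cdot|$ denotes the determinant. In particular, for $\mathbf{K} = \sigma^2 \mathbf{I}$,
$$h_p(X^*) = \frac{n}{2} \log(2\pi\sigma^2) + \frac{n}{2}\, p' \frac{\log p}{p}.$$
Proof. By direct calculation, $\int (f^*)^p = (2\pi)^{n(1-p)/2}\, |\mathbf{K}|^{(1-p)/2}\, p^{-n/2}$ for the density $f^*$ of $X^*$; taking $\frac{1}{1-p} \log$ and using $\frac{p'}{p} = \frac{1}{p-1}$ gives the result. ☐
Again, one recovers the Shannon entropy of a normal variable by letting $p \to 1$ (then, $p' \frac{\log p}{p} \to 1$).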
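The closed form of Lemma 4 can be compared with a direct numerical evaluation of $\frac{1}{1-p} \log \int f^p$. The sketch below assumes the scalar case ($n = 1$) and the closed form $h_p = \frac{1}{2} \log(2\pi\sigma^2) + \frac{1}{2} p' \frac{\log p}{p}$; the function names and the midpoint-rule parameters are illustrative choices.

```python
import math

def renyi_normal_closed_form(p, sigma2=1.0):
    """Assumed closed form of Lemma 4 for a scalar N(0, sigma2) variable."""
    p_conj = p / (p - 1.0)
    return 0.5 * math.log(2.0 * math.pi * sigma2) + 0.5 * p_conj * math.log(p) / p

def renyi_normal_numeric(p, sigma2=1.0, half_width=40.0, steps=100_000):
    """Midpoint-rule estimate of (1/(1-p)) log of the integral of f^p."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        f = math.exp(-x * x / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
        total += (f ** p) * dx
    return math.log(total) / (1.0 - p)

closed_above = renyi_normal_closed_form(2.0)
numeric_above = renyi_normal_numeric(2.0)
closed_below = renyi_normal_closed_form(0.5)
numeric_below = renyi_normal_numeric(0.5)
# Near p = 1 the closed form approaches the Shannon entropy (1/2) log(2 pi e sigma2).
near_shannon = renyi_normal_closed_form(1.0 + 1e-6)
```

The two evaluations agree for orders on both sides of 1, and the value near $p = 1$ is close to $\frac{1}{2} \log(2\pi e)$.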
The following notion of escort distribution [25] is useful in the sequel.
Definition 3 (Escort Density ([25] § 2.2)).
If $f \in L^p(\mathbb{R}^n)$, its escort density of exponent p is the density defined by
$$f_p(x) = \frac{f^p(x)}{\int f^p}.$$
In other words, $f_p = f^p / \|f\|_p^p$, where $\|f\|_p$ denotes the $L^p$ norm of f. We also use the notation $X_p \sim f_p$ to denote the corresponding escort random variable with density $f_p$.
Lemma 5 (Monotonicity Property).
If $p < q$, then $h_p(X) \ge h_q(X)$ with equality if and only if X is uniformly distributed.
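Proof. Let $p \ne 1$ and assume that $f \in L^q(\mathbb{R}^n)$ for all q in a neighborhood of p so that one can freely differentiate under the integral sign:
$$\frac{\partial}{\partial p} h_p(X) = \frac{1}{(1-p)^2} \log \int f^p + \frac{1}{1-p}\, \frac{\int f^p \log f}{\int f^p} = -\frac{1}{(1-p)^2}\, D(f_p \| f) \le 0,$$
where $D(f_p \| f) = \int f_p \log \frac{f_p}{f}$ denotes the Kullback–Leibler divergence. Equality $D(f_p \| f) = 0$ can hold only if $f_p = f$ a.e., which, since $p \ne 1$, implies that f is constant over some measurable subset $A \subset \mathbb{R}^n$, and is zero elsewhere. It follows that $h_p(X) > h_q(X)$ for any $p < q$ if X is not uniformly distributed. Conversely, if X is uniformly distributed over some measurable subset $A \subset \mathbb{R}^n$, its density can be written as $f(x) = 1/\mathrm{vol}(A)$ for all $x \in A$ and $f(x) = 0$ elsewhere. Then, $h_p(X) = \log \mathrm{vol}(A)$ is independent of p. ☐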
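The monotonicity property can be illustrated on two scalar densities with closed-form Rényi entropies. This is a sketch under the reading that $h_p(X)$ is non-increasing in p and constant only for uniform X: for the unit-rate exponential density, $\int f^p = 1/p$ gives $h_p = \frac{\log p}{p-1}$, and for the uniform density on $[0, a]$, $\int f^p = a^{1-p}$ gives $h_p = \log a$. The function names are illustrative.

```python
import math

def renyi_exponential(p):
    """h_p of the density exp(-x) on x >= 0: the integral of f^p equals 1/p."""
    return math.log(p) / (p - 1.0)

def renyi_uniform(p, a):
    """h_p of the uniform density on [0, a]: the integral of f^p equals a^(1-p)."""
    return math.log(a ** (1.0 - p)) / (1.0 - p)

orders = [0.25, 0.5, 0.9, 1.5, 2.0, 4.0]
exponential_values = [renyi_exponential(p) for p in orders]   # strictly decreasing
uniform_values = [renyi_uniform(p, 3.0) for p in orders]      # constant, log(3)
```

The exponential values decrease strictly with the order, while the uniform values all coincide with $\log 3$, matching the equality case.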
Remark 2. Notice the identity established in the proof:
$$\frac{\partial}{\partial p} h_p(X) = -\frac{1}{(1-p)^2}\, D(f_p \| f).$$
A similar formula for discrete variables can be found in ([26] § 5.3).
3. An Information Inequality
The Shannon entropy satisfies a fundamental “information inequality” ([4] Theorem 2.6.3) from which many classical information-theoretic inequalities can be derived. This can be written as
$$h(X) \le -\mathbb{E} \log \varphi(X) \tag{13}$$
for any density $\varphi$, with equality if and only if $\varphi = f$ a.e. The following Theorem 1 can be seen as the natural extension of the information inequality to Rényi entropies and is central in the following derivations of this paper. Bercher [27] pointed out to the author that it is similar to an inequality for discrete distributions established by Campbell [6] in the context of source coding (see also [
Theorem 1 (Information Inequality).
For any density $\varphi$,
$$h_p(X) \le -p' \log \mathbb{E}\, \varphi^{1/p'}(X) \tag{14}$$
with equality if and only if $\varphi = f_p$ a.e.
By letting $p \to 1$, one recovers the classical information inequality in Equation (13) for the Shannon entropy.
Proof. By definition in Equation (
where the inequality follows from Jensen’s inequality applied to the function
, which is strictly concave if
(that is,
) and strictly convex if
(that is,
). Equality holds if and only if
is constant a.e., which means that
are proportional a.e. Normalizing gives the announced condition
a.e. ☐
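A Gaussian example makes the inequality and its equality case concrete. The sketch below takes Theorem 1 in the form $h_p(X) \le -p' \log \mathbb{E}\, \varphi^{1/p'}(X)$ with equality for $\varphi = f_p$ (our reading of the statement), with $X \sim \mathcal{N}(0, 1)$ scalar and $\varphi$ the $\mathcal{N}(0, s^2)$ density, so that the escort density $f_p$ is $\mathcal{N}(0, 1/p)$. The function names are illustrative.

```python
import math

def renyi_entropy_std_normal(p):
    """h_p of a scalar N(0, 1): (1/2) log(2 pi) + (1/2) log(p) / (p - 1)."""
    return 0.5 * math.log(2.0 * math.pi) + 0.5 * math.log(p) / (p - 1.0)

def information_bound(p, s2):
    """-p' log E[phi^(1/p')(X)] for X ~ N(0, 1) and phi the N(0, s2) density."""
    a = 1.0 - 1.0 / p                      # a = 1/p'
    if 1.0 + a / s2 <= 0.0:
        raise ValueError("the expectation diverges for this (p, s2)")
    # Gaussian integral: E[phi^a(X)] = (2 pi s2)^(-a/2) * (1 + a/s2)^(-1/2)
    log_expectation = (-0.5 * a * math.log(2.0 * math.pi * s2)
                       - 0.5 * math.log(1.0 + a / s2))
    return -log_expectation / a

p = 2.0
gap_generic = information_bound(p, 1.0) - renyi_entropy_std_normal(p)     # positive
gap_escort = information_bound(p, 1.0 / p) - renyi_entropy_std_normal(p)  # zero
```

The gap is strictly positive for a generic variance and vanishes when $s^2 = 1/p$, that is, when $\varphi$ is the escort density. The same holds for $p < 1$, provided $s^2$ is large enough for the expectation to be finite.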
Remark 3. An alternate proof is obtained using Hölder’s inequality or its reverse applied to f and $\varphi^{1/p'}$. Notice that the equality case for $\varphi = f_p$ gives
$$h_p(X) = -p' \log \mathbb{E}\, f_p^{1/p'}(X) \tag{16}$$
as can be easily checked directly.
The following conditional version of Theorem 1 involves a more complicated relation for dependent variables.
Corollary 1 (Conditional Information Inequality)
For any two random variables ,where denotes the Rényi entropy of X knowing and the expectation on the l.h.s. is taken over Y (the expectation in the r.h.s. is taken over ). In particular, when X and Y are independent,with equality if and only if does not depend on y and equals a.e. Proof. From Equation (
14) for fixed
y, one has
) and the opposite inequality for
), with equality if and only if
a.e. Taking the expectation over
Y yields
) and the opposite inequality for
). The result follows by taking the logarithm and multiplying by
. When
X and
Y are independent, equality holds if and only if
a.e. for all
y. ☐
For the Shannon entropy, the difference between the two sides of the information inequality in Equation (
13) is the Kullback–Leibler divergence:
which can also be noted
. It is known (and easy to check) that the divergence is invariant by reversible transformations
T. This means that, when
, one has
. A natural extension to Rényi entropies can be obtained on the difference
between the two sides of the information inequality in Equation (
Theorem 2 (Transformational Invariance).
Let be a diffeomorphism and suppose thatwhere and . Then, Note that, from Equation (
5), this identity can be rewritten as
Proof. Proceed to prove Equation (
23). Let
be the respective densities of
and recall that
. By the transformation
T, the densities are related by
denotes the Jacobian determinant of
T. Using these relations and Definition 3,
Remark 4. The fact that φ is a density is not used in the proof of Theorem 2. Therefore, Equation (22) holds more generally for any function φ satisfying Equation (25).
4. First Version of the Rényi EPI
For two independent random variables X and Y, the Shannon entropy power inequality can be expressed as follows [3]: For any $0 < \lambda < 1$,
$$h(\sqrt{\lambda}\, X + \sqrt{1-\lambda}\, Y) \ge \lambda\, h(X) + (1-\lambda)\, h(Y)$$
with equality if and only if $X, Y$ are i.i.d. normal; more precisely (given the translation invariance of the entropy) when they are normal with identical covariance matrices. Since it was assumed in Notation 1 that all considered variables have zero mean, both statements are equivalent. This means that the difference $h(\sqrt{\lambda}\, X + \sqrt{1-\lambda}\, Y) - \lambda\, h(X) - (1-\lambda)\, h(Y)$ is minimum (zero) for i.i.d. normal $X, Y$. In this section, we study the natural generalization for Rényi entropies ([3] Theorem 12), namely that the quantity
$$h_r(\sqrt{\lambda}\, X + \sqrt{1-\lambda}\, Y) - \lambda\, h_p(X) - (1-\lambda)\, h_q(Y)$$
is minimum for $X, Y$ i.i.d. normal. Here, the triple $(p, q, r)$ and its associated $\lambda$ satisfy the following condition, which is used, e.g., in Young’s convolutional inequality.
Definition 4 (Exponent Triple).
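An exponent triple $(p, q, r)$ has conjugates $p', q', r'$ of the same sign and such that
$$\frac{1}{p'} + \frac{1}{q'} = \frac{1}{r'}.$$
The corresponding coefficient $\lambda \in (0, 1)$ is defined by
$$\lambda = \frac{r'}{p'} = 1 - \frac{r'}{q'}.$$
In other words, the exponents $p, q, r$ are such that $\frac{1}{p} + \frac{1}{q} = 1 + \frac{1}{r}$ and fulfill one of the following two conditions:
| $p, q, r > 1$ | or | $0 < p, q, r < 1$ |
| $p', q', r' > 0$ | | $p', q', r' < 0$ |
| $r > p$, $r > q$ | | $r < p$, $r < q$ |
| $r' < p'$, $r' < q'$ | | $r' > p'$, $r' > q'$ |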
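The exponent-triple condition is easy to explore numerically. The sketch below assumes the condition reads $\frac{1}{p'} + \frac{1}{q'} = \frac{1}{r'}$ with $\lambda = r'/p'$ (our reading of Definition 4); given p and q it solves for r, checks that the three conjugates share a sign, and returns $(r, \lambda)$. The function name `exponent_triple` is illustrative.

```python
def exponent_triple(p, q):
    """Return (r, lam) with 1/p' + 1/q' = 1/r' and lam = r'/p' (assumed form)."""
    def conjugate(t):
        if t <= 0.0 or t == 1.0:
            raise ValueError("exponents must be positive and different from 1")
        return t / (t - 1.0)

    inv_r = 1.0 / p + 1.0 / q - 1.0        # equivalent to 1/p + 1/q = 1 + 1/r
    if inv_r <= 0.0:
        raise ValueError("no positive r exists for these exponents")
    r = 1.0 / inv_r
    p_c, q_c, r_c = conjugate(p), conjugate(q), conjugate(r)
    if not (p_c > 0.0) == (q_c > 0.0) == (r_c > 0.0):
        raise ValueError("conjugates must all have the same sign")
    return r, r_c / p_c

r_above, lam_above = exponent_triple(1.5, 1.5)    # orders > 1: r is close to 3
r_below, lam_below = exponent_triple(0.75, 0.75)  # orders < 1: r is close to 0.6
```

In both examples $p = q$, so $\lambda = 1/2$; for orders above 1 the resulting r exceeds p and q, and for orders below 1 it is smaller than both.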
The key argument used in this section is the following. If
, then for the escort variables,
. By normal transport (Lemma 1), one can write
for two diffeomorphisms
T and
U, where
are, say, i.i.d. standard normal
. (It follows that
.) We then have the following straightforward extension of Theorem 2.
Lemma 6 (Transformational Invariance for Two Independent Variables).
For a two-dimensional ,where Proof. From Equation (
5) and the definition of
, Equation (
34) can be rewritten as
By the transformations
T and
U, the densities of the escort variables are related by
. Now, by the same calculation as in the proof of Theorem 2,
Lemma 7. Let φ be the density of . Then, Proof. By the equality case of Theorem 1 (see (
16)), one has
Now, by Lemma 6 applied to
, we have
, and, therefore,
Since from Lemma 1,
T and
U can be chosen such that
are (lower) triangular with positive diagonal elements, it follows easily from the arithmetic-geometric mean inequality that
The result follows at once (for either positive or negative ). ☐
We can now use the normal rotation Lemma 2 to conclude by proving the following.
Theorem 3 (Rényi EPI [
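For independent $X, Y$ and exponent triple $(p, q, r)$,
$$h_r(\sqrt{\lambda}\, X + \sqrt{1-\lambda}\, Y) - \lambda\, h_p(X) - (1-\lambda)\, h_q(Y) \ge \frac{n}{2}\, r' \left(\frac{\log r}{r} - \frac{\log p}{p} - \frac{\log q}{q}\right) \tag{42}$$
with equality if and only if $X, Y$ are i.i.d. normal.
Proof. If $X, Y$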
are i.i.d. normal, then
is also identically distributed as
X and
Y, and from Lemma 4, it is immediate to check that equality holds (irrespective of their covariances). Therefore, the inequality in Equation (
42) is equivalent to
are, say, i.i.d. standard normal
To prove Equation (
43), consider the normal rotation of Lemma 2 and write
in terms of
using Equation (
2) in the first term of the r.h.s. of Equation (
38) (Lemma 7). One obtains:
Making the change of variable
, one obtains
is a density. Hence,
is also a density in
for fixed
. Now, since by Lemma 2,
are independent, by the conditional information inequality in Equation (
18) of Corollary 1, one has
Combining with Equation (
44) yields the announced inequality in Equation (
It remains to settle the equality case in Equation (
43). From the above proof, equality holds in Equation (
43) if and only if both Equation (
41) and Equation (
47) are equalities. Equality in Equation (
41) holds if and only if for all
are independent normal variables, this implies that
are constant and equal. In particular the Jacobian
in (
45) is constant.
From Corollary 1 equality in Equation (
47) holds if and only if
does not depend on
, which implies that
does not depend on the value of
. Taking derivatives with respect to
for all
which implies
for all
. Therefore,
T and
U are linear transformations, equal up to an additive constant (equal to 0 since all variables are assumed of zero mean). It follows that
are normal with respective distributions
. Hence,
X and
Y are i.i.d. normal
. ☐
A straightforward generalization to several independent variables is the following.
Corollary 2 (Rényi EPI for Several Variables).
Let be exponents; those conjugates are of the same sign and satisfyand let be defined by Then, for independent ,with equality if and only if the are i.i.d. normal. Proof. By induction on m: The result for is Theorem 3. Suppose the result satisfied at order and let and be such that . Notice that , hence . By Theorem 3, with equality if and only if are i.i.d. normal. Now, by the induction hypothesis, with equality if and only if the ()—and hence —are i.i.d. normal. The result at order m follows by combining the two inequalities since . ☐
5. Recent Versions of the Rényi EPI
Definition 5 (Rényi Entropy Power [
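The Rényi entropy power of order r is defined by
$$N_r(X) = \exp\left(\frac{2}{n}\, h_r(X)\right).$$
Up to a multiplicative constant, $N_r(X)$ is the (average) power of a white normal variable $X^*$ having the same Rényi entropy as X, hence the name “entropy power”. In fact, if $X^* \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$ has the same Rényi entropy $h_r(X^*) = h_r(X)$, then by Lemma 4, $\sigma^2 = N_r(X) / (2\pi\, r^{r'/r})$.
The Rényi entropy power enjoys the same scaling property as for the usual power: By Lemma 3, for any $a > 0$,
$$N_r(aX) = a^2 N_r(X). \tag{56}$$
For independent $X_1, X_2, \ldots, X_m$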
, Rényi entropy power inequalities take either the form [
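$$N_r\Big(\sum_{i=1}^m X_i\Big) \ge c \sum_{i=1}^m N_r(X_i) \tag{57}$$
for some positive constant c, or the form [
$$N_r^\alpha\Big(\sum_{i=1}^m X_i\Big) \ge \sum_{i=1}^m N_r^\alpha(X_i) \tag{58}$$
for some positive exponent $\alpha$. The constants c and $\alpha$ may depend on the order r, the number m of variables and the dimension n. What is desired is:
a maximum possible value of c in Equation (57), since the inequality is automatically satisfied for all positive constants $c' < c$;
a minimum possible value of $\alpha$ in Equation (58), since the inequality is automatically satisfied for all positive exponents $\alpha' > \alpha$; in fact, since Equation (58) is homogeneous by scaling the variables $X_i$ as in Equation (56), one may suppose without loss of generality that the r.h.s. of Equation (58) is $= 1$; then, $N_r^\alpha(X_i) \le 1$, hence $N_r^{\alpha'}(X_i) \le N_r^\alpha(X_i)$ for all i and $N_r^{\alpha'}\big(\sum_i X_i\big) \ge N_r^{\alpha}\big(\sum_i X_i\big) \ge 1 \ge \sum_i N_r^{\alpha'}(X_i)$.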
The following useful characterization, which generalizes ([16] Lemma 2.1), makes the link between the various versions (Equations (53), (57), and (58)) of the Rényi entropy power inequality.
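Lemma 8. For independent $X_1, X_2, \ldots, X_m$, the Rényi EPI in the general form
$$N_r^\alpha\Big(\sum_{i=1}^m X_i\Big) \ge c \sum_{i=1}^m N_r^\alpha(X_i) \tag{59}$$
for some constant $c > 0$ and exponent $\alpha > 0$ is equivalent to the following inequality
$$h_r\Big(\sum_{i=1}^m \sqrt{\lambda_i}\, X_i\Big) - \sum_{i=1}^m \lambda_i\, h_r(X_i) \ge \frac{n}{2} \left(\frac{\log c}{\alpha} + \Big(\frac{1}{\alpha} - 1\Big) H(\lambda)\right) \tag{60}$$
for any positive $\lambda_1, \ldots, \lambda_m$ such that $\sum_{i=1}^m \lambda_i = 1$, where $H(\lambda)$ denotes the discrete entropy
$$H(\lambda) = \sum_{i=1}^m \lambda_i \log \frac{1}{\lambda_i}.$$
Proof. Suppose Equation (59) holds. Then,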
where the scaling property in Equation (
56) is used in Equation (64) and the concavity of the logarithm is used in Equation (
65). Conversely, suppose that Equation (
60) is satisfied for all
such that
. Set
. Then,
5.1. Rényi Entropy Power Inequalities for Orders >1
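From Lemma 8 and Corollary 2, it is easy to recover known Rényi EPIs and obtain new ones for orders $r > 1$. In fact, if $r > 1$, then $r'$ and all $p_i'$ are positive and $p_i' > r'$. Therefore, all $p_i < r$ and by monotonicity (Lemma 5),
$$h_{p_i}(X_i) \ge h_r(X_i) \qquad (i = 1, \ldots, m). \tag{73}$$
Plugging this into Equation (53), one obtains
$$h_r\Big(\sum_{i=1}^m \sqrt{\lambda_i}\, X_i\Big) - \sum_{i=1}^m \lambda_i\, h_r(X_i) \ge \frac{n}{2}\, r' \left(\frac{\log r}{r} - \sum_{i=1}^m \frac{\log p_i}{p_i}\right), \tag{74}$$
where $\lambda_i = r'/p_i'$. For future reference, define
$$A(\lambda) = |r'| \left(\frac{\log r}{r} - \sum_{i=1}^m \frac{\log p_i}{p_i}\right) = |r'| \left(\frac{\log r}{r} + \sum_{i=1}^m \Big(1 - \frac{\lambda_i}{r'}\Big) \log\Big(1 - \frac{\lambda_i}{r'}\Big)\right).$$
(The absolute value $|r'|$ is needed in the next subsection where $r'$ is negative.)
This function is strictly convex in $\lambda = (\lambda_1, \ldots, \lambda_m)$ because $x \mapsto x \log x$ is strictly convex. Note that $A(\lambda)$ vanishes in the limiting cases where $\lambda$ tends to one of the standard unit vectors $(1, 0, \ldots, 0)$, $(0, 1, \ldots, 0)$, …, $(0, 0, \ldots, 1)$ and since every $\lambda$ is a convex combination of these vectors and $A(\lambda)$ is strictly convex, one has $A(\lambda) < 0$.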
Theorem 4 (Ram and Sason [
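The Rényi EPI (57) holds for $r > 1$ and $c = r^{r'/r} \left(1 - \frac{1}{m r'}\right)^{m r' - 1}$.
Proof. By Lemma 8 for $\alpha = 1$, we only need to check that the r.h.s. of Equation (74) is greater than $\frac{n}{2} \log c$ for any choice of the $\lambda_i$’s, that is, for any choice of exponents $p_i$ such that $\sum_{i=1}^m \frac{1}{p_i'} = \frac{1}{r'}$. Thus, Equation (57) will hold for $\log c = \min_\lambda\, r' \big(\frac{\log r}{r} - \sum_{i} \frac{\log p_i}{p_i}\big)$. Now, by the log-sum inequality ([4] Theorem 2.7.1),
$$\sum_{i=1}^m \frac{1}{p_i} \log \frac{1}{p_i} \ge \Big(\sum_{i=1}^m \frac{1}{p_i}\Big) \log \frac{\sum_{i=1}^m 1/p_i}{m} = \Big(m - \frac{1}{r'}\Big) \log\Big(1 - \frac{1}{m r'}\Big)$$
with equality if and only if all $p_i$ are equal, that is, the $\lambda_i$ are equal to $1/m$. Thus,
$$\min_\lambda\, r' \Big(\frac{\log r}{r} - \sum_{i=1}^m \frac{\log p_i}{p_i}\Big) = r' \frac{\log r}{r} + (m r' - 1) \log\Big(1 - \frac{1}{m r'}\Big),$$
which yields $c = r^{r'/r} \left(1 - \frac{1}{m r'}\right)^{m r' - 1}$. ☐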
An alternate proof is to argue that the r.h.s. of Equation (74) is convex and symmetrical in $(\lambda_1, \ldots, \lambda_m)$ and is, therefore, minimized when all $\lambda_i$ are equal.
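The dependence of the constant on the number of variables can be tabulated directly. The sketch below assumes the constant of Theorem 4 is $c = r^{r'/r} \left(1 - \frac{1}{m r'}\right)^{m r' - 1}$ and that its limit for large m is $r^{1/(r-1)}/e$ (our reading of the garbled formulas); the function names are illustrative.

```python
import math

def epi_constant(r, m):
    """Assumed constant c = r^(r'/r) * (1 - 1/(m r'))^(m r' - 1) for order r > 1."""
    if r <= 1.0 or m < 2:
        raise ValueError("requires r > 1 and m >= 2")
    r_conj = r / (r - 1.0)
    return r ** (r_conj / r) * (1.0 - 1.0 / (m * r_conj)) ** (m * r_conj - 1.0)

def universal_constant(r):
    """Assumed limit of epi_constant(r, m) as m grows: r^(1/(r-1)) / e."""
    return r ** (1.0 / (r - 1.0)) / math.e

constants = [epi_constant(2.0, m) for m in range(2, 8)]   # decreasing in m
limit = universal_constant(2.0)                            # 2 / e
```

For $r = 2$ and $m = 2$ the formula gives $2 \cdot (3/4)^3 = 0.84375$; the values then decrease with m and stay above the limit $2/e$.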
Remark 5. The above constant c is certainly not optimal since equality in Equation (73) holds if and only if the $X_i$ are uniformly distributed (Lemma 5) while equality in Equation (53) holds if and only if the $X_i$ are identically normally distributed (Corollary 2). Ram and Sason [14] tightened Equation (57) further using optimization techniques, resulting in a constant that depends on the relative values of the entropy powers themselves.
Remark 6. It can be noted that $c = r^{r'/r} \left(1 - \frac{1}{m r'}\right)^{m r' - 1}$ decreases (and tends to $r^{r'/r}/e$) as m increases; in fact $\frac{\partial}{\partial m} \log c = r' \big(\log\big(1 - \frac{1}{m r'}\big) + \frac{1}{m r'}\big) < 0$. Thus, a universal constant independent of m is obtained by taking
$$c = \frac{r^{r'/r}}{e} = \frac{1}{e}\, r^{\frac{1}{r-1}}.$$
This was the constant established by Bobkov and Chistyakov [13].
Theorem 5. The Rényi EPI (58) holds for $r > 1$ and $\alpha = \left[1 + r' \frac{\log_2 r}{r} + (2r' - 1) \log_2\left(1 - \frac{1}{2r'}\right)\right]^{-1}$ and this value of α cannot be improved using the method of this paper by making it depend on m.
Proof. By Lemma 8, for $c = 1$
, we only need to check that the r.h.s. of Equation (
74) is greater than
for any choice of
s, that is, for any choice of exponents
such that
. Thus, Equation (
58) will hold for
. By the proof of the preceding theorem, the numerator is minimized when all
are equal and this also maximizes the entropy
in the denominator. However, one cannot conclude yet since the minimum in the numerator is negative.
A stationary point is easily obtained by the Lagrangian method, which implies that is constant independent of i. This gives that is constant, hence a stationary point is obtained when all are equal (to ) and the corresponding value of is .
However, the boundary of the domain of
is the simplex
where on each vertex joining two standard unit vectors,
has the same expression as for
. Li [
16] showed—this is also easily proved using [
17, Lemma 8]—that, for
, the minimum is obtained when
. The corresponding value of
, which is easily seen to be less than
for any
Therefore, the minimum of is attained at the boundary when all are zero except two of them equal to . This gives . ☐
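The resulting exponent can be evaluated for a few orders. The sketch below assumes the exponent of Theorem 5 is $\alpha = \left[1 + r' \frac{\log_2 r}{r} + (2r' - 1) \log_2\left(1 - \frac{1}{2r'}\right)\right]^{-1}$ (our reading of the statement) and compares it with the value $(r+1)/2$ of Bobkov and Marsiglietti; the function names are illustrative.

```python
import math

def epi_exponent(r):
    """Assumed exponent: 1 / (1 + r' log2(r)/r + (2r' - 1) log2(1 - 1/(2r')))."""
    if r <= 1.0:
        raise ValueError("requires r > 1")
    r_conj = r / (r - 1.0)
    ratio = (r_conj * math.log2(r) / r
             + (2.0 * r_conj - 1.0) * math.log2(1.0 - 1.0 / (2.0 * r_conj)))
    return 1.0 / (1.0 + ratio)

def earlier_exponent(r):
    """Exponent (r + 1) / 2 of Bobkov and Marsiglietti, used for comparison."""
    return (r + 1.0) / 2.0

orders = [1.5, 2.0, 3.0, 10.0]
comparison = [(r, epi_exponent(r), earlier_exponent(r)) for r in orders]
```

For $r = 2$ the assumed formula gives $\alpha \approx 1.3247$, below the value 1.5 of the earlier exponent, and the same ordering holds for the other orders listed.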
Remark 7. The case yields which was found by Li [16] who remarked that this value of α is strictly smaller (better) than the value obtained by Bobkov and Marsiglietti [15]. Interestingly, for , the exponent of Theorem 5 cannot be improved by this method. In fact, in the above proof, it is easily seen that is negative and increases toward 0 as m increases. Therefore, the exponent α cannot be decreased (improved) as m increases.
The above value of is . However, using the same method, it is easy to obtain Rényi EPIs with exponent values . This is given by the following Theorem 6.
Theorem 6. The Rényi EPI (59) holds for , with . Proof. By Lemma 8 we only need to check that the r.h.s. of Equation (
74) is greater than
, that is,
for any choice of
s, that is, for any choice of exponents
such that
. Thus, for a given
, Equation (
59) will hold for
. From the preceding proofs (since both
are convex functions of
), the minimum is attained when all
s are equal. This gives
. ☐
5.2. Rényi Entropy Power Inequalities for Orders <1 and Log-Concave Densities
If $r < 1$, then $r'$ and all $p_i'$ are negative and $p_i' < r'$. Therefore, all $p_i > r$ and by monotonicity (Lemma 5), the opposite inequality of Equation (73) holds and the method of the preceding subsection fails. For log-concave densities, however, Equation (73) can be replaced by a similar inequality in the right direction.
Definition 6 (Log-Concave Density).
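A density f is log-concave if $\log f$ is concave in its support, i.e., for all $0 < \mu < 1$,
$$f(\mu x + (1-\mu) y) \ge f(x)^{\mu}\, f(y)^{1-\mu}. \tag{79}$$
Lemma 9. If X has a log-concave density, then $n \log p + (1-p)\, h_p(X) = \log\big(p^n \int f^p\big)$ is concave in p.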
As noted below in Corollary 3, this is essentially a result obtained by Fradelizi, Madiman and Wang [
28]. The following alternate proof uses the transport properties seen in
Section 4.
Proof. Define
. By Lemma 1 there exist two diffeomorphisms
such that one can write
. Then,
has density
Now, by log-concavity, Equation (
79) with
Using the arithmetic-geometric mean inequality in Equation (
41) and integrating the density in Equation (
80) over
, one obtains
Taking the logarithm yields the announced concavity. ☐
As a side result, it is interesting to note that we have obtained a simple transportation proof of the following varentropy bound:
Corollary 3 (Varentropy Bound [
One has , that is, . Proof. Since
is concave, one has
, that is,
Differentiating twice using Leibniz’s product rule and plugging the identity in Equation (
12), the l.h.s. of this inequality becomes
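As another easy consequence of Lemma 9, since $n \log p + (1-p)\, h_p(X)$ is concave and vanishes for $p = 1$, the slopes $\frac{n \log p + (1-p)\, h_p(X)}{p - 1}$ are non-increasing in p. In other words, $h_p(X) - n \frac{\log p}{p-1}$ is nondecreasing. Therefore: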
Corollary 4 (Marsiglietti and Melbourne [
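If $p < q$ then for any X with log-concave density, $h_p(X) - n \frac{\log p}{p-1} \le h_q(X) - n \frac{\log q}{q-1}$.
We can now use Lemma 8 and Corollary 2 to obtain Rényi EPIs for orders $r < 1$. Since all $p_i > r$, by Corollary 4,
$$h_{p_i}(X_i) \ge h_r(X_i) + n\, p_i' \frac{\log p_i}{p_i} - n\, r' \frac{\log r}{r} \qquad (i = 1, \ldots, m).$$
Plugging this into Equation (53), one obtains
$$h_r\Big(\sum_{i=1}^m \sqrt{\lambda_i}\, X_i\Big) - \sum_{i=1}^m \lambda_i\, h_r(X_i) \ge \frac{n}{2}\, r' \left(\sum_{i=1}^m \frac{\log p_i}{p_i} - \frac{\log r}{r}\right), \tag{89}$$
where we have used that $\lambda_i p_i' = r'$. Notice that the r.h.s. of Equation (89) (for $r < 1$) is the opposite of that of Equation (74) (for $r > 1$). However, since $r'$ is now negative, the r.h.s. is exactly equal to $\frac{n}{2}\, |r'| \big(\frac{\log r}{r} - \sum_{i} \frac{\log p_i}{p_i}\big)$, which is still convex and negative. For this reason, the proofs of the following theorems for $r < 1$ are mere repeats of the theorems obtained previously for $r > 1$.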
Theorem 7. The Rényi EPI (57) for log-concave densities holds for and . Proof. It is identical to that of Theorem 4 except for the change in the expression of . ☐
Theorem 8. The Rényi EPI (58) for log-concave densities holds for and and this value of α cannot be improved using the method of this paper by making it depend on m. Proof. It is identical to that of Theorem 5 except for the change in the expression of . ☐
Remark 8. The case yields which was found by Marsiglietti and Melbourne [17]. Again, the exponent of the theorem is not improved for . Theorem 9. The Rényi EPI in Equation (59) for log-concave densities holds for where and . Proof. It is identical to that of Theorem 6 except for the change in the expression of . ☐
6. Conclusions
This article provides a comprehensive framework to derive known Rényi entropy power inequalities (with shorter proofs), and prove new ones. The framework is based on a transport argument from normal densities and a change of variable by rotation. Only basic properties of Rényi entropies are used in the proofs.
In particular, the $\alpha$-modification of the EPI is recovered for two or more independent variables for Rényi entropy orders >1 as well as for orders <1, and the Rényi EPI with multiplicative constant c is extended to Rényi entropy orders <1. In addition, a more general formulation with both exponent $\alpha$ and constant c is obtained for all orders. In passing, a simple proof using normal transport of a recent sharp varentropy bound was obtained for log-concave densities.
As a perspective, the methods developed in this paper can perhaps be generalized to obtain reverse Rényi entropy power inequalities (see, e.g., the discussion in [