A View of Information-Estimation Relations in Gaussian Networks

Dytso, Alex; Bustin, Ronit; Poor, H. Vincent; Shamai (Shitz), Shlomo

doi:10.3390/e19080409

Open AccessFeature PaperReview

A View of Information-Estimation Relations in Gaussian Networks

¹

Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA

²

Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel

^*

Authors to whom correspondence should be addressed.

Entropy 2017, 19(8), 409; https://doi.org/10.3390/e19080409

Submission received: 31 May 2017 / Revised: 31 July 2017 / Accepted: 1 August 2017 / Published: 9 August 2017

(This article belongs to the Special Issue Network Information Theory)

Download

Browse Figures

Versions Notes

Abstract

:

Relations between estimation and information measures have received considerable attention from the information theory community. One of the most notable such relationships is the I-MMSE identity of Guo, Shamai and Verdú that connects the mutual information and the minimum mean square error (MMSE). This paper reviews several applications of the I-MMSE relationship to information theoretic problems arising in connection with multi-user channel coding. The goal of this paper is to review the different techniques used on such problems, as well as to emphasize the added-value obtained from the information-estimation point of view.

Keywords:

network information theory; estimation theory; I-MMSE

1. Introduction

The connections between information theory and estimation theory go back to the late 1950s in the work of Stam in which he uses the de Bruijn identity [1], attributed to his PhD advisor, which connects the differential entropy and the Fisher information of a random variable contaminated by additive white Gaussian noise. In 1968 Esposito [2] and then in 1971 Hatsell and Nolte [3] identified connections between the Laplacian and the gradient of the log-likelihood ratio and the conditional mean estimate. Information theoretic measure can indeed be put in terms of log-likelihood ratios, however, these works did not make this additional connecting step. In the early 1970s continuous-time signals observed in white Gaussian noise received specific attention in the work of Duncan [4] and Kadota et al. [5] who investigated connections between the mutual information and causal filtering. In particular, Duncan and Zakai (Duncan’s theorem was independently obtained by Zakai in the general setting of inputs that may depend causally on the noisy output in a 1969 unpublished Bell Labs Memorandum (see [6])) [4,7] showed that the input-output mutual information can be expressed as a time integral of the causal minimum mean square error (MMSE). It was only in 2005 that Guo, Shamai and Verdú revealed the I-MMSE relationship [8], which similarly to the de Bruijn identity, relates information theoretic quantities to estimation theoretic quantities over the additive white Gaussian noise channel. Moreover, the fact that the I-MMSE relationship connects the mutual information with the MMSE has made it considerably more applicable, specifically to information theoretic problems.

The I-MMSE type relationships have received considerable attention from the information theory community and a number of extensions have been found. In [9], in the context of multiple-input multiple-output (MIMO) Gaussian channels, it was shown that the gradient of the mutual information with respect to the channel matrix is equal to the channel matrix times the MMSE matrix. In [10] a version of the I-MMSE identity has been shown for Gaussian channels with feedback. An I-MMSE type relationship has been found for additive non-Gaussian channels in [11] and non-additive channels with a well-defined notion of the signal-to noise ratio (SNR) in [12,13,14,15]. A relationship between the MMSE and the relative entropy has been established in [16], and between the score function and Rényi divergence and f-divergence in [17]. The I-MMSE relationship has been extended to continuous time channels in [8] and generalized in [18] by using Malliavin calculus. For other continuous time generalizations the reader is referred to [19,20,21]. Finally, Venkat and Weissman [22] dispensed with the expectation and provided a point-wise identity that has given additional insight into this relationship. For a comprehensive summary of results on the interplay between estimation and information measures the interested reader is referred to [23].

In this survey we provide an overview of several applications of the I-MMSE relationship to multi-user information theoretic problems. We consider three types of applications:

Capacity questions, including both converse proofs and bounds given additional constraints such as discrete inputs;
The MMSE SNR-evolution, meaning the behavior of the MMSE as a function of SNR for asymptotically optimal code sequences (code sequences that approach capacity as $n \to \infty$ ); and
Finite blocklength effects on the SNR-evolution of the MMSE and hence effects on the rate as well.

Our goal in this survey is both to show the strength of the I-MMSE relationship as a tool to tackle network information theory problems, and to overview the set of tools used in conjunction with the I-MMSE relationship such as the “single crossing point” property. As will be seen such tools lead to alternative and, in many cases, simpler proofs of information theoretic converses.

We are also interested in using estimation measures in order to upper or lower bound information measures. Such bounds lead to simple yet powerful techniques that are used to find “good” capacity approximations. At the heart of this technique is a generalization of the Ozarow-Wyner bound [24] based on minimum mean p-th error (MMPE). We hope that this overview will enable future application of these properties in additional multi-user information theoretic problems.

The outline of the paper is as follows:

In Section 2 we review information and estimation theoretic tools that are necessary for the presentation of the main results.
In Section 3 we go over point-to-point information theory and give the following results:
- In Section 3.1, using the I-MMSE and a basic MMSE bound, a simple converse is shown for the Gaussian point-to-point channel;
- In Section 3.2, a lower bound, termed the Ozarow-Wyner bound, on the mutual information achieved by a discrete input on an AWGN channel, is presented. The bound holds for vector discrete inputs and yields the sharpest known version of this bound; and
- In Section 3.3, it is shown that the MMSE can be used to identify optimal point-to-point codes. In particular, it is shown that an optimal point-to-point code has a unique SNR-evolution of the MMSE.
In Section 4 we focus on the wiretap channel and give the following results:
- In Section 4.1, using estimation theoretic properties a simple converse is shown for the Gaussian wiretap channel that avoids the use of the entropy power inequality (EPI); and
- In Section 4.2, some results on the SNR-evolution of the code sequences for the Gaussian wiretap channel are provided, showing that for the secrecy capacity achieving sequences of codes the SNR-evolution is unique.
In Section 5 we study a communication problem in which the transmitter wishes to maximize its communication rate, while subjected to a constraint on the disturbance it inflicts on the secondary receiver. We refer to such scenarios as communication with a disturbance constraint and give the following results:
- In Section 5.1 it is argued that an instance of a disturbance constraint problem, when the disturbance is measured by the MMSE, has an important connection to the capacity of a two-user Gaussian interference channel;
- In Section 5.2 the capacity is characterized for the disturbance problem when the disturbance is measured by the MMSE;
- In Section 5.3 the capacity is characterized for the disturbance problem when the disturbance is measured by the mutual information. The MMSE and the mutual information disturbance results are compared. It is argued that the MMSE disturbance constraint is a more natural measure in the case when the disturbance measure is chosen to model the unintended interference;
- In Section 5.4 new bounds on the MMSE are derived and are used to show upper bounds on the disturbance constraint problem with the MMSE constraint when the block length is finite; and
- In Section 5.5 a notion of mixed inputs is defined and is used to show lower bounds on the rates of the disturbance constraint problem when the block length is finite.
In Section 6 we focus on the broadcast channel and give the following results:
- In Section 6.1, the converse for a scalar Gaussian broadcast channel, which is based only on the estimation theoretic bounds and avoids the use of the EPI, is derived; and
- In Section 6.2, similarly to the Gaussian wiretap channel, we examine the SNR-evolution of asymptotically optimal code sequences for the Gaussian broadcast channel, and show that any such sequence has a unique SNR-evolution of the MMSE.
In Section 7 the SNR-evolution of the MMSE is derived for the K-user broadcast channel.
In Section 8, building on the MMSE disturbance problem in Section 5.1, it is shown that for the two-user Gaussian interference channel a simple transmission strategy of treating interference as noise is approximately optimal.

Section 9 concludes the survey by pointing out interesting future directions.

1.1. Notation

Throughout the paper we adopt the following notational conventions:

Random variables and vectors are denoted by upper case and bold upper case letters, respectively, where r.v. is short for either random variable or random vector, which should be clear from the context. The dimension of these random vectors is n throughout the survey. Matrices are denoted by bold upper case letters;
If A is an r.v. we denote the support of its distribution by $supp (A)$ ;
The symbol $| \cdot |$ may denote different things: $| A |$ is the determinant of the matrix $A$ , $| A |$ is the cardinality of the set $A$ , $| X |$ is the cardinality of $supp (X)$ , or $| x |$ is the absolute value of the real-valued x;
The symbol $∥ \cdot ∥$ denotes the Euclidian norm;
$E [\cdot]$ denotes the expectation;
$N (m_{X}, K_{X})$ denotes the density of a real-valued Gaussian r.v. $X$ with mean vector $m_{X}$ and covariance matrix $K_{X}$ ;
$X \sim PAM (N, d_{\min (X)})$ denotes the uniform probability mass function over a zero-mean pulse amplitude modulation (PAM) constellation with $| supp (X) | = N$ points, minimum distance $d_{\min (X)}$ , and therefore average energy $E [X^{2}] = d_{\min (X)}^{2} \frac{N^{2} - 1}{12}$ ;
The identity matrix is denoted by $I$ ;
The reflection of the matrix $A$ along its main diagonal, or the transpose operation, is denoted by $A^{T}$ ;
The trace operation on the matrix $A$ is denoted by $Tr (A)$ ;
The order notation $A ⪰ B$ implies that $A - B$ is a positive semidefinite matrix;
$\log (\cdot)$ denotes the logarithm to the base $e$ ;
$[n_{1} : n_{2}]$ is the set of integers from $n_{1}$ to $n_{2} \geq n_{1}$ ;
For $x \in R$ we let $⌊x⌋$ denote the largest integer not greater than x;
For $x \in R$ we let ${[x]}^{+} : = \max (x, 0)$ and $\log^{+} (x) : = {[\log (x)]}^{+}$ ;
Let $f (x), g (x)$ be two real-valued functions. We use the Landau notation $f (x) = O (g (x))$ to mean that for some $c > 0$ there exists an $x_{0}$ such that $f (x) \leq c g (x)$ for all $x \geq x_{0}$ , and $f (x) = o (g (x))$ to mean that for every $c > 0$ there exists an $x_{0}$ such that $f (x) < c g (x)$ for all $x \geq x_{0}$ ; and
We denote the upper incomplete gamma function and the gamma function by

$\begin{matrix} Γ (x; a) & : = \int_{a}^{\infty} t^{x - 1} e^{- t} d t, x \in R, a \in R^{+}, \end{matrix}$

(1a)

$\begin{matrix} Γ (x) & : = Γ (x; 0) . \end{matrix}$

(1b)

2. Estimation and Information Theoretic Tools

In this section, we overview relevant information and estimation theoretic tools. The specific focus is to show how estimation theoretic measures can be used to represent or bound information theoretic measures such as entropy and mutual information.

2.1. Estimation Theoretic Measures

Of central interest to us is the following estimation measure constructed from the

L_{p}

norm.

Definition 1.

For the random vector

V \in R^{n}

and

p > 0

let

\begin{matrix} {∥ V ∥}_{p} : = {(\frac{1}{n} E [{∥ V ∥}^{p}])}^{\frac{1}{p}} = {(\frac{1}{n} E [{(Tr (V V^{T}))}^{\frac{p}{2}}])}^{\frac{1}{p}} . \end{matrix}

(2a)

We define the minimum mean p-th error (MMPE) of estimating

X

from

Y

as

\begin{matrix} mmpe (X | Y; p) = \inf_{f} {∥ X - f (Y) ∥}_{p}^{p}, \end{matrix}

(2b)

where the minimization is over all possible Borel measurable functions

f (Y)

. Whenever the optimal MMPE estimator exists, we shall denote it by

f_{p} (X | Y)

.

In particular, for

Z \sim N (0, I)

the norm in (2a) is given by

\begin{matrix} {n ∥ Z ∥}_{p}^{p} = E [{(\sum_{i = 1}^{n} Z_{i}^{2})}^{\frac{p}{2}}] = 2^{\frac{p}{2}} \frac{Γ (\frac{n}{2} + \frac{p}{2})}{Γ (\frac{n}{2})}, for n \in N, p \geq 0, \end{matrix}

(3)

and for

V

uniform over the n dimensional ball of radius r the norm in (2a) is given by

\begin{matrix} {n ∥ V ∥}_{p}^{p} = \frac{1}{Vol (B (r))} \frac{π^{\frac{n}{2}}}{Γ (\frac{n}{2})} \int_{0}^{r} ρ^{p} ρ^{n - 1} d ρ = \frac{n}{2 p + 2 n} r^{p}, for n \in N, p \geq 0 . \end{matrix}

(4)

We shall denote

\begin{matrix} mmpe (X | Y; p) = mmpe (X, snr, p), \end{matrix}

(5)

if

Y

and

X

are related as

\begin{matrix} Y = \sqrt{snr} X + Z, \end{matrix}

(6)

where

Z, X, Y \in R^{n}

,

Z \sim N (0, I)

is independent of

X

, and

snr \geq 0

is the SNR. When it will be necessary to emphasize the SNR at the output

Y

, we will denote it by

Y_{snr}

. Since the distribution of the noise is fixed

mmpe (X | Y; p)

is completely determined by the distribution of

X

and

snr

and there is no ambiguity in using the notation

mmpe (X, snr, p)

. Applications to the Gaussian noise channel will be the main focus of this paper.

In the special, case when

p = 2

, we refer to the MMPE as the minimum mean square error (MMSE) and use the notation

\begin{matrix} mmpe (X, snr, 2) = mmse (X, snr), \end{matrix}

(7)

in which case we also have that

f_{2} (X | Y) = E [X | Y]

.

Remark 1.

The notation

f_{p} (X | Y)

, for the optimal estimator in (2) is inspired by the conditional expectation

E [X | Y]

, and

f_{p} (X | Y)

should be thought of as an operator on

X

and a function of

Y

. Indeed, for

p = 2

, the MMPE reduces to the MMSE; that is,

mmpe (X | Y; 2) = mmse (X | Y)

and

f_{2} (X | Y) = E [X | Y]

.

Finally, similarly to the conditional expectation, the notation

f_{p} (X | Y = y)

should be understood as an evaluation for a realization of a random variable

Y

, while

f_{p} (X | Y)

should be understood as a function of a random variable

Y

which itself is a random variable.

Lemma 1.

(Existence of the Optimal Estimator [25]) For any

X

and

Y

given by (6) an optimal estimator exists and the infimum in (2) can be attained.

In certain cases the optimal estimator might not be unique and the interested reader is referred to [25] for such examples. In general we do not have a closed form solution for the MMPE optimal estimator in (2). Interestingly, the optimal estimator for Gaussian inputs can be found and is linear for all

p \geq 1

Proposition 1.

(MMPE of a Gaussian Input [25,26,27]) For

X_{G} \sim N (0, I)

and

p \geq 1

\begin{matrix} mmpe (X_{G}, snr, p) = \frac{{∥ Z ∥}_{p}^{p}}{{(1 + snr)}^{\frac{p}{2}}}, \end{matrix}

(8a)

with the optimal estimator given by

\begin{matrix} f_{p} (X_{G} | Y = y) = \frac{\sqrt{snr} y}{1 + snr} . \end{matrix}

(8b)

Note that unlike the Gaussian case in general the estimator will be a function of the order p. For

X = \pm 1

equally likely (i.e., binary phase shift keying—BPSK) the optimal estimator is given by

\begin{matrix} f_{p} (X | Y = y) = \tanh (\frac{y \sqrt{snr}}{p - 1}) . \end{matrix}

(9)

Often the MMPE is difficult to compute, even for

p = 2

(MMSE), and one instead is interested in deriving upper bounds on the MMPE. One of the most useful upper bounds on the MMPE can be obtained by restricting the optimization in (2) to linear functions.

Proposition 2.

(Asymptotically Gaussian is the “hardest” to estimate [25]) For

snr \geq 0

,

p \geq 1

, and a random variable

X

such that

{∥ X ∥}_{p}^{p} \leq σ^{p} {∥ Z ∥}_{p}^{p}

, we have

\begin{matrix} mmpe (X, snr, p) & \leq κ_{p, σ^{2} snr} \cdot \frac{σ^{p} {∥ Z ∥}_{p}^{p}}{{(1 + snr σ^{2})}^{\frac{p}{2}}}, \end{matrix}

(10a)

where

\begin{matrix} for p = 2 : & κ_{p, σ^{2} snr}^{\frac{1}{p}} = 1, \end{matrix}

(10b)

\begin{matrix} for p \neq 2 : & 1 \leq κ_{p, σ^{2} snr}^{\frac{1}{p}} = \frac{1 + \sqrt{σ^{2} snr}}{\sqrt{1 + σ^{2} snr}} \leq 1 + \frac{1}{\sqrt{1 + σ^{2} snr}} . \end{matrix}

(10c)

Moreover, a Gaussian

X

with per-dimension variance

σ^{2}

(i.e.,

X \sim N (0, σ^{2} I)

) asymptotically achieves the bound in (10a), since

\lim_{snr \to \infty} κ_{p, σ^{2} snr} = 1

.

For the case of

p = 2

, the bound in (10a) is achieved with a Gaussian input for all SNR’s. Moreover, this special case of the bound in (10a), namely

\begin{matrix} mmse (X, snr) \leq \frac{σ^{2}}{1 + σ^{2} snr}, \end{matrix}

(11)

for all

{∥ X ∥}_{2}^{2} \leq σ^{2}

, is referred to as the linear minimum mean square error (LMMSE) upper bound.

2.2. Mutual Information and the I-MMSE

For two random variables

(X, Y)

distributed according to

P_{X Y}

the mutual information is defined as

\begin{matrix} I (X; Y) = E [\log \frac{d P_{X Y}}{d (P_{X} \times P_{Y})}], \end{matrix}

(12)

where

\frac{d P_{X Y}}{d (P_{X} \times P_{Y})}

is the Radon-Nikodym derivative. For the channel in (6) the mutual information between

X

and

Y

takes the following form:

\begin{matrix} I (X; Y) = E [\log (\frac{f_{Y | X} (Y | X)}{f_{Y} (Y)})], \end{matrix}

(13)

and it will be convenient to use the normalized mutual information

\begin{matrix} I_{n} (X, snr) = \frac{1}{n} I (X; Y) . \end{matrix}

(14)

The basis for much of our analysis is the fundamental relationship between information theory and estimation theory, also known as the Guo, Shamai and Verdú I-MMSE relationship [8].

Theorem 1.

(I-MMSE [8]) For any

X

(independent of

snr

) we have that

\begin{matrix} \frac{d}{d snr} I_{n} (X, snr) & = \frac{1}{2} mmse (X, snr), \end{matrix}

(15a)

\begin{matrix} I_{n} (X, snr) & = \frac{1}{2} \int_{0}^{snr} mmse (X, t) d t . \end{matrix}

(15b)

In [28] the I-MMSE relationship has been partially extended to the limit as

n \to \infty

. This result was then extended in [29] under the assumption that the mutual information sequence converges.

Proposition 3.

(I-MMSE limiting expression [29]) Suppose that

{∥ X ∥}_{2}^{2} \leq σ^{2} < \infty

and

\begin{matrix} \lim_{n \to \infty} I_{n} (X, snr) = I_{\infty} (X, snr), \end{matrix}

(16)

exists. (The limit here is taken with respect to a sequence of input distributions over

{X_{n}}_{n \geq 1}

which induces a sequence of input-output joint distributions. The second moment constraint

∥ X_{n} ∥_{2}^{2}

should be understood in a similar manner, as a constraint for every n in the sequence.) Then,

\begin{matrix} \lim_{n \to \infty} mmse (X, snr) = {mmse}_{\infty} (X, snr), \end{matrix}

(17)

and the I-MMSE relationship holds for the following limiting expression:

\begin{matrix} I_{\infty} (X, snr) = \frac{1}{2} \int_{0}^{snr} {mmse}_{\infty} (X, t) d t . \end{matrix}

(18)

Proof.

The proof is given in Appendix A. ☐

Properties of the MMSE, with the specific focus on the I-MMSE identity, as a function of the input distribution and the noise distribution have been thoroughly studied and the interested reader is referred to [17,30,31,32]. For the derivation of the I-MMSE and a comprehensive summary of various extension we refer the reader to [23].

For a continuous random vector

X

with the density

f_{X}

the differential entropy is defined as

\begin{matrix} h (X) = - E [\log f_{X} (X)] . \end{matrix}

(19)

Moreover, for a discrete random vector

X

the discrete entropy is defined as

\begin{matrix} H (X) = - \sum_{x \in supp (X)} p_{x} \log p_{x}, where p_{x} = P [X = x] . \end{matrix}

(20)

2.3. Single Crossing Point Property

Upper bounds on the MMSE are useful, thanks to the I-MMSE relationship, as tools to derive information theoretic converse results, and have been used in [23,30,33,34] to name a few. The key MMSE upper bound that will be used in conjunction with the I-MMSE to derive information theoretic converses is the single crossing point property (SCPP).

Proposition 4.

(SCPP [30,33]) Let

{∥ X ∥}_{2} \leq 1

. Then for any fixed

{snr}_{0}

there exists a unique

α \in [0, 1]

such that

\begin{matrix} mmse (X, {snr}_{0}) = \frac{α}{1 + α {snr}_{0}} . \end{matrix}

(21a)

Moreover, for every

snr > {snr}_{0}

\begin{matrix} mmse (X, snr) \leq \frac{α}{1 + α snr}, \end{matrix}

(21b)

and for every

snr \leq {snr}_{0}

\begin{matrix} mmse (X, snr) \geq \frac{α}{1 + α snr} . \end{matrix}

(21c)

Even though the statement of Proposition 4 seems quite simple it turns out that it is sufficient to show a special case of the EPI [33]:

\begin{matrix} e^{\frac{2}{n} h (X + Z)} \geq e^{\frac{2}{n} h (X)} + e^{\frac{2}{n} h (Z)}, \end{matrix}

(22)

where

Z \sim N (0, R_{Z})

. Interestingly, the I-MMSE appears to be a very powerful tool in deriving EPI type inequalities; the interested reader is referred to [35,36,37,38].

In [25] it has been pointed out that the SCPP upper bound can also be shown for the MMPE as follows.

Proposition 5.

(Generalized SCPP upper bound [25]) Let

{mmpe}^{\frac{2}{p}} (X, {snr}_{0}, p) = \frac{{β ∥ Z ∥}_{p}^{2}}{1 + β {snr}_{0}}

for some

β \geq 0

. Then,

\begin{matrix} {mmpe}^{\frac{2}{p}} (X, snr, p) \leq c_{p} \cdot \frac{{β ∥ Z ∥}_{p}^{2}}{1 + β snr}, for snr \geq {snr}_{0}, \end{matrix}

(23a)

where

\begin{matrix} c_{p} = \{\begin{matrix} 2 & p \neq 2 \\ 1 & p = 2 \end{matrix} . \end{matrix}

(23b)

Proof.

The proof of Propositions 4 and 5 uses a clever choice of a sub-optimal estimator. The interested reader is referred to Appendix B for the proof. ☐

2.4. Complementary SCPP Bounds

Note that the SCPP allows us to upper bound the MMSE for all values of

snr \geq {snr}_{0}

, and as will be shown later this is a very powerful tool in showing information theoretic converses. Another interesting question is whether we can produce a complementary upper bound to that of the SCPP. That is, can we show an upper bounds on the MMSE for

snr \leq {snr}_{0}

? As will be demonstrated in Section 5, such complementary SCPP bounds are useful in deriving information theoretic converses for problems of communication with a disturbance constraint.

The next result shows that this is indeed possible.

Proposition 6.

(Complementary SCPP bound [25]) For

0 < snr \leq {snr}_{0}

,

X

and

p \geq 0

, we have

\begin{matrix} mmpe (X, snr, p) \leq κ_{n, t} {mmpe}^{\frac{1 - t}{1 + t}} (X, {snr}_{0}, \frac{1 + t}{1 - t} \cdot p), \\ where κ_{n, t} : = {(\frac{2^{n}}{n^{2}})}^{\frac{t}{t + 1}} {(\frac{1}{1 - t})}^{\frac{n t}{t + 1} - \frac{1}{2}}, t = \frac{{snr}_{0} - snr}{{snr}_{0}} . \end{matrix}

An interesting property of the bound in Proposition 6 is that the right hand side of the inequality keeps the channel SNR fixed and only varies the order of the MMPE while the left hand side of the inequality keeps the order fixed and changes the SNR value.

2.5. Bounds on Differential Entropy

Another common application of estimation theoretic measures is to bound information measures. Next, we presented one such bound.

For any random vector

V

such that

| Cov (V) | < \infty, h (V) < \infty,

and any random vector

Y

, the following inequality is considered to be a continuous analog of Fano’s inequality [39]:

\begin{matrix} h (V | Y) & \leq \frac{n}{2} {\log (2 π e | Cov (V | Y) |}^{\frac{1}{n}}) \end{matrix}

(24)

\begin{matrix} \leq \frac{n}{2} \log (2 π e mmse (V | Y)), \end{matrix}

(25)

where the inequality in (25) is a consequence of the arithmetic-mean geometric-mean inequality, that is, for any

0 ⪯ A

we have used

{| A |}^{\frac{1}{n}} = {(\prod_{i = 1}^{n} λ_{i})}^{\frac{1}{n}} \leq \frac{\sum_{i = 1}^{n} λ_{i}}{n} = \frac{Tr (A)}{n}

where

λ_{i}

’s are the eigenvalues of

A

.

The inequality in (25) can be generalized in the following way.

Theorem 2.

(Conditional Entropy Bound [25]) Let

V \in R^{n}

be such that

h (V) < \infty

and

{∥ V ∥}_{p} < \infty

. Then, for any

p \in (0, \infty)

and for any

Y \in R^{n}

, we have

\begin{matrix} h (V | Y) \leq \frac{n}{2} \log (k_{n, p}^{2} \cdot n^{\frac{2}{p}} \cdot {mmpe}^{\frac{2}{p}} (V | Y; p)), \end{matrix}

(26)

where

k_{n, p} = \frac{\sqrt{π} {(\frac{p e}{n})}^{\frac{1}{p}} Γ^{\frac{1}{n}} (\frac{n}{p} + 1)}{Γ^{\frac{1}{n}} (\frac{n}{2} + 1)}

.

While the MMPE is still a relatively new tool it has already found several applications:

The MMPE can be used to bound the conditional entropy (see Theorem 2 in Section 2.5). These bounds are generally tighter than the MMSE based bound especially for highly non-Gaussian statistics;
The MMPE can be used to develop bounds on the mutual information of discrete inputs via the generalized Ozarow-Wyner bound (see Theorem 4 in Section 3.2); The MMPE and the Ozarow-Wyner bound can be used to give tighter bounds on the gap to capacity achieved by PAM input constellations (see Figure 2);
The MMPE can be used as a key tool in finding complementary bounds on the SCPP (see Theorem 10 in Section 5.4). Note that using the MMPE as a tool produces the correct phase transition behavior; and
While not mentioned, another application is to use the MMPE to bound the derivatives of the MMSE; see [25] for further details.

3. Point-to-Point Channels

In this section, we review Shannon’s basic theorem for point-to-point communication and introduce relevant definitions used throughout the paper. The point-to-point channel is also a good starting point for introducing many of the techniques that will be used in this survey.

A classical point-to-point channel is shown in Figure 1. The transmitter wishes to reliably communicate a message W at a rate R bits per transmission to a receiver over a noisy channel. To that end, the transmitter encodes the message W into a signal

X

and transmits it over a channel in n time instances. Upon receiving a sequence

Y

, a corrupted version of

X

, the receiver decodes it to obtain the estimate

\hat{W}

.

Definition 2.

A memoryless channel (MC), assuming no feedback,

(X, P_{Y | X}, Y)

(in short

P_{Y | X}

) consists of an input set

X

, an output set

Y

, and a collection of transition probabilities

P_{Y | X}

on

Y

for every

x \in X

. The transition of a length-n vector

X

through such a channel then has the following conditional distribution:

\begin{matrix} P_{Y | X} (y | x) = \prod_{i = 1}^{n} P_{Y | X} (y_{i} | x_{i}) . \end{matrix}

(27)

Definition 3.

A code of length n and rate R, denoted by

(2^{n R}, n)

, of the channel

P_{Y | X}

consist of the following:

A message set ${1, 2, . . ., 2^{n R}}$ . We assume that the message W is chosen uniformly over the message set.
An encoding function $X : {1, 2, . . ., 2^{n R}} \to X^{n}$ that maps messages W to codewords $X (W)$ . The set of all codewords is called the codebook and is denoted by $C$ ; and
A decoding function $g : Y^{n} \to {1, 2, . . ., 2^{n R}}$ that assigns an estimate $\hat{W}$ to each received sequence.

The average probability of error for a

(2^{n R}, n)

code is defined as

\begin{matrix} P [\hat{W} \neq W] = \frac{1}{2^{n R}} \sum_{w = 1}^{2^{n R}} P [\hat{W} \neq w ∣ W = w] . \end{matrix}

Definition 4.

A rate R is said to be achievable over a point-to-point channel if there exists a sequence of codes,

(2^{n R}, n)

, such that

\begin{matrix} \lim_{n \to \infty} P [\hat{W} \neq W] = 0 . \end{matrix}

(28)

The capacity C of a point-to-point channel is the supremum over all achievable rates.

A crowning achievement of Shannon’s 1948 paper [40] is a simple characterization of the capacity of a point-to-point channel.

Theorem 3.

(Channel Coding Theorem [40]) The capacity of the channel

P_{Y | X}

is given by

\begin{matrix} C = \max_{P_{X}} I (X; Y) . \end{matrix}

(29)

For a formal derivation of the capacity expression in (29) the reader is referred to classical texts such as [39,41,42].

3.1. A Gaussian Point-to-Point Channel

In this section we consider the practically relevant Gaussian point-to-point channel shown in Figure 1b and given by

\begin{matrix} Y = \sqrt{snr} X + Z, \end{matrix}

(30)

where Z is standard Gaussian noise and there is an additional input power constraint

E [X^{2}] \leq 1

. The capacity in this setting was solved in the original paper by Shannon and is given by

\begin{matrix} C = \frac{1}{2} \log (1 + snr) . \end{matrix}

(31)

To show the converse proof of the capacity (the upper bound on the capacity) in (31), Shannon used the maximum entropy principle. In contrast to Shannon’s proof, we show the converse can be derived by using the I-MMSE and the LMMSE upper bound in (11)

\begin{matrix} I (X; Y) & = \frac{1}{2} \int_{0}^{snr} mmse (X, γ) d γ \\ \leq \frac{1}{2} \int_{0}^{snr} \frac{1}{1 + γ} d γ \\ = \frac{1}{2} \log (1 + snr) . \end{matrix}

(32)

It is well know that the upper bound in (32) is achievable if and only if the input is

X \sim N (0, 1)

.

The main idea behind the upper bounding technique in (32) is to find an upper bound on the MMSE that holds for all SNR’s and integrate it to get an upper bound on the mutual information. This simple, yet powerful, idea will be used many times throughout this paper to show information theoretic converses for multi-user channels.

3.2. Generalized Ozarow-Wyner Bound

In practice, Gaussian inputs are seldom used and it is important to assess the performance of more practical discrete constellations (or inputs). Another reason is that discrete inputs often outperform Gaussian inputs in competitive multi-user scenarios, such as the interference channel, as will be demonstrated in Section 7. For other examples of discrete inputs being useful in multi-user settings, the interested readers is referred to [43,44,45,46].

However, computing an exact expression for the mutual information between the channel input and output when the inputs are discrete is often impractical or impossible. Therefore, the goal is to derive a good computable lower bound on the mutual information that is not too far away from the true value of the mutual information. As we will see shortly, estimation measures such as the MMSE and the MMPE will play an important role in establishing good lower bounds on the mutual information.

The idea of finding good capacity approximations can be traced back to Shannon. Shannon showed, in his unpublished work in 1948 [47], the asymptotic optimality of a PAM input for the point-to-point power-constrained Gaussian noise channel. Another such observation about approximate optimality of a PAM input was made by Ungerboeck in [48] who, through numerical methods, observed that the rate of a properly chosen PAM input is always a constant away from the AWGN capacity.

Shannon’s and Ungerboeck’s arguments were solidified by Ozarow and Wyner in [24] where firm lower bounds on the achievable rate with a PAM input were derived and used to show optimality of PAM to within 0.41 bits [24].

In [24] the following “Ozarow-Wyner lower bound” on the mutual information achieved by a discrete input

X_{D}

transmitted over an AWGN channel was shown:

\begin{matrix} {[H (X_{D}) - gap]}^{+} \leq I (X_{D}; Y) \leq H (X_{D}), \end{matrix}

(33a)

\begin{matrix} gap \leq \frac{1}{2} \log (\frac{π e}{6}) + \frac{1}{2} \log (1 + \frac{lmmse (X, snr)}{d_{\min} {(X_{D})}^{2}}), \end{matrix}

(33b)

where

lmmse (X | Y)

is the LMMSE. The advantage of the bound in (33) compared to the existing bounds is its computational simplicity. The bound depends only on the entropy, the LMMSE, and the minimum distance, which are usually easy to compute.

The bound in (33) has also been proven to be useful for other problems such as two-user Gaussian interference channels [45,49], communication with a disturbance constraint [50], energy harvesting problems [51,52], and information-theoretic security [53].

The bound on the

gap

in (33) has been sharpened in [45] to

\begin{matrix} gap \leq \frac{1}{2} \log (\frac{π e}{6}) + \frac{1}{2} \log (1 + \frac{mmse (X, snr)}{d_{\min} {(X_{D})}^{2}}), \end{matrix}

(34)

since

lmmse (X, snr) \geq mmse (X, snr)

.

Finally, the following generalization of the bound in (34) to discrete vector input, which is the sharpest known bound on the

gap

term, was derived in [25].

Theorem 4.

(Generalized Ozarow-Wyner Bound [25]) Let

X_{D}

be a discrete random vector with finite entropy, and let

K_{p}

be a set of continuous random vectors, independent of

X_{D}

, such that for every

V \in K_{p}

,

h (V),

{∥ V ∥}_{p} < \infty

, and

\begin{matrix} H (X_{D} | X_{D} + V) = 0, \forall V \in K_{p} . \end{matrix}

(35a)

Then for any

p > 0

\begin{matrix} {[H (X_{D}) - {gap}_{p}]}^{+} \leq I (X_{D}; Y) \leq H (X_{D}), \end{matrix}

(35b)

where

\begin{matrix} n^{- 1} {gap}_{p} & \leq \inf_{V \in K_{p}} (G_{1, p} (V, X_{D}) + G_{2, p} (V)), \end{matrix}

\begin{matrix} G_{1, p} (V, X_{D}) & = \log (\frac{∥ V + X_{D} - f_{p} (X_{D} | Y) ∥_{p}}{{∥ V ∥}_{p}}) \leq \{\begin{matrix} \log (1 + \frac{{mmpe}^{\frac{1}{p}} (X_{D}, snr, p)}{{∥ V ∥}_{p}}), p \neq 2 \\ \frac{1}{2} \log (1 + \frac{mmse (X_{D}, snr)}{{∥ V ∥}_{2}^{2}}), p = 2 \end{matrix}, \end{matrix}

(35c)

\begin{matrix} G_{2, p} (V) & = \log (\frac{k_{n, p} \cdot n^{\frac{1}{p}} \cdot {∥ V ∥}_{p}}{e^{\frac{1}{n} h_{e} (V)}}) . \end{matrix}

(35d)

Remark 2.

The condition in (35a) can be enforced by, for example, selecting the support of

V

to satisfy a non-overlap condition given by

\begin{matrix} supp (V + x_{i}) \cap supp (V + x_{j}) = \emptyset, \forall x_{i}, x_{j} \in supp (X_{D}), i \neq j, \end{matrix}

(36)

as was done in [54].

It is interesting to note that the lower bound in (35b) resembles the bound for lattice codes in [55], where

V

can be thought of as a dither,

G_{2, p}

corresponds to the log of the normalized p-moment of a compact region in

R^{n}

,

G_{1, p}

corresponds to the log of the normalized MMSE term, and

H (X_{D})

corresponds to the capacity C.

In order to show the advantage of Theorem 4 over the original Ozarow-Wyner bound (case of

n = 1

and with LMMSE instead of MMPE), we consider

X_{D}

uniformly distributed with the number of points equal to

N = ⌊ \sqrt{1 + snr} ⌋

, that is, we choose the number of points such that

H (X_{D}) \approx \frac{1}{2} \log (1 + snr)

. Figure 2 shows:

The solid cyan line is the “shaping loss” $\frac{1}{2} \log (\frac{π e}{6})$ for a one-dimensional infinite lattice and is the limiting gap if the number of points N grows faster than $\sqrt{snr}$ ;
The solid magenta line is the gap in the original Ozarow-Wyner bound in (33); and
The dashed purple, dashed-dotted blue and dotted green lines are the new gap given by Theorem 4 for values of $p = 2, 4, 6$ , respectively, and where we choose $V \sim U [- \frac{d_{\min (X_{D})}}{2}, \frac{d_{\min (X_{D})}}{2}]$ .

We note that the version of the Ozarow-Wyner bound in Theorem 4 provides the sharpest bound for the gap term. An open question, for

n = 1

, is what value of p provide the smallest gap and whether it coincides with the ultimate “shaping loss”.

For the AWGN channel there exist a number of other bounds that use discrete inputs as well (see [46,56,57,58] and references therein). The advantage of using Ozarow-Wyner type bounds, however, lies in their simplicity as they only depend on the number of signal constellation points and the minimum distance of the constellation.

The Ozarow-Wyner bound will play a key role in Section 5 and Section 8 where we examine achievable schemes for a point-to-point channel with a disturbance constraint and for a two-user Gaussian interference channel.

For recent applications of the bound in Theorem 4 to non-Gaussian and MIMO channels the reader is referred to [59,60,61].

3.3. SNR Evolution of Optimal Codes

The I-MMSE can also be used in the analysis of the MMSE SNR-evolution of asymptotically optimal code sequences (code sequences that approach capacity in the limit of blocklength). In particular, using the I-MMSE relationship one can exactly identify the MMSE SNR-evolution of asymptotically optimal code sequences for the Gaussian point-to-point channel.

Theorem 5.

(SNR evolution of the MMSE [62,63]) Any code sequence for the Gaussian point-to-point channel attains capacity if and only if

\begin{matrix} {mmse}_{\infty} (X, γ) = \{\begin{matrix} \frac{1}{1 + γ} & γ \leq snr, \\ 0 & γ \geq snr . \end{matrix} \end{matrix}

(37)

Figure 3 depicts the SNR evolution of the MMSE as described in Theorem 5. The discontinuity of the MMSE at

snr

is ofter referred to as the phase transition. From Theorem 5 it is clear that the optimal point-to-point code must have the same MMSE profile as the Gaussian distribution for all SNR’s before

snr

and experience a phase transition at

snr

. Intuitively, the phase transition happens because an optimal point-to-point code designed to operate at

snr

can be reliably decoded at

snr

and SNR’s larger than

snr

, and both the decoding and estimation errors can be driven to zero. It is also important to point out that the area under (37) is twice the capacity.

4. Applications to the Wiretap Channel

In this section, by focusing on the wiretap channel, it is shown how estimation theoretic techniques can be applied to multi-user information theory. The wiretap channel, introduced by Wyner in [64], is a point-to-point channel with an additional eavesdropper (see Figure 4a). The input is denoted by

X

, the output of the legitimate user is denoted by

Y

, and the output of an eavesdropper is denoted by

Y_{e}

. The transmitter of

X

, commonly referred to as Alice, wants to reliably communicate a message W to the legitimate receiver

Y

, commonly referred to as Bob, while keeping the message W secure to some extent from the eavesdropper

Y_{e}

, commonly referred to as Eve.

Definition 5.

A rate-equivocation pair

(R, d)

is said to be achievable over a wiretap channel if there exists a sequence of

(2^{n R}, n)

codes such that

\begin{matrix} \lim_{n \to \infty} P [\hat{W} \neq W] = 0, & r e l i a b i l i t y c o n s t r a i n t, \end{matrix}

(38a)

\begin{matrix} \lim_{n \to \infty} \frac{1}{n} I (W; Y_{e}) \leq R - d, & i n f o r m a t i o n l e a k a g e o r s e c r e c y c o n s t r a i n t . \end{matrix}

(38b)

The rate-equivocation region

R_{s}

is defined as the closure of all achievable rate-equivocation pairs, and the secrecy capacity is defined as

\begin{matrix} C_{s} = \sup_{R} {R : (R, R) \in R_{s}} . \end{matrix}

(39)

The secrecy capacity of a general wiretap channel was shown by Csiszár and Körner [65] and is given by

\begin{matrix} C_{s} = \max_{P_{U X}} \{I (U; Y) - I (U; Y_{e})\} \end{matrix}

(40)

where U is an auxiliary random variable that satisfies the Markov relationship

U \leftrightarrow X \leftrightarrow (Y, Y_{e})

.

In the case of a degraded wiretap channel (i.e., a wiretap channel obeying the Markov relationship

X \leftrightarrow Y \leftrightarrow Y_{e}

) the expression in (40) reduces to

\begin{matrix} C_{s} = \max_{P_{X}} \{I (X; Y) - I (X; Y_{e})\} . \end{matrix}

(41)

In fact, the expression in (41) for the degraded channel predates the expression in (40) and was shown in the original work of Wyner [64].

4.1. Converse of the Gaussian Wiretap Channel

In this section, we consider the practically relevant scalar Gaussian wiretap channel given by

\begin{matrix} Y & = \sqrt{snr} X_{1} + Z_{1}, \end{matrix}

(42a)

\begin{matrix} Y_{e} & = \sqrt{{snr}_{0}} X_{2} + Z_{2}, \end{matrix}

(42b)

where

snr \geq {snr}_{0}

, with an additional input power constraint

E [X^{2}] \leq 1

. This setting was considered in [66], and the secrecy capacity was shown to be

\begin{matrix} C_{s} = \frac{1}{2} \log (\frac{1 + snr}{1 + {snr}_{0}}) . \end{matrix}

(43)

In contrast to the proof in [66], where the key technical tool used to maximize the expression in (41) was the EPI, by using the I-MMSE relationship the capacity in (43) can be shown via the following simple three line argument [30]:

\begin{matrix} I (X; Y) - I (X; Y_{e}) & = \frac{1}{2} \int_{{snr}_{0}}^{snr} mmse (X, t) d t \end{matrix}

(44a)

\begin{matrix} \leq \frac{1}{2} \int_{{snr}_{0}}^{snr} \frac{1}{1 + t} d t \end{matrix}

(44b)

\begin{matrix} = \frac{1}{2} \log (\frac{1 + snr}{1 + {snr}_{0}}), \end{matrix}

(44c)

where the inequality follows by using the LMMSE upper bound in (11). It is also interesting to point out that the technique in (44) can be easily mimicked to derive the entire rate-equivocation region; for details see [23].

4.2. SNR Evolution of Optimal Wiretap Codes

In the previous section, we saw that the I-MMSE relationship is a very powerful mathematical tool and can be used to provide a simple derivation of the secrecy capacity of the scalar Gaussian wiretap channel. In fact, as shown in [28,34], the I-MMSE relationship can also be used to obtain practical insights. Specifically, it was shown to be useful in identifying key properties of optimal wiretap codes.

Theorem 6.

(SNR evolution of the MMSE [28]) Any code sequence for the Gaussian wiretap channel attains a rate equivocation pair

(R, C_{s})

, meaning it attains the maximum level of equivocation, if and only if

\begin{matrix} {mmse}_{\infty} (X; γ ∣ W) = 0, γ \geq {snr}_{0}, \end{matrix}

(45)

and

\begin{matrix} {mmse}_{\infty} (X; γ) = \{\begin{matrix} \frac{1}{1 + γ}, & γ \leq snr, \\ 0, & γ > snr, \end{matrix} \end{matrix}

(46)

regardless of the rate of the code, meaning for any

R \geq C_{s}

.

Note that, as shown in Theorem 5, (46) is the SNR-evolution of any point-to-point capacity achieving code sequence, C, to

Y

as shown in [62,63]; however, only a one-to-one mapping over this codebook sequence leads to the maximum point-to-point rate. The idea is that the maximum level of equivocation determines the SNR-evolution of

mmse (X; γ)

regardless of the rate.

The additional condition given in (45) is required in order to fully define the sub-group of code sequences that are

(R, C_{s})

codes for the Gaussian wiretap channel. Still, these conditions do not fully specify the rate of the code sequence, as the group contains codes of different rates R as long as

R \geq d_{\max}

. Note that the rate of the code is determined solely by the SNR-evolution of

mmse (X; γ | W)

in the region of

γ \in [0, {snr}_{0})

and is given by

\begin{matrix} R = \frac{1}{2} \int_{0}^{snr} [{mmse}_{\infty} (X; γ) - {mmse}_{\infty} (X; γ | W)] d γ . \end{matrix}

(47)

The immediate question that arises is: Can we find MMSE properties that will distinguish code sequences of different rates? The answer is affirmative in the two extreme cases: (i) When

R = C_{s}

, meaning a completely secure code; (ii) When

R = C

, meaning maximum point-to-point capacity. In the latter case, one-to-one mapping is required, and the conditional MMSE is simply zero for all SNR. Figure 5 considers the former case of perfect secrecy as well as an arbitrary intermediate case in which the rate is between the secrecy capacity and the point-to-point capacity.

According to the above result, constructing a completely secure code sequence requires splitting the possible codewords into sub-codes that are asymptotically optimal for the eavesdropper. This approach is exactly the one in Wyner’s original work [64], and also emphasized by Massey in [67], wherein the achievability proof the construction of the code sequence is such that the bins of each secure message are asymptotically optimal code sequences to the eavesdropper (saturating the eavesdropper). The above claim extends this observation by claiming that any mapping of messages to codewords (alternatively, any binning of the codewords) that attains complete secrecy must saturate the eavesdropper, thus supporting the known achievability scheme of Wyner. Moreover, it is important to emphasize that the maximum level of equivocation can be attained with no loss in rate, meaning the reliable receiver can continue communicating at capacity.

Another important point to note is that these results supports the necessity of a stochastic encoder for any code sequence for the Gaussian wiretap channel, achieving the maximum level of equivocation and with

R < C

(as shown in [68] for a completely secure code for the discrete memoryless wiretap channel), since one can show that the conditions guarantee

H (X | W) > 0

for any such code sequence.

5. Communication with a Disturbance Constraint

Consider a scenario in which a message, encoded as

X

, must be decoded at the primary receiver

Y

while it is also seen at the unintended/secondary receiver

Y_{0}

for which it is interference, as shown in Figure 6a. The transmitter wishes to maximize its communication rate, while subject to a constraint on the disturbance it inflicts on the secondary receiver, and where the disturbance is measured by some function

F (X, Y_{0})

. It is common to refer to such a scenario as communication with a disturbance constraint. The choice of

F (X, Y_{0})

depends on the application one has in mind. For example, a common application is to limit the interference that the primary user inflicts on the secondary. In this case, two possible choices of

F (X, Y_{0})

are the mutual information

I (X; Y_{0})

and the MMSE

mmse (X | Y_{0})

, considered in [69,70], respectively. In what follows we review these two possible measures of disturbance, so as to explain the advantages of the MMSE as a measure of disturbance that best models the interference.

5.1. Max-I Problem

Consider a Gaussian noise channel and take the disturbance to be measured in terms of the MMSE (i.e.,

F (X, Y_{0}) = mmse (X, {snr}_{0})

), as shown on Figure 6b. Intuitively, the MMSE disturbance constraint quantifies the remaining interference after partial interference cancellation or soft-decoding have been performed [47,70]. Formally, the following problem was considered in [50]:

Definition 6.

(Max-I problem.) For some

β \in [0, 1]

\begin{matrix} C_{n} (snr, {snr}_{0}, β) : = \sup_{X} I_{n} (X, snr), \end{matrix}

(48a)

\begin{matrix} {s . t . ∥ X ∥}_{2}^{2} \leq 1, p o w e r c o n s t r a i n t \end{matrix}

(48b)

\begin{matrix} a n d mmse (X, {snr}_{0}) \leq \frac{β}{1 + β {snr}_{0}}, M M S E c o n s t r a i n t . \end{matrix}

(48c)

The subscript n in

C_{n} (snr, {snr}_{0}, β)

emphasizes that we consider length n inputs

X \in R^{n}

. Clearly

C_{n} (snr, {snr}_{0}, β)

is a non-decreasing function of n. The scenario depicted in Figure 6b is captured when

n \to \infty

in the Max-I problem, in which case the objective function has a meaning of reliable achievable rate.

The scenario modeled by the Max-I problem is motivated by the two-user Gaussian interference channel (G-IC), whose capacity is known only for some special cases. The following strategies are commonly used to manage interference in the G-IC:

Interference is treated as Gaussian noise: in this approach the interference is not explicitly decoded. Treating interference as noise with Gaussian codebooks has been shown to be sum-capacity optimal in the so called very-weak interference regime [71,72,73].
Partial interference cancellation: by using the Han-Kobayashi (HK) achievable scheme [74], part of the interfering message is jointly decoded with part of the desired signal. Then the decoded part of the interference is subtracted from the received signal, and the remaining part of the desired signal is decoded while the remaining part of the interference is treated as Gaussian noise. With Gaussian codebooks, this approach has been shown to be capacity achieving in the strong interference regime [75] and optimal within 1/2 bit per channel per user otherwise [76].
Soft-decoding/estimation: the unintended receiver employs soft-decoding of part of the interference. This is enabled by using non-Gaussian inputs and designing the decoders that treat interference as noise by taking into account the correct (non-Gaussian) distribution of the interference. Such scenarios were considered in [44,46,49], and shown to be optimal to within either a constant or a $O (\log \log (snr))$ gap for all regimes in [45].

Even though the Max-I problem is somewhat simplified, compared to that of determining the capacity of the G-IC, as it ignores the existence of the second transmission, it can serve as an important building block towards characterizing the capacity of the G-IC [47,70], especially in light of the known (but currently uncomputable) limiting expression for the capacity region [77]:

\begin{matrix} C_{\infty}^{IC} = \lim_{n \to \infty} co ⋃_{P_{X_{1} X_{2}} = P_{X_{1}} P_{X_{2}}} \{\begin{matrix} 0 \leq R_{1} \leq I_{n} (X_{1}; Y_{1}) \\ 0 \leq R_{2} \leq I_{n} (X_{2}; Y_{2}) \end{matrix}\}, \end{matrix}

(49)

where

co

denotes the convex closure operation. Moreover, observe that for any finite n we have that the capacity region can be inner bounded by

\begin{matrix} C_{n}^{IC} \subset C_{\infty}^{IC}, \end{matrix}

(50)

where

\begin{matrix} C_{n}^{IC} = co ⋃_{P_{X_{1} X_{2}} = P_{X_{1}} P_{X_{2}}} \{\begin{matrix} 0 \leq R_{1} \leq I_{n} (X_{1}; Y_{1}) \\ 0 \leq R_{2} \leq I_{n} (X_{2}; Y_{2}) \end{matrix}\} . \end{matrix}

(51)

The inner bound

C_{n}^{IC}

will be referred to as the treating interference as noise (TIN) inner bound. Finding the input distributions

P_{X_{1}} P_{X_{2}}

that exhaust the achievable region in

C_{n}^{IC}

is an important open problem. In Section 8, for a special case of

n = 1

, we will demonstrate that

C_{1}^{IC}

is within a constant or

O (\log \log (snr))

from the capacity

C_{\infty}^{IC}

. Therefore, the Max-I problem, denoted by

C_{n} (snr, {snr}_{0}, β)

in (48), can serve as an important step in characterizing the structure of optimal input distributions for

C_{n}^{IC}

. We also note that in [47,70] it was conjectured that the optimal input for

C_{1} (snr, {snr}_{0}, β)

is discrete. For other recent works on optimizing the TIN region in (51), we refer the reader to [43,46,49,78,79] and the references therein.

The importance of studying models of communication systems with disturbance constraints has been recognized previously. For example, in [69] Bandemer et al. studied the following problem related to the Max-I problem in (48).

Definition 7.

(Bandemer et al. problem [69]) For some

R \geq 0

\begin{matrix} I_{n} ({snr}_{0}, snr, R) & : = \max_{X} I_{n} (X, {snr}_{0}), \end{matrix}

(52a)

\begin{matrix} {s . t . ∥ X ∥}_{2}^{2} \leq 1, power constraint, \end{matrix}

(52b)

\begin{matrix} and I_{n} (X, {snr}_{0}) \leq R, disturbance constraint . \end{matrix}

(52c)

In [69] it was shown that the optimal solution for

I_{n} (snr, {snr}_{0}, R)

, for any n, is attained by

X \sim N (0, α I)

where

α = \min (1, \frac{e^{2 R} - 1}{{snr}_{0}})

; here

α

is such that the most stringent constraint between (52b) and (52c) is satisfied with equality. In other words, the optimal input is independent and identically distributed (i.i.d.) Gaussian with power reduced such that the disturbance constraint in (52c) is not violated.

Theorem 7

([69]). The rate-disturbance region of the problem in (52) is given by

\begin{matrix} I_{n} ({snr}_{0}, snr, R) \leq \frac{1}{2} \log (1 + α snr), \end{matrix}

(53)

with equality if and only if

X \sim N (0, α I)

where

α = \min (1, \frac{e^{2 R} - 1}{{snr}_{0}})

.

Measuring the disturbance with the mutual information as in (52), in contrast to the MMSE as in (48), suggests that it is always optimal to use Gaussian codebooks with reduced power without any rate splitting. Moreover, while the mutual information constraint in (52) limits the amount of information transmitted to the unintended receiver, it may not be the best choice for measuring the interference, since any information that can be reliably decoded by the unintended receiver is not really interference. For this reason, it has been argued in [47,70] that the Max-I problem in (48) with the MMSE disturbance constraint is a more suitable building block to study the G-IC, since the MMSE constraint accounts for the interference, and captures the key role of rate splitting.

We also refer the reader to [80] where, in the context of discrete memoryless channels, the disturbance constraint was modeled by controlling the type (i.e., empirical distribution) of the interference at the secondary user. Moreover, the authors of [80] were able to characterize the tradeoff between the rate and the type of the induced interference by exactly characterizing the capacity region of the problem at hand.

We first consider a case of the Max-I problem when

n \to \infty

.

5.2. Characterization of $C_{n} (snr, {snr}_{0}, β)$ as $n \to \infty$

For the practically relevant case of

n \to \infty

, which has an operational meaning,

C_{\infty} (snr, {snr}_{0}, β)

has been characterized in [70] and is given by the following theorem.

Theorem 8

([70]). For any

snr, {snr}_{0} \geq 0

and

β \in [0, 1]

\begin{matrix} C_{\infty} (snr, {snr}_{0}, β) = \lim_{n \to \infty} C_{n} (snr, {snr}_{0}, β), \\ = \{\begin{matrix} \frac{1}{2} \log (1 + snr), & snr \leq {snr}_{0}, \\ \frac{1}{2} \log (1 + β snr) + \frac{1}{2} \log (1 + \frac{{snr}_{0} (1 - β)}{1 + β {snr}_{0}}), & snr \geq {snr}_{0}, \end{matrix} \\ = \frac{1}{2} \log^{+} (\frac{1 + β snr}{1 + β {snr}_{0}}) + \frac{1}{2} \log (1 + \min (snr, {snr}_{0})), \end{matrix}

(54)

which is achieved by using superposition coding with Gaussian codebooks.

The proof of the achievability part of Theorem 8 is by using superposition coding and is outside of the scope of this work. The interested reader is referred to [63,70,81] for a detailed treatment of MMSE properties of superposition codes.

Next, we show a converse proof of Theorem 8. In addition, to the already familiar use of the LMMSE bound technique, as in the wiretap channel in Section 4.1, we also show an application of the SCPP bound. The proof for the case of

snr \leq {snr}_{0}

follows by ignoring the MMSE constraint at

{snr}_{0}

and using the LMMSE upper bound

\begin{matrix} I_{n} (X, snr) & = \frac{1}{2} \int_{0}^{snr} mmse (X, t) d t \\ \leq \frac{1}{2} \int_{0}^{snr} \frac{1}{1 + t} d t \\ = \frac{1}{2} \log (1 + snr) d t . \end{matrix}

Next, we focus on the case of

snr \geq {snr}_{0}

\begin{matrix} I_{n} (X, snr) & = \frac{1}{2} \int_{0}^{snr} mmse (X, t) d t \\ = \frac{1}{2} \int_{0}^{{snr}_{0}} mmse (X, t) d t + \frac{1}{2} \int_{{snr}_{0}}^{snr} mmse (X, t) d t \\ \leq \frac{1}{2} \int_{0}^{{snr}_{0}} \frac{1}{1 + t} d t + \frac{1}{2} \int_{{snr}_{0}}^{snr} \frac{β}{1 + β t} d t \\ = \frac{1}{2} \log (1 + β snr) + \frac{1}{2} \log (1 + \frac{{snr}_{0} (1 - β)}{1 + β {snr}_{0}}), \end{matrix}

where the last inequality follows by upper bounding the integral over

[0, {snr}_{0}]

by the LMMSE bound in (11) and by upper bounding the integral over

[{snr}_{0}, snr]

using the SCPP bound in (21).

Figure 7 shows a plot of

C_{\infty} (snr, {snr}_{0}, β)

in (54) normalized by the capacity of the point-to-point channel

\frac{1}{2} \log (1 + snr)

. The region

snr \leq {snr}_{0}

(flat part of the curve) is where the MMSE constraint is inactive since the channel with

{snr}_{0}

can decode the interference and guarantee zero MMSE. The regime

snr \geq {snr}_{0}

(curvy part of the curve) is where the receiver with

{snr}_{0}

can no-longer decode the interference and the MMSE constraint becomes active, which in practice is the more interesting regime because the secondary receiver experiences “weak interference” that cannot be fully decoded (recall that in this regime superposition coding appears to be the best achievable strategy for the two-user Gaussian interference channel, but it is unknown whether it achieves capacity [76]).

5.3. Proof of the Disturbance Constraint Problem with a Mutual Information Constraint

In this section we show that the mutual information disturbance constraint problem in (52) can also be solved via an estimation theoretic approach.

An Alternative Proof of the Converse Part of Theorem 7.

Observe that, similarly to the Max-I problem, the interesting case of the

I_{n} ({snr}_{0}, snr, R)

is the “weak interference” regime (i.e.,

snr \geq {snr}_{0}

). This, follows since for the “strong interference” regime (i.e.,

snr \leq {snr}_{0}

) the result follows trivially by the data processing inequality

\begin{matrix} I_{n} (X, snr) \leq I_{n} (X, {snr}_{0}) \leq R, \end{matrix}

(55)

and maximizing (55) under the power constraint. To show Theorem 7, for the case of

snr \geq {snr}_{0}

, observe that

\begin{matrix} 0 \leq I_{n} (X, {snr}_{0}) \leq \frac{1}{2} \log (1 + {snr}_{0}), \end{matrix}

(56)

where the inequality on the right is due to the power constraint on

X

. Therefore, there exists some

α \in [0, 1]

such that

\begin{matrix} I_{n} (X, {snr}_{0}) = \frac{1}{2} \log (1 + α {snr}_{0}) . \end{matrix}

(57)

Using the I-MMSE, (57) can be written as

\begin{matrix} \frac{1}{2} \int_{0}^{{snr}_{0}} mmse (X, t) d t = \frac{1}{2} \int_{0}^{{snr}_{0}} \frac{α}{1 + α t} d t . \end{matrix}

(58)

From (58) and SCPP property we conclude that

mmse (X, t)

and

\frac{α}{1 + α t}

are either equal for all t, or cross each other once in the region

[0, {snr}_{0})

. In both cases, by the SCPP, we have

\begin{matrix} mmse (X, t) \leq \frac{α}{1 + α t}, \forall t \in [{snr}_{0}, \infty) . \end{matrix}

(59)

We are now in the position to bound the main term of the disturbance constrained problem. By using the I-MMSE relationship the mutual information can be bounded as follows:

\begin{matrix} I_{n} (X, snr) & = \frac{1}{2} \int_{0}^{{snr}_{0}} mmse (X, t) d t + \frac{1}{2} \int_{{snr}_{0}}^{snr} mmse (X, t) d t \\ = \frac{1}{2} \log (1 + α {snr}_{0}) + \frac{1}{2} \int_{{snr}_{0}}^{snr} mmse (X, t) d t \end{matrix}

\begin{matrix} \leq \frac{1}{2} \log (1 + α {snr}_{0}) + \frac{1}{2} \int_{{snr}_{0}}^{snr} \frac{α}{1 + α t} d t \end{matrix}

(60)

\begin{matrix} = \frac{1}{2} \log (1 + α snr), \end{matrix}

(61)

where the bound in (60) follows by the inequality in (59). The proof of the converse is concluded by establishing that the maximum value of

α

in (61) is given by

α = \min (1, \frac{e^{2 R} - 1}{{snr}_{0}})

which is a consequence of the bound

I_{n} (X, {snr}_{0}) \leq R

.

This concludes the proof of the converse. ☐

The achievability proof of Theorem 7 follows by using an i.i.d. Gaussian input with power

α

. This concludes the proof of Theorem 7.

In contrast to the proof in [69] which appeals to the EPI, the proof outlined here only uses the SCPP and the I-MMSE. Note, that unlike the proof of the converse of the Max-I problem, which also requires the LMMSE bound, the only ingredient in the proof of the converse for

I_{n} ({snr}_{0}, snr, R)

is a clever use of the SCPP bound. In Section 6, we will make use of this technique and show a converse proof for the scalar Gaussian broadcast channel.

Another observation is that the achievability proof of the

I_{n} ({snr}_{0}, snr, R)

holds for an arbitrary finite n while the achievability proof of the Max-I problem holds only as

n \to \infty

. In the next section, we demonstrate techniques for how to extend the achievability of the Max-I problem to the case of finite n. These techniques will ultimately be used to show an approximate optimality of the TIN inner bound for the two-user G-IC in Section 8.

5.4. Max-MMSE Problem

The Max-I problem in (48) is closely related to the following optimization problem.

Definition 8.

(Max-MMSE problem [50,82]) For some

β \in [0, 1]

\begin{matrix} M_{n} (snr, {snr}_{0}, β) : = \sup_{X} mmse (X, snr), \end{matrix}

(62a)

\begin{matrix} {s . t . ∥ X ∥}_{2}^{2} \leq 1, power constraint, \end{matrix}

(62b)

\begin{matrix} and mmse (X, {snr}_{0}) \leq \frac{β}{1 + β {snr}_{0}}, MMSE constraint . \end{matrix}

(62c)

The authors of [63,70] proved that

\begin{matrix} M_{\infty} (snr, {snr}_{0}, β) = \lim_{n \to \infty} M_{n} (snr, {snr}_{0}, β) = \{\begin{matrix} \frac{1}{1 + snr}, & snr < {snr}_{0}, \\ \frac{β}{1 + β snr}, & snr \geq {snr}_{0}, \end{matrix}, \end{matrix}

(63)

achieved by superposition coding with Gaussian codebooks. Clearly there is a discontinuity in (63) at

snr = {snr}_{0}

for

β < 1

. This fact is a well known property of the MMSE, and it is referred to as a phase transition [63].

The LMMSE bound provides the converse solution for

M_{\infty} (snr, {snr}_{0}, β)

in (63) in the regime

snr \leq {snr}_{0}

. An interesting observation is that in this regime the knowledge of the MMSE at

{snr}_{0}

is not used. The SCPP bound provides the converse in the regime

snr \leq {snr}_{0}

and, unlike the LMMSE bound, does use the knowledge of the value of MMSE at

{snr}_{0}

.

The solution of the Max-MMSE problem provides an upper bound on the Max-I problem (for every n including in the limit as

n \to \infty

), through the I-MMSE relationship

\begin{matrix} C_{n} (snr, {snr}_{0}, β) = \frac{1}{2} \int_{0}^{snr} mmse (X, t) d t & \leq \frac{1}{2} \int_{0}^{snr} M_{n} (t, {snr}_{0}, β) d t . \end{matrix}

(64)

The reason is that in the Max-MMSE problem one maximizes the integrand in the I-MMSE relationship for every

γ

, and the maximizing input may have a different distribution for each

γ

. The surprising result is that in the limit as

n \to \infty

we have equality, meaning that in the limit there exists an input that attains the maximum Max-MMSE solution for every

γ

. In other words, the integration of

M_{\infty} (γ, {snr}_{0}, β)

over

γ \in [0, snr]

results in

C_{\infty} (snr, {snr}_{0}, β)

. In view of the relationship in (64) we focus on the

M_{n} (snr, {snr}_{0}, β)

problem.

Note that SCPP gives a solution to the Max-MMSE problem in (62) for

snr \geq {snr}_{0}

and any

n \geq 1

as follows:

\begin{matrix} M_{n} (snr, {snr}_{0}, β) = \frac{β}{1 + β snr}, for snr \geq {snr}_{0}, \end{matrix}

(65)

achieved by

X \sim N (0, β I)

.

However, for

snr \leq {snr}_{0}

, where the LMMSE bound (11) is used without taking the constraint into account, it is no longer tight for every

n \geq 1

. Therefore, the emphasis in the treatment of the Max-MMSE problem is on the regime

snr \leq {snr}_{0}

. In other words, the phase transition phenomenon can only be observed as

n \to \infty

, and for any finite n the LMMSE bound on the MMSE at

snr \leq {snr}_{0}

must be sharpened, as the MMSE constraint at

{snr}_{0}

must restrict the input in such a way that would effect the MMSE performance at

snr \leq {snr}_{0}

. We refer to the upper bounds in the regime

snr \leq {snr}_{0}

as complementary SCPP bounds. Also, for any finite n,

mmse (X, snr)

is a continuous function of

snr

[30]. Putting these two facts together we have that, for any finite n, the objective function

M_{n} (snr, {snr}_{0}, β)

must be continuous in

snr

and converge to a function with a jump-discontinuity at

{snr}_{0}

as

n \to \infty

. Therefore,

M_{n} (snr, {snr}_{0}, β)

must be of the following form:

\begin{matrix} M_{n} (snr, {snr}_{0}, β) = \{\begin{matrix} \frac{1}{1 + snr}, & snr \leq {snr}_{L}, \\ T_{n} (snr, {snr}_{0}, β), & {snr}_{L} \leq snr \leq {snr}_{0}, \\ \frac{β}{1 + β snr}, & {snr}_{0} \leq snr, \end{matrix} \end{matrix}

(66)

for some

{snr}_{L}

. The goal is to characterize

{snr}_{L}

in (66) and the continuous function

T_{n} (snr, {snr}_{0}, β)

such that

\begin{matrix} T_{n} ({snr}_{L}, {snr}_{0}, β) & = \frac{1}{1 + {snr}_{L}}, \end{matrix}

(67a)

\begin{matrix} T_{n} ({snr}_{0}, {snr}_{0}, β) & = \frac{β}{1 + β {snr}_{0}}, \end{matrix}

(67b)

and give scaling bounds on the width of the phase transition region defined as

\begin{matrix} W_{n} : = {snr}_{0} - {snr}_{L} . \end{matrix}

(68)

In other words, the objective is to understand the behavior of the MMSE phase transitions for arbitrary finite n by obtaining complementary upper bounds on the SCPP. We first focus on upper bounds on

M_{n} (snr, {snr}_{0}, β)

.

Theorem 9.

(D-Bound [50]) For any

X

and

0 < snr \leq {snr}_{0}

, we have

\begin{matrix} mmse (X, snr) \leq mmse (X, {snr}_{0}) + k_{n} (\frac{1}{snr} - \frac{1}{{snr}_{0}}), \end{matrix}

(69a)

\begin{matrix} k_{n} \leq n + 2, \end{matrix}

(69b)

The proof of Theorem 9 can be found in [50] and relies on developing bounds on the derivative of the MMSE with respect to the SNR.

Theorem 10.

(M-Bound [25]) For

0 < snr \leq {snr}_{0}

,

\begin{matrix} mmse (X, snr) \leq \min_{r > \frac{2}{γ}} κ (r, γ, n) {(mmse (X, {snr}_{0}))}^{\frac{γ r - 2}{r - 2}}, \end{matrix}

(70a)

where

γ : = \frac{snr}{2 {snr}_{0} - snr} \in (0, 1],

and

\begin{matrix} κ (r, γ, n) : = \frac{\sqrt{2}}{n^{1 - γ}} {(\frac{1 + γ}{γ})}^{\frac{n (1 - γ) - 1}{2}} M_{r}^{\frac{2 (1 - γ)}{r - 2}}, \end{matrix}

(70b)

\begin{matrix} M_{r} : = {∥X - E [X | Y_{{snr}_{0}}]∥}_{r}^{r} \leq 2^{r} \min (\frac{{∥ Z ∥}_{r}^{r}}{{snr}_{0}^{\frac{r}{2}}}, {∥ X ∥}_{r}^{r}) . \end{matrix}

(70c)

The bounds in (69a) and in (70a) are shown in Figure 8. The key observation is that the bounds in (69a) and in (70a) are sharper versions of the LMMSE bound that take into account the value of the MMSE at

{snr}_{0}

. It is interesting to observe how the bounds converge with n going to ∞.

The bound in (70a) is asymptotically tighter than the one in (69a). It can be shown that the phase transition region shrinks as

O (\frac{1}{\sqrt{n}})

for (70a), and as

O (\frac{1}{n})

for the bound in (69a). It is not possible in general to assert that (70a) is tighter than (69a). In fact, for small values of n, the bound in (69a) can offer advantages, as seen for the case

n = 1

shown in Figure 8b. Another advantage of the bound in (69a) is its analytical simplicity.

With the bounds in (69a) and in (70a) at our disposal we can repeat the converse proof outline in (61).

5.5. Mixed Inputs

Another question that arises, in the context of finite n, is how to mimic the achievability of superposition codes? Specifically, how to select an input that will maximize

M_{n} (snr, {snr}_{0}, β)

when

snr \leq {snr}_{0}

.

We propose to use the following input, which in [45] was termed a mixed input:

\begin{matrix} X_{mix} & : = \sqrt{1 - δ} X_{D} + \sqrt{δ} X_{G}, δ \in [0, 1] : \end{matrix}

(71)

\begin{matrix} X_{G} & \sim N (0, I), ∥ X_{D} ∥_{2}^{2} \leq 1, \frac{1}{n} H (X_{D}) < \infty, \end{matrix}

(72)

where

X_{G}

and

X_{D}

are independent. The parameter

δ

and the distribution of

X_{D}

are to be optimized over.

The behavior of the input in (71) exibits many properties of superposition codes and we will see that the discrete part

X_{D}

will behave as the common message and the Gaussian part

X_{G}

will behave as the private message.

The input

X_{mix}

exhibits a decomposition property via which the MMSE and the mutual information can be written as the sum of the MMSE and the mutual information of the

X_{D}

and

X_{G}

components, albeit at different SNR values.

Proposition 7

([50]). For

X_{mix}

defined in (71) we have that

\begin{matrix} I_{n} (X_{mix}, snr) & = I_{n} (X_{D}, \frac{snr (1 - δ)}{1 + δ snr}) + I_{n} (X_{G}, snr δ), \end{matrix}

(73a)

\begin{matrix} mmse (X_{mix}, snr) & = \frac{1 - δ}{{(1 + snr δ)}^{2}} mmse (X_{D}, \frac{snr (1 - δ)}{1 + δ snr}) + δ mmse (X_{G}, snr δ) . \end{matrix}

(73b)

Observe that Proposition 7 implies that, in order for mixed inputs (with

δ < 1

) to comply with the MMSE constraint in (48c) and (62c), the MMSE of

X_{D}

must satisfy

\begin{matrix} mmse (X_{D}, \frac{{snr}_{0} (1 - δ)}{1 + δ {snr}_{0}}) \leq \frac{(β - δ) (1 + δ {snr}_{0})}{(1 - δ) (1 + β {snr}_{0})} . \end{matrix}

(74)

Proposition 7 is particularly useful because it allows us to design the Gaussian and discrete components of the mixed input independently.

Next, we evaluate the performance of

X_{mix}

in

M_{n} (snr, {snr}_{0}, β)

for the important special case of

n = 1

. Figure 9 shows upper and lower bounds on

M_{1} (snr, {snr}_{0}, β)

where we show the following:

The $M_{\infty} (snr, {snr}_{0}, β)$ upper bound in (63) (solid red line);
The upper D-bound (69a) (dashed cyan line) and upper M-bound (dashed red line) (70a);
The Gaussian-only input (solid green line), with $X \sim N (0, β)$ , where the power has been reduced to meet the MMSE constraint;
The mixed input (blue dashed line), with the input in (71). We used Proposition 7 where we optimized over $X_{D}$ for $δ = β \frac{{snr}_{0}}{1 + {snr}_{0}}$ . The choice of $δ$ is motivated by the scaling property of the MMSE, that is, $δ mmse (X_{G}, snr δ) = mmse (\sqrt{δ} X_{G}, snr)$ , and the constraint on the discrete component in (74). That is, we chose $δ$ such that the power of $X_{G}$ is approximately $β$ while the MMSE constraint on $X_{D}$ in (74) is not equal to zero. The input $X_{D}$ used in Figure 9 was found by a local search algorithm on the space of distributions with $N = 3$ , and resulted in $X_{D} = [- 1.8412, - 1.7386, 0.5594]$ with $P_{X} = [0.1111, 0.1274, 0.7615]$ , which we do not claim to be optimal;
The discrete-only input (Discrete 1 brown dashed-dotted line), with
$X_{D} = [- 1.8412, - 1.7386, 0.5594]$ with $P_{X} = [0.1111, 0.1274, 0.7615]$ , that is, the same discrete part of the above mentioned mixed input. This is done for completeness, and to compare the performance of the MMSE of the discrete component of the mixed input with and without the Gaussian component; and
The discrete-only input (Discrete 2 dotted magenta line), with
$X_{D} = [- 1.4689, - 1.1634, 0.7838]$ with $P_{X} = [0.1282, 0.2542, 0.6176]$ , which was found by using a local search algorithm on the space of discrete-only distributions with $N = 3$ points.

The choice of

N = 3

is motivated by the fact that it requires roughly

N = ⌊ \sqrt{1 + {snr}_{0}} ⌋

points for the PAM input to approximately achieve capacity of the point-to-point channel with SNR value

{snr}_{0}

.

On the one hand, Figure 9 shows that, for

snr \geq {snr}_{0}

, a Gaussian-only input with power reduced to

β

maximizes

M_{1} (snr, {snr}_{0}, β)

in agreement with the SCPP bound (green line). On the other hand, for

snr \leq {snr}_{0}

, we see that discrete-only inputs (brown dashed-dotted line and magenta dotted line) achieve higher MMSE than a Gaussian-only input with reduced power. Interestingly, unlike Gaussian-only inputs, discrete-only inputs do not have to reduce power in order to meet the MMSE constraint. The reason discrete-only inputs can use full power, as per the power constraint only, is because their MMSE decreases fast enough (exponentially in SNR) to comply with the MMSE constraint. However, for

snr \geq {snr}_{0}

, the behavior of the MMSE of discrete-only inputs, as opposed to mixed inputs, prevents it from being optimal; this is due to their exponential tail behavior. The mixed input (blue dashed line) gets the best of both (Gaussian-only and discrete-only) worlds: it has the behavior of Gaussian-only inputs for

snr \geq {snr}_{0}

(without any reduction in power) and the behavior of discrete-only inputs for

snr \leq {snr}_{0}

. This behavior of mixed inputs turns out to be important for the Max-I problem, where we need to choose an input that has the largest area under the MMSE curve.

Finally, Figure 9 shows the achievable MMSE with another discrete-only input (Discrete 2, dotted magenta line) that achieves higher MMSE than the mixed input for

snr \leq {snr}_{0}

but lower than the mixed input for

snr \geq {snr}_{0}

. This is again due to the tail behavior of the MMSE of discrete inputs. The reason this second discrete input is not used as a component of the mixed inputs is because this choice would violate the MMSE constraint on

X_{D}

in (74). Note that the difference between Discrete 1 and 2 is that, Discrete 1 was found as an optimal discrete component of a mixed input (i.e.,

δ = β \frac{{snr}_{0}}{1 + {snr}_{0}}

), while Discrete 2 was found as an optimal discrete input without a Gaussian component (i.e.,

δ = 0

).

We conclude this section by demonstrating that an inner bound on

C_{1} (snr, {snr}_{0}, β)

with the mixed input in (71) is to within an additive gap of the outer bound.

Theorem 11

([50]). A lower bound on

C_{1} (snr, {snr}_{0}, β)

with the mixed input in (71), with

X_{D} \sim PAM

and with input parameters as specified in Table 1, is to within

O (\log \log (\frac{1}{mmse (X, {snr}_{0})}))

.

We refer the reader to [50] for the details of the proof and extension of Theorem 11 to arbitrary n.

Please note that the gap result in Proposition 11 is constant in

snr

(i.e., independent of

snr

) but not in

{snr}_{0}

. Figure 10 compares the inner bounds on

C_{1} (snr, {snr}_{0}, β)

, normalized by the point-to-point capacity

\frac{1}{2} \log (1 + snr)

, with mixed inputs (dashed magenta line) in Proposition 11 to:

The $C_{\infty} (snr, {snr}_{0}, β)$ upper bound in (54) (solid red line);
The upper bound from integration of the bound in (69a) (dashed blue line);
The upper bound from integration of the bound in (70a) (dashed red line); and
The inner bound with $X \sim N (0, β)$ , where the reduction in power is necessary to satisfy the MMSE constraint $mmse (X, {snr}_{0}) \leq \frac{β}{1 + β {snr}_{0}}$ (dotted green line).

Figure 10 shows that Gaussian inputs are sub-optimal and that mixed inputs achieve large degrees of freedom compared to Gaussian inputs. Interestingly, in the regime

snr \leq {snr}_{0}

, it is approximately optimal to set

δ = 0

, that is, only the discrete part of the mixed input is used. This in particular supports the conjecture in [70] that discrete inputs may be optimal for

n = 1

and

snr \leq {snr}_{0}

. For the case

snr \geq {snr}_{0}

our results partially refute the conjecture by excluding the possibility of discrete inputs with finitely many points from being optimal.

The key intuition developed in this section about the mixed input and its close resemblance to superposition coding will be used in Section 8 to show approximate optimality of TIN for the two-user G-IC.

6. Applications to the Broadcast Channel

The broadcast channel (BC), introduced by Cover in [83], is depicted in Figure 11a. In the BC the goal of the transmitter is to reliably transmit the message

W_{1}

to receiver 1 and the message

W_{2}

to receiver 2. The transmitter encodes the pair of messages

(W_{1}, W_{2})

into a transmitted codeword

X

of length n. Receiver 1 receives the sequence

Y_{1}

of length n and receiver 2 receives the sequence

Y_{2}

of length n. They both try to decode their respective messages from their received sequence. An achievable rate pair is defined as follows:

Definition 9.

A rate pair

(R_{1}, R_{2})

is said to be achievable for each n, for a message

W_{1}

of cardinality

2^{n R_{1}}

and a message

W_{2}

of cardinality

2^{n R_{2}}

, if there exists an encoding function

\begin{matrix} f_{n} (W_{1}, W_{2}) & = X, \end{matrix}

and decoding functions

\begin{matrix} {\hat{W}}_{1} & = g_{1, n} (Y_{1}), \\ {\hat{W}}_{2} & = g_{2, n} (Y_{2}), \end{matrix}

such that

\begin{matrix} \lim_{n \to \infty} P [(W_{1}, W_{2}) \neq ({\hat{W}}_{1}, {\hat{W}}_{2})] = 0, \end{matrix}

assuming that

W_{1}

and

W_{2}

are uniformly distributed over the respective message sets.

The capacity is defined as the closure over all achievable rate pairs. Note that one can easily add to the above definition a common message.

The capacity of a general broadcast channel is still an open problem. However, the capacity is known for some important special cases [42] such as the degraded broadcast channel which is of interest in this work.

As told by Cover in [84] 1973–1974 was a year of “intense activity” where Bergmans, Gallager and others tried to provide a converse proof showing that the natural achievable region (shown in 1973 by Bergmans) is indeed the capacity region. Correspondences were exchanged between Gallager, Bergmans and Wyner until finally one day both Gallager and Bergmans sent a converse proof to Wyner. Gallager’s proof tackled the degraded (i.e.,

X \leftrightarrow Y_{1} \leftrightarrow Y_{2}

) discrete memoryless BC yielding the following [85]:

\begin{matrix} C_{B C} = ⋃_{P_{U X}} \{\begin{matrix} R_{1} \leq I (X; Y_{1} | U) \\ R_{2} \leq I (U; Y_{2}) \end{matrix}, \end{matrix}

(75)

where U is an auxiliary random variable with

U \leftrightarrow X \leftrightarrow (Y_{1}, Y_{2})

. It did not consider a constraint on the input.

Bergman’s proof directly examined the scalar Gaussian channel under a power constraint

E [X^{2}] \leq 1

and input-output relationship given by

\begin{matrix} Y_{1} & = \sqrt{{snr}_{1}} X + Z_{1}, \end{matrix}

(76)

\begin{matrix} Y_{2} & = \sqrt{{snr}_{2}} X + Z_{2}, \end{matrix}

(77)

where

{snr}_{1} \geq {snr}_{2}

(i.e., the degraded case) and applied the EPI (its first use since Shannon’s paper in 1948) [86]:

\begin{matrix} C_{B C} = ⋃_{α \in [0, 1]} \{\begin{matrix} R_{1} \leq \frac{1}{2} \log (1 + α {snr}_{1}) \\ R_{2} \leq \frac{1}{2} \log (\frac{1 + {snr}_{2}}{1 + α {snr}_{2}}) \end{matrix} . \end{matrix}

(78)

6.1. Converse for the Gaussian Broadcast Channel

In [30] Guo et al. have shown that a converse proof of the scalar (degraded) Gaussian channel can also be derived using the SCPP bound instead of the EPI, when applied on the extension of Gallager’s single-letter expression which takes into account also a power constraint.

The power constraint

E [X^{2}] \leq 1

, implies that there exists some

α \in [0, 1]

such that

\begin{matrix} I (X; Y_{2} | U) = \frac{1}{2} \log (1 + α {snr}_{2}) = \frac{1}{2} \int_{0}^{{snr}_{2}} \frac{α}{1 + α t} d t . \end{matrix}

(79)

By the chain rule of mutual information

\begin{matrix} I (U; Y_{2}) & = I (U, X; Y_{2}) - I (X; Y_{2} | U) \\ = I (X; Y_{2}) - I (X; Y_{2} | U), \end{matrix}

(80)

where in the last step we have used the Markov chain relationship

U \leftrightarrow X \leftrightarrow (Y_{1}, Y_{2})

. Using (79) and (80) the bound on

R_{2}

is given by

\begin{matrix} R_{2} & \leq I (U; Y_{2}) \\ = \frac{1}{2} \int_{0}^{{snr}_{2}} mmse (X, t) d t + \int_{0}^{{snr}_{2}} \frac{α}{1 + α t} d t \end{matrix}

\begin{matrix} \leq \frac{1}{2} \int_{0}^{{snr}_{2}} \frac{1}{1 + t} d t + \int_{0}^{{snr}_{2}} \frac{α}{1 + α t} d t \end{matrix}

(81)

\begin{matrix} = \frac{1}{2} \log (\frac{1 + {snr}_{2}}{1 + α {snr}_{2}}), \end{matrix}

(82)

where in (81) we have used the LMMSE bound. The inequality in (82) establishes the desired bound on

R_{2}

. To bound the

R_{1}

term observe that by using I-MMSE and (79)

\begin{matrix} I (X; Y_{2} | U) = \frac{1}{2} \int_{0}^{{snr}_{2}} mmse (X, t | U) d t = \frac{1}{2} \int_{0}^{{snr}_{2}} \frac{α}{1 + α t} d t, \end{matrix}

(83)

the expression in (83) implies that there exists some

0 \leq {snr}_{0} \leq {snr}_{2}

such that

\begin{matrix} mmse (X, {snr}_{0} | U) = \frac{α}{1 + α {snr}_{0}} . \end{matrix}

(84)

The equality in (84) together with the SCPP bound implies the following inequality:

\begin{matrix} mmse (X, t | U) \leq \frac{α}{1 + α t}, \end{matrix}

(85)

for all

t \geq {snr}_{2} \geq {snr}_{0}

. Therefore,

\begin{matrix} R_{1} & \leq I (X; Y_{1} | U) \\ = \frac{1}{2} \int_{0}^{{snr}_{2}} mmse (X, t | U) d t + \frac{1}{2} \int_{{snr}_{2}}^{{snr}_{1}} mmse (X, t | U) d t \end{matrix}

(86)

\begin{matrix} = \frac{1}{2} \log (1 + α {snr}_{2}) + \frac{1}{2} \int_{{snr}_{2}}^{{snr}_{1}} mmse (X, t | U) d t \end{matrix}

(87)

\begin{matrix} \leq \frac{1}{2} \log (1 + α {snr}_{2}) + \frac{1}{2} \int_{{snr}_{2}}^{{snr}_{1}} \frac{α}{1 + α t} d t \\ = \frac{1}{2} \log (\frac{1 + {snr}_{2}}{1 + α {snr}_{2}}), \end{matrix}

where the expression in (86) follows from (79) and the bound in (87) follows by using the bound in (85). This concludes the proof.

6.2. SNR Evolution of Optimal BC Codes

Similarly to the analysis presented in Section 4.2 the I-MMSE relationship can be used also to obtain practical insights and key properties of optimal code sequences for the scalar Gaussian BC. These were shown in [28,87].

The first result we present explains the implications of reliable decoding in terms of the MMSE behavior.

Theorem 12

([28]). Consider a code sequence, transmitting a message pair

(W_{1}, W_{2})

, at rates

(R_{1}, R_{2})

(not necessarily on the boundary of the capacity region), over the Gaussian BC.

W_{2}

can be reliably decoded from

Y_{2}

if and only if

\begin{matrix} {mmse}_{\infty} (X; γ | W_{2}) = {mmse}_{\infty} (X; γ), \forall γ \geq {snr}_{2} . \end{matrix}

(88)

The above theorem formally states a very obvious observation which is that once

W_{2}

can be decoded, it provides no improvement to the estimation of the transmitted codeword, beyond the estimation from the output. This insight is strengthened as it is also a sufficient condition for reliable decoding of the message

W_{2}

.

The main observation is an extension of the result given in [63], where it was shown that a typical code from the hierarchical code ensemble (which achieves capacity) designed for a given Gaussian BC has a specific SNR-evolution of the MMSE function. This result was extended and shown to hold for any code sequence on the boundary of the capacity region.

Theorem 13

([28]). An achievable code sequence for the Gaussian BC has rates on the boundary of the capacity region, meaning

\begin{matrix} (R_{1}, R_{2}) = (\frac{1}{2} \log (1 + α {snr}_{1}), \frac{1}{2} \log (\frac{1 + {snr}_{2}}{1 + α {snr}_{2}})), \end{matrix}

(89)

for some

α \in [0, 1]

, if and only if it has a deterministic mapping from

(W_{1}, W_{2})

to the transmitted codeword and

\begin{matrix} {mmse}_{\infty} (X; γ) & = \{\begin{matrix} \frac{1}{1 + γ}, & snr \in [0, {snr}_{2}) \\ \frac{α}{1 + α γ}, & γ \in [{snr}_{2}, {snr}_{1}) \\ 0, & γ \geq {snr}_{1} \end{matrix}, \end{matrix}

(90)

\begin{matrix} {mmse}_{\infty} (X; γ | W_{2}) & = \{\begin{matrix} \frac{α}{1 + α γ}, & snr \in [0, {snr}_{1}) \\ 0, & snr \geq {snr}_{1} \end{matrix} . \end{matrix}

(91)

Note that the above SNR-evolution holds for any capacity achieving code sequence for the Gaussian BC. This includes also codes designed for decoding schemes such as “dirty paper coding”, in which case the decoding at

Y_{1}

does not require the reliable decoding of the known “interference” (the part of the codeword that carries the information of

W_{2}

), but simply encodes the desired messages against that “interference”. In that sense the result is surprising since one does not expect such a scheme to have the same SNR-evolution as a superposition coding scheme, where the decoding is in layers: first the “interference” and only after its removal, the reliable decoding of the desired message.

Figure 12 depicts the result of Theorem 13 for capacity achieving code sequences.

7. Multi-Receiver SNR-Evolution

In this section we extend the results regarding the SNR-evolution of the Gaussian wiretap channel and the SNR-evolution of the Gaussian broadcast channel, given in Section 4.2 and Section 6.2, respectively, to the multi-receiver setting. Moreover, we enhance the graphical interpretation of the SNR-evolution to relate to the basic relevant quantities of rate and equivocation.

More specifically, we now consider a multi-receiver additive Gaussian noise setting in which

\begin{matrix} Y_{i} = \sqrt{{snr}_{i}} X + Z_{i}, \end{matrix}

(92)

where we assume that

{snr}_{1} \leq {snr}_{2} \leq \dots \leq {snr}_{K}

for some

K \geq 2

. Since both rate and equivocation are measured according to the conditional densities at the receivers we may further assume that

Z = Z_{i}

for all i. Moreover,

X

is the transmitted message encoded at the transmitter, assuming a set of some L messages

(W_{1}, W_{2}, \dots, W_{L})

. Each receiver may have a different set of requirements regarding these messages. Such requirements can include:

Reliably decoding some subset of these messages;
Begin ignorant to some extent regarding some subset of these messages, meaning having at least some level of equivocation regarding the messages within this subset;
A receiver may be an “unintended” receiver with respect to some subset of messages, in which case we might wish also to limit the “disturbance” these message have at this specific receiver. We may do so by limiting the MMSE of these messages; and
Some combination of the above requirements.

There might be, of course, additional requirements, but so far the application of the I-MMSE approach as done in [34,70,87,88], was able to analyze these types of requirements. We will now give the main results from which one can consider other specific cases as discussed at the end of this section.

We first consider only reliable communication, meaning a set of messages intended for receivers at different SNRs, in other words, a K-user Gaussian BC.

Theorem 14

([88]). Given a set of messages

(W_{1}, W_{2}, \dots, W_{K})

, such that

W_{i}

is reliably decoded at

{snr}_{i}

and

{snr}_{1} \leq {snr}_{2} \leq \dots \leq {snr}_{K}

, we have that

\begin{matrix} R_{i} = \frac{1}{2} \int_{0}^{{snr}_{i}} {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i - 1}) - {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i}) d γ . \end{matrix}

(93)

In the case of

R_{1}

the first MMSE is simply

{mmse}_{\infty} (X; γ)

(meaning

W_{0} = \emptyset

).

Note that due to the basic ordering of the MMSE quantity, meaning that for all

γ \geq 0

\begin{matrix} {mmse}_{\infty} (X; γ) \geq {mmse}_{\infty} (X; γ | W_{1}) \geq {mmse}_{\infty} (X; γ | W_{1}, W_{2}) \geq {mmse}_{\infty} (X; γ | W_{1}, W_{2}, W_{3}) \geq \dots, \end{matrix}

we have that the integrand is always non-negative. Thus, the above result slices the region defined by

{mmse}_{\infty} (X; γ)

into distinctive stripes defined by the conditional MMSE functions. Each such stripe corresponds to twice the respective rate. The order of the stripes from top to bottom is by the message first decoded to the one last decoded (see Figure 13); further, taking into account Theorem 12, which gives a necessary and sufficient condition for reliable communication in terms of MMSE functions, we know that for

snr \geq {snr}_{i}

the MMSE conditioned on any message reliably decoded at

{snr}_{i}

equals

{mmse}_{\infty} (X; γ)

; thus, we may extend the integration in the above result to any

snr \geq {snr}_{i}

(or even integrate to infinity).

We now consider in addition to reliable communication also the equivocation measure.

Theorem 15

([88]). Assume a set of independent messages

(W_{1}, W_{2}, \dots, W_{i})

such that

(W_{1}, W_{2}, \dots, W_{i - 1})

are reliably decoded at

Y ({snr}_{i - 1})

, however

W_{i}

is reliably decoded only at some

{snr}_{i} > {snr}_{i - 1}

. The equivocation of

W_{i}

at

Y ({snr}_{i - 1})

equals

\begin{matrix} H (W_{i} | Y ({snr}_{i - 1})) = \frac{1}{2} \int_{{snr}_{i - 1}}^{{snr}_{i}} {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i - 1}) - {mmse}_{\infty} (X; γ | W_{1}, W_{2}, \dots, W_{i}) d γ, \end{matrix}

(94)

which can also be written as

\begin{matrix} H (W_{i} | Y ({snr}_{i - 1})) = \frac{1}{2} \int_{{snr}_{i - 1}}^{{snr}_{i}} {mmse}_{\infty} (X; γ) - {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i}) d γ . \end{matrix}

(95)

The above result together with Theorem 14 provides a novel graphical interpretation. Theorem 14 divides the area below

{mmse}_{\infty} (X; γ)

into stripes, each corresponding to a rate. Theorem 15 further divides these stripes horizontally. The stripe corresponding to the rate of message

W_{i}

is an area between

{mmse}_{\infty} (X; γ | W_{1}, W_{2}, \dots, W_{i - 1})

and

{mmse}_{\infty} (X; γ | W_{1}, W_{2}, \dots, W_{i})

from

[0, {snr}_{i}]

. For any point

snr > {snr}_{i}

this area is then split into the region between

[0, snr]

which corresponds to the information that can be obtained regarding the message by

Y (snr)

and the region

[snr, {snr}_{i}]

which corresponds to the equivocation (see Figure 14 for an example).

Let us now assume complete secrecy, meaning

\begin{matrix} H (W_{i} | Y ({snr}_{i - 1})) & = H (W_{i}) . \end{matrix}

(96)

Using Theorems 14 and 15 we have that

\begin{matrix} \int_{0}^{{snr}_{i}} {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i - 1}) - {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i}) d γ = \\ \int_{{snr}_{i - 1}}^{{snr}_{i}} {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i - 1}) - {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i}) d γ, \end{matrix}

(97)

assuming we have reliable decoding of messages

(W_{1}, W_{2}, \dots, W_{i - 1})

at

{snr}_{i - 1}

. This reduces to

\begin{matrix} \int_{0}^{{snr}_{i - 1}} {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i - 1}) - {mmse}_{\infty} (X; γ | W_{1}, \dots, W_{i}) d γ = 0, \end{matrix}

(98)

which due to the non-negativity of the integrand results in

\begin{matrix} {mmse}_{\infty} (X; γ | W_{1}, W_{2}, \dots, W_{i - 1}) = {mmse}_{\infty} (X; γ | W_{1}, W_{2}, \dots, W_{i}), \end{matrix}

(99)

for all

γ \in [0, {snr}_{i - 1})

. This is exactly the condition for complete secrecy given in [34]. The important observation here is that to obtain complete secrecy we require that the stripe of the secure message is reduced to the section

[{snr}_{i - 1}, {snr}_{i}]

, where the eavesdropper is at

{snr}_{i - 1}

and the legitimate receiver is at

{snr}_{i}

. This reduction in the stripe of the secure message can be interpreted as having been used for the transmission of the camouflaging information required for complying with the secrecy constraint.

The above approach can be further extended and can provide a graphical interpretation for more elaborate settings with additional requirements at the receiver. An immediate such example would be adding “disturbance” constraints in terms of MMSEs. Another extension which has been also considered in [88] is the problem of “secrecy outside the bounded range” [89]. For this setting complete secrecy rate can be enhanced by using the inherent randomness in the message which results from the fact that it contains an additional “unintended” message which is not necessarily reliably decoded. For more details on this problem and its graphical interpretation the reader is referred to [88,89].

8. Interference Channels

A two user interference channel (IC), introduced by Ahlswede in [77], depicted in Figure 15, is a system consisting of two transmitters and two receivers. The goal of a transmitter

i \in [1 : 2]

is to reliably transmit the message

W_{i}

to receiver i. Transmitter i encodes a message

W_{i}

into a transmitted codeword

X_{i}

of length n. Receiver i receives the sequence

Y_{i}

of length n and tries to decode the message

W_{i}

from the observed sequence

Y_{i}

. An achievable rate pair for the IC is defined as follows:

Definition 10.

A rate pair

(R_{1}, R_{2})

is said to be achievable, if for a message

W_{1}

of cardinality

2^{n R_{1}}

and a message

W_{2}

of cardinality

2^{n R_{2}}

there exists a sequence of encoding functions

\begin{matrix} f_{n, 1} (W_{1}) & = X_{1}, \\ f_{n, 2} (W_{2}) & = X_{2}, \end{matrix}

and decoding functions

\begin{matrix} {\hat{W}}_{1} & = g_{1, n} (Y_{1}), \\ {\hat{W}}_{2} & = g_{2, n} (Y_{2}), \end{matrix}

such that

\begin{matrix} \lim_{n \to \infty} P [(W_{1}, W_{2}) \neq ({\hat{W}}_{1}, {\hat{W}}_{2})] = 0, \end{matrix}

(100)

assuming that

W_{1}

and

W_{2}

are uniformly distributed over their respective message sets.

The capacity region is defined as the closure over all achievable rate pairs. In [77] Ahlswede demonstrated a multi-letter capacity expression given in (49). Unfortunately, the capacity expression in (49) is considered “uncomputable” in the sense that we do not know how to explicitly characterize the input distributions that attain its convex closure. Moreover, it is not clear whether there exists an equivalent single-letter form for (49) in general.

Because of “uncomputability” the capacity expression in (49) has received little attention, except for the following: in [79] the limiting expression was used to show that limiting to jointly Gaussian distributions is suboptimal; in [72] the limiting expression was used to derive the sum-capacity in the very weak interference regime; and in [90] it was shown that in the high-power regime the limiting expression normalized by the point-to-point capacity (i.e., the degrees of freedom (DoF)) can be single letterized.

Instead, the field has focussed on finding alternative ways to characterize single-letter inner and outer bounds. The best known inner bound is the HK achievable scheme [74], which is:

capacity achieving in the strong interference regime [75,91,92];
capacity achieving for a class of injective deterministic channels [93,94];
approximately capacity achieving for a class of injective semi-deterministic channels [95]; and
approximately capacity achieving (within $1 / 2$ bit) for a class of Gaussian noise channels (which is a special case of the injective semi-deterministic channel) [76].

It is important to point out that in [96] the HK scheme was shown to be strictly sub-optimal for a class of DMC’s. Moreover, the result in [96] suggests that multi-letter achievable strategies might be needed to achieve capacity of the IC.

8.1. Gaussian Interference Channel

In this section we consider the practically relevant scalar G-IC channel, depicted in Figure 15b, with input-output relationship

\begin{matrix} Y_{1} & = h_{11} X_{1} + h_{12} X_{2} + Z_{1}, \end{matrix}

(101a)

\begin{matrix} Y_{2} & = h_{21} X_{1} + h_{22} X_{2} + Z_{2}, \end{matrix}

(101b)

where

Z_{i}

is i.i.d. zero-mean unit-variance Gaussian noise. For the G-IC in (101), the maximization in (49) is further restricted to inputs satisfying the power constraint

E [X_{i}^{2}] \leq 1

,

i \in [1 : 2]

.

For simplicity we will focus primarily on the symmetric G-IC defined by

\begin{matrix} | h_{11} |^{2} = {| h_{22} |}^{2} = snr \geq 0, \end{matrix}

(102a)

\begin{matrix} | h_{12} |^{2} = {| h_{21} |}^{2} = inr \geq 0, \end{matrix}

(102b)

and we will discuss how the results for the symmetric G-IC extend to the general asymmetric setting.

In general, little is known about the optimizing input distribution in (49) for the G-IC and only some special cases have been solved. In [71,72,73] it was shown that i.i.d. Gaussian inputs maximize the sum-capacity in (49) for

\sqrt{\frac{inr}{snr}} (1 + inr) \leq \frac{1}{2}

in the symmetric case. In contrast, the authors of [79] showed that in general multivariate Gaussian inputs do not exhaust regions of the form in (49). The difficulty arises from the competitive nature of the problem [43]: for example, say

X_{2}

is i.i.d. Gaussian; taking

X_{1}

to be Gaussian increases

I (X_{1}; Y_{1})

but simultaneously decreases

I (X_{2}; Y_{2})

, as Gaussians are known to be the “best inputs” for Gaussian point-to-point power-constrained channels, but are also the “worst noise” (or interference, if it is treated as noise) for a Gaussian input.

So, instead of pursuing exact results, the community has recently focussed on giving performance guarantees on approximations of the capacity region [97]. In [76] the authors showed that the HK scheme with Gaussian inputs and without time-sharing is optimal to within 1/2 bit, irrespective of the channel parameters.

8.2. Generalized Degrees of Freedom

The constant gap result of [76] provides an exact characterization of the generalized degrees of freedom (gDoF) region defined as

\begin{matrix} D (α) & : = \{(d_{1}, d_{2}) : d_{i} : = \lim_{snr \to \infty} \frac{R_{i} (snr, inr = {snr}^{α})}{\frac{1}{2} \log (1 + snr)}, i \in [1 : 2], (R_{1}, R_{2}) is achievable\}, \end{matrix}

(103)

and where

D (α)

was shown to be

\begin{matrix} D (α) = \{(d_{1}, d_{2}) : \begin{matrix} d_{1} & \leq 1 \\ d_{2} & \leq 1 \\ d_{1} + d_{2} & \leq {[1 - α]}^{+} + \max (α, 1) \\ d_{1} + d_{2} & \leq \max (1 - α, α) + \max (1 - α, α) \\ 2 d_{1} + d_{2} & \leq {[1 - α]}^{+} + \max (1, α) + \max (1 - α, α) \\ d_{1} + 2 d_{2} & \leq {[1 - α]}^{+} + \max (1, α) + \max (1 - α, α) \end{matrix}\} . \end{matrix}

(104a)

The region in (104) is achieved by the HK scheme without time sharing; for the details see [42,76].

The

α

parameter is the strength of the interference in dB. The gDoF is an important metric that sheds light on the optimal coding strategies in the high SNR regime. The gDoF metric deemphasizes the role of noise in the network and only focuses on the role of signal interactions. Often these strategies can be translated to the medium and low SNR regions. The gDoF is especially useful in analyzing interference alignment strategies [98,99] where proper design of the signaling scheme can ensure very high rates. The notion of gDoF has received considerable attention in information theoretic literature and the interested reader is referred to [100] and reference therein.

For our purposes, we will only look at the sum-gDoF of the interference channel given by

\begin{matrix} d_{Σ} (α) = \max_{(d_{1}, d_{2}) \in D (α)} d_{1} + d_{2} = 2 \min (1, \max (\frac{α}{2}, 1 - \frac{α}{2}), \max (α, 1 - α)) . \end{matrix}

(105)

The sum-gDoF in (105) as a function of the parameter

α

is plotted in Figure 16. The curve in Figure 16 is often called the W-curve.

Depending on the parameters

(snr, inr = {snr}^{α})

we identify the following operational regimes: very strong interference,

α \geq 2

; strong interference,

1 \leq α < 2

; weak interference type I,

2 / 3 \leq α < 1

; weak interference type II,

1 / 2 \leq α < 2 / 3

; very weak interference,

0 \leq α < 1 / 2

.

8.3. Treating Interference as Noise

An inner bound on the capacity region in (49) can be obtained by considering i.i.d. inputs in (49) thus giving

\begin{matrix} R_{in}^{TIN + TS} & = co (⋃_{P_{X_{1} X_{2}} = P_{X_{1}} P_{X_{2}}} \{\begin{matrix} 0 \leq R_{1} \leq I (X_{1}; Y_{1}) \\ 0 \leq R_{2} \leq I (X_{2}; Y_{2}) \end{matrix}\}), \end{matrix}

(106)

where the superscript “TIN+TS” reminds the reader that the region is achieved by treating interference and noise and with time sharing (TS), where TS is enabled by the convex hull operation [42]. By further removing the convex hull operation in (106) we arrive at

\begin{matrix} R_{in}^{TINnoTS} & = ⋃_{P_{X_{1} X_{2}} = P_{X_{1}} P_{X_{2}}} \{\begin{matrix} 0 \leq R_{1} \leq I (X_{1}; Y_{1}) \\ 0 \leq R_{2} \leq I (X_{2}; Y_{2}) \end{matrix}\} . \end{matrix}

(107)

The region in (107) does not allow the users to time-share.

Obviously

\begin{matrix} R_{in}^{TINnoTS} \subseteq R_{in}^{TIN + TS} \subseteq C . \end{matrix}

The question of interest in this section is how

R_{in}^{TINnoTS}

fares compared to

C

. Note that there are many advantages in using TINnoTS in practice. For example, TINnoTS does not require codeword synchronization, as for example for joint decoding or interference cancellation, and does not require much coordination between users, thereby reducing communications overhead. Therefore, an interesting question that arises is: What are the limits of the TIN region?

By evaluating the TIN region with Gaussian inputs we get an achievable sum-gDoF of

\begin{matrix} d_{Σ}^{TIN - G} (α) = 2 \max (0, 1 - α), \end{matrix}

(108)

shown by a red curve in Figure 16. Clearly, using Gaussian inputs in the TIN region is gDoF optimal in the very weak interference regime and is otherwise strictly suboptimal. Because Gaussian inputs are often mutual information maximizers, one might think that the expression in (108) is the best that we can hope for. However, this intuition can be very misleading, and despite the simplicity of TIN, in [45] TINnoTS was shown to achieve capacity

C

within a gap which also implies that TIN is gDoF optimal. The key observation is to use non-Gaussian inputs, specifically the mixed inputs presented in Section 5.5.

Theorem 16

([45]). For the G-IC, as defined in (102), the TINnoTS achievable region in (107) is optimal to within a constant gap, or a gap of order

O (\log \log (\min (snr, inr)))

, and it is therefore gDoF optimal.

Next, we demonstrated the main ideas behind Theorem 16. The key to this analysis is to use mixed inputs, presented in Section 5.5, and given by

\begin{matrix} X_{i} & = \sqrt{1 - δ_{i}} X_{i D} + \sqrt{δ_{i}} X_{i G}, i \in [1 : 2] : \end{matrix}

(109a)

\begin{matrix} X_{i D} \sim PAM (N_{i}, \sqrt{\frac{12}{N_{i}^{2} - 1}}), \end{matrix}

(109b)

\begin{matrix} X_{i G} \sim N (0, 1), \end{matrix}

(109c)

where the random variables

X_{i j}

are independent for

i \in [1 : 2]

and

j \in {D, G}

. Inputs in (109) have four parameters, namely the number of points

N_{i} \in N

and the power split

δ_{i} \in [0, 1]

, for

i \in [1 : 2]

, which must be chosen carefully in order to match a given outer bound.

By evaluating the TIN region in (107) with mixed inputs in (109) we arrive at the following achievable region.

Proposition 8.

(TIN with Mixed Inputs [45]) For the G-IC the TINnoTS region in (107) contains the region

R_{in}

defined as

\begin{matrix} R_{in} : = \\ ⋃ \{\begin{matrix} 0 \leq R_{1} \leq I (S_{1}, S_{1} + Z_{1}) + \frac{1}{2} \log (1 + \frac{| h_{11} |^{2} δ_{1}}{1 + | h_{12} |^{2} δ_{2}}) - \min (\log (N_{2}), \frac{1}{2} \log (1 + \frac{| h_{12} |^{2} (1 - δ_{2})}{1 + | h_{12} |^{2} δ_{2}})) \\ 0 \leq R_{2} \leq I (S_{2}; S_{2} + Z_{2}) + \frac{1}{2} \log (1 + \frac{| h_{22} |^{2} δ_{2}}{1 + | h_{21} |^{2} δ_{1}}) - \min (\log (N_{1}), \frac{1}{2} \log (1 + \frac{| h_{21} |^{2} (1 - δ_{1})}{1 + | h_{21} |^{2} δ_{1}})) \end{matrix}\}, \end{matrix}

(110)

where the union is over all possible parameters

[N_{1}, N_{2}, δ_{1}, δ_{2}] \in N^{2} \times {[0, 1]}^{2}

for the mixed inputs in (109) and where the equivalent discrete constellations seen at the receivers are

\begin{matrix} S_{1} : = \frac{\sqrt{1 - δ_{1}} h_{11} X_{1 D} + \sqrt{1 - δ_{2}} h_{12} X_{2 D}}{\sqrt{1 + | h_{11} |^{2} δ_{1} + {| h_{12} |}^{2} δ_{2}}}, \end{matrix}

(111a)

\begin{matrix} S_{2} : = \frac{\sqrt{1 - δ_{1}} h_{21} X_{1 D} + \sqrt{1 - δ_{2}} h_{22} X_{2 D}}{\sqrt{1 + | h_{21} |^{2} δ_{1} + {| h_{22} |}^{2} δ_{2}}} . \end{matrix}

(111b)

Next, we select the parameters

[N_{1}, N_{2}, δ_{1}, δ_{2}]

to optimize the region in (110). For simplicity, we focus only on the very strong interference regime (

α \geq 2

). The gDoF optimality of TIN in the very strong interference regime is perhaps the most surprising. The capacity in this regime has been shown by Carleial in [91] who demonstrated that capacity can be achieved with a successive cancellation decoding strategy where the interference is decoded before the desired signal. Unlike the Carleial scheme TIN only use a point-to-point decoder for non-Gaussian noise and can be classified as a soft-interference-decoding strategy discussed in Section 5.1.

In the very strong interference (

α \geq 2

) regime the sum-gDoF is given by

\begin{matrix} d_{Σ} (α) = 2 . \end{matrix}

(112)

To show that TIN can achieve the gDoF (112), let the parameters in (110) be given by

N = N_{1} = N_{2} = ⌊ \sqrt{1 + snr} ⌋

and

δ_{1} = δ_{2} = 0

. It is not difficult to see that with this choice of inputs the rate in (110) is given by

\begin{matrix} R_{i} & = I (S_{i}, S_{i} + Z_{i}) - \min (\log (N), \frac{1}{2} \log (1 + inr)) \\ \geq I (S_{i}, S_{i} + Z_{i}) - \log (N) . \end{matrix}

Therefore, the key now is to lower bound

I (S_{i}, S_{i} + Z_{i})

. This is done by using the Ozarow-Wyner bound in (35b).

Lemma 2.

Let

N = N_{1} = N_{2} = ⌊ \sqrt{1 + snr} ⌋

and

δ_{1} = δ_{2} = 0

. Then,

\begin{matrix} I (S_{i}, S_{i} + Z_{i}) & \geq 2 \log (N) - \frac{1}{2} \log (\frac{π e}{3}) . \end{matrix}

(113)

Proof.

Using the Ozarow-Wyner bound in (35b)

\begin{matrix} I (S_{i}, S_{i} + Z_{i}) & \geq H (S_{i}) - \frac{1}{2} \log (\frac{π e}{6}) - \frac{1}{2} \log (1 + \frac{12 mmse (S_{i}, 1)}{d_{\min} {(S_{i})}^{2}}) \\ \overset{a)}{\geq} H (S_{i}) - \frac{1}{2} \log (\frac{π e}{6}) - \frac{1}{2} \log (1 + \frac{12 mmse (S_{i}, 1)}{snr d_{\min} {(X_{i})}^{2}}) \\ \overset{b)}{\geq} H (S_{i}) - \frac{1}{2} \log (\frac{π e}{6}) - \frac{1}{2} \log (1 + \frac{12 (snr + inr)}{(1 + snr + inr) snr d_{\min} {(X_{i})}^{2}}) \\ \overset{c)}{\geq} H (S_{i}) - \frac{1}{2} \log (\frac{π e}{6}) - \frac{1}{2} \log (1 + \frac{(snr + inr)}{(1 + snr + inr)}) \\ \overset{d)}{\geq} H (S_{i}) - \frac{1}{2} \log (\frac{π e}{3}), \end{matrix}

where the (in)-equalities follow from: (a) using the bound

d_{\min} (S_{i}) \geq \min (\sqrt{snr} d_{\min} (X_{1}),

\sqrt{inr} d_{\min} (X_{2}))

; (b) using the LMMSE bound in (11); (c) using the bound

d_{\min} {(X_{i})}^{2} = \frac{12}{N_{i}^{2} - 1} \geq \frac{12}{snr}

; and (d) using the bound

\frac{(snr + inr)}{(1 + snr + inr)} \leq 1

.

The proof is concluded by observing that in the very strong interference regime with the choice

N = N_{1} = N_{2} = ⌊ \sqrt{1 + snr} ⌋

, the entropy of a sum-set is given by

\begin{matrix} H (S_{i}) = H (X_{1}) + H (X_{2}) = 2 \log (N) . \end{matrix}

☐

Therefore, the sum-gDoF of TIN is given by

\begin{matrix} \lim_{snr \to \infty} \frac{R_{1} + R_{2}}{\frac{1}{2} \log (1 + snr)} & \geq \lim_{snr \to \infty} \frac{2 \log (N) - \log (\frac{π e}{3})}{\frac{1}{2} \log (1 + snr)} \\ = \lim_{snr \to \infty} \frac{2 \log (⌊ \sqrt{1 + snr} ⌋) - \log (\frac{π e}{3})}{\frac{1}{2} \log (1 + snr)} = 2 . \end{matrix}

This concludes the proof of the achievability for the very strong interference regime.

Using the same ideas as in the proof for the very strong interference regime one can extend the optimality of TIN to other regimes.

9. Concluding Remarks and Future Directions

This section concludes this work by summarizing some interesting future directions.

One of the intriguing extensions of the I-MMSE relationship is the gradient formula, obtained by Palomar and Verdú [9]:

\begin{matrix} \nabla_{H} I (X; H X + Z) = H E [(X - E [X ∣ H X + Z]) {(X - E [X ∣ H X + Z])}^{T}] . \end{matrix}

(114)

The expression in (114) has been used to study MIMO wiretap channels [101], extensions of Costa’s EPI [102], and the design of optimal precoders for MIMO Gaussian channels [103].

However, much work is needed to attain the same level of maturity for the gradient expression in (114) as for the original I-MMSE results [8]. For example, it is not clear what is the correct extension (or if it exists) of a matrix version of the SCPP in Proposition 4. A matrix version of the SCPP could facilitate a new proof of the converse of the MIMO BC by following the steps of Section 6.1. The reader is referred to [33] where the SCPP type bounds have been extended to several classes of MIMO Gaussian channels.

Estimation theoretic principles have been instrumental in finding a general formula for the DoF for a static scalar K-user Gaussian interference channel [90], based on notions of the information dimension [104] and the MMSE dimension [32]. While the DoF is an important measure of the network performance it would be interesting to see if the approach of [90] could be used to analyze the more robust gDoF measure. Undoubtedly, such an extension will rely on the interplay between estimation and information measures.

"Information bottleneck" type problems [105] are defined as

\begin{matrix} \min_{Y} I (X; Y, Z) \end{matrix}

(115a)

\begin{matrix} s . t . I (X; Y) = C, \end{matrix}

(115b)

where

X \sim N (0, 1)

,

Z = \sqrt{snr} X + N

with

N \sim N (0, 1)

independent of X and Y. A very elegant solution to (115) can be found by using the I-MMSE, the SCPP, and the argument used in the proof of the converse for the Gaussian BC in Section 6.1. It would be interesting to explore whether other variants of the bottleneck problem can be solved via estimation theoretic tools. For example, it would be interesting to consider

\begin{matrix} \max_{X, Z} I (X; Z) \end{matrix}

(116a)

\begin{matrix} s . t . I (Y; Z) \leq C, \end{matrix}

(116b)

where

X \leftrightarrow Y \leftrightarrow Z

and

Y = \sqrt{snr} X + N

where

N \sim N (0, 1)

and independent of X.

The extremal entropy inequality of [106], inspired by the channel enhancement method [107], was instrumental in showing several information theoretic converses in problems such as the MIMO wiretap channel [108], two-user Gaussian interference channel [71,72,73], and cognitive interference channel [109] to name a few. In view of the successful applications of the I-MMSE relationship to prove EPI type inequalities (e.g., [35,36,38,102]), it would be interesting to see if the extremal inequality presented in [106] can be shown via estimation theoretic arguments. Existence of such a method can reveal a way of deriving a large class of extremal inequalities potentially useful for information theoretic converses.

The extension of the I-MMSE results to cases that allow

snr

dependency of the input signal have been derived and shown to be useful in [10]. An interesting future direction is to consider the MMPE while allowing

snr

dependency of the input signal; such a generalization has the potential of being useful when studying feedback systems as did the generalization of the MMSE in [10].

Another interesting direction is to study sum-rates of arbitrary networks with the use of the Ozarow-Wyner bound in Theorem 4. Note that, the Ozarow-Wyner bound holds for an arbitrary transition probability and the rate of an arbitrary network with n independent inputs and outputs can be lower bounded as

\begin{matrix} \sum_{i} R_{i} = I (X_{1}, . . ., X_{n}; Y_{1}, . . ., Y_{n}) \geq \sum_{i} H (X_{i}) - gap, \end{matrix}

(117)

where the

gap

term is explicitly given in Theorem 4 and is a function of the network transition probability.

Acknowledgments

The work of Alex Dytso and H. Vincent Poor was partially supported by the NSF under Grants CNS-1456793 and CCF-1420575. The work of Ronit Bustin and Shlomo Shamai was supported by the European Union’s Horizon 2020 Research And Innovation Programme, grant agreement No. 694630. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the funding agencies.

Author Contributions

All authors contributed equally to this work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Proposition 3

Wu et al. [110] have shown precisely and rigorously, using basic inequalities, that the I-MMSE relationship holds for any input of finite variance. The approach in [110] followed by examining the truncated input. Although this approach extends directly for any finite n it is not trivial to extend it to the limit as

n \to \infty

. Note that the truncation argument is used only for the lower bound, whereas the upper bound is obtained directly and can be extended to the limit as shown in the sequel. Thus, our approach is to show this extension indirectly, relying on the existence of the I-MMSE relationship for any n.

Proof.

We begin with an upper bound on the quantity

\begin{matrix} \frac{1}{n} I (X_{n, δ}; \sqrt{δ} X_{n, δ} + N_{n} | V_{n, δ} = v_{n}), \end{matrix}

(A1)

where

{X_{n, δ}, V_{n, δ}}_{δ > 0}

is a collection of jointly distributed random vectors and

N_{n} \sim N (0_{n}, I_{n})

is independent of the pair

{X_{n, δ}, V_{n, δ}}_{δ > 0}

.

For fixed

δ \in (0, 1)

, we have

\begin{matrix} \frac{1}{n} I (X_{n, δ}; \sqrt{δ} X_{n, δ} + N_{n} | V_{n, δ} = v_{n}) & \leq \frac{1}{n} \frac{1}{2} \log |I_{n} + δ K_{δ, v_{n}}| \\ = \frac{1}{n} \frac{1}{2} \log \prod_{i = 1}^{n} (1 + δ λ_{i} (K_{δ, v_{n}})) \\ = \frac{1}{2} \frac{1}{n} \sum_{i = 1}^{n} \log (1 + δ λ_{i} (K_{δ, v_{n}})) \\ \leq \frac{1}{2} \log (1 + δ \frac{1}{n} \sum_{i = 1}^{n} λ_{i} (K_{δ, v_{n}})) \\ \leq \frac{δ}{2} \frac{1}{n} \sum_{i = 1}^{n} λ_{i} (K_{δ, v_{n}}) \\ = \frac{δ}{2} \frac{1}{n} Tr (K_{δ, v_{n}}), \end{matrix}

(A2)

where in the first inequality we have used the fact that the Gaussian distribution maximizes the entropy and where we denote the conditional covariance matrix of

X_{n, δ}

given

v_{n}

as follows:

\begin{matrix} K_{δ, v_{n}} = E [(X_{n, δ} - E [X_{n, δ} | V_{n, δ} = v_{n}]) {(X_{n, δ} - E [X_{n, δ} | V_{n, δ} = v_{n}])}^{T}] . \end{matrix}

(A3)

The second inequality uses Jensen’s inequality and the last inequality uses

\log (1 + x) \leq x

. Thus, we have that

\begin{matrix} \frac{1}{n} I (X_{n, δ}; \sqrt{δ} X_{n, δ} + N_{n} | V_{n, δ} = v_{n}) \leq \frac{δ}{2} \frac{1}{n} Tr (K_{δ, v_{n}}) . \end{matrix}

(A4)

Note that we may take the expectation with respect to

V_{n}

of both sides of the inequality at any point, either by applying Jensen’s inequality, or simply when the right-hand-side is linear in

K_{δ, v_{n}}

. We get that

\begin{matrix} \frac{1}{n} I (X_{n, δ}; \sqrt{δ} X_{n, δ} + N_{n} | V_{n, δ}) \leq \frac{δ}{2} \frac{1}{n} Tr (K_{δ}), \end{matrix}

(A5)

which holds for all n. Our assumption is that the limit of the normalized conditional mutual information quantity exists; however, we have no such assumption over the normalized MMSE quantity on the right-hand-side of the above. Thus, we take the lim inf of both sides of the above inequality, to obtain

\begin{matrix} I_{\infty} (X_{δ}; \sqrt{δ} X_{δ} + N | V_{δ}) \leq \frac{δ}{2} \underset{n \to \infty}{lim inf} \frac{1}{n} Tr (K_{δ}) . \end{matrix}

(A6)

A similar assumption to that in [110] is that

\begin{matrix} \lim_{δ \to 0} \underset{n \to \infty}{lim inf} \frac{1}{n} Tr (K_{δ}) = σ_{\inf}^{2} < \infty, \end{matrix}

(A7)

for any pair

{X_{n, δ}, V_{n, δ}}_{δ > 0}

. However, in our setting

X_{n, δ}

is independent of

δ

and

V_{n, δ} = \sqrt{snr} X_{n} + N_{n}

is also independent of

δ

. Thus, the above convergence requirement reduces simply to

\begin{matrix} \underset{n \to \infty}{lim inf} \frac{1}{n} Tr (K) = σ_{\inf}^{2} (snr) < \infty, \end{matrix}

(A8)

where we emphasize the dependence on

V_{n}

, meaning the dependence on

snr

.

Now, instead of considering the lower bound on the normalized conditional mutual information as done in [110] going through the truncation argument, we examine the I-MMSE relationship directly:

\begin{matrix} \frac{d}{d snr} \frac{1}{n} I (X_{n}; \sqrt{snr} X_{n} + N_{n}) = \frac{1}{2} \frac{1}{n} Tr (K_{snr}), \end{matrix}

(A9)

where

\begin{matrix} K_{snr} = E [(X_{n} - E [X_{n} | \sqrt{snr} X_{n} + N_{n}]) {(X_{n} - E [X_{n} | \sqrt{snr} X_{n} + N_{n}])}^{T}] . \end{matrix}

(A10)

As shown in the “incremental proof” of the I-MMSE in [8] using the chain rule for mutual information and data processing arguments, the above is equivalent to

\begin{matrix} \lim_{δ \to 0} \frac{\frac{1}{n} I (X_{n}; \sqrt{δ} X_{n} + N_{1, n} | \sqrt{snr} X_{n} + N_{2, n})}{δ} = \frac{1}{2} \frac{1}{n} Tr (K_{snr}), \end{matrix}

(A11)

where

N_{1, n}

is independent of

N_{2, n}

and both are standard Gaussian. The above can also be equivalently written as follows:

\begin{matrix} \frac{1}{n} I (X_{n}; \sqrt{snr} X_{n} + N_{n}) = \frac{1}{2} \int_{0}^{snr} \frac{1}{n} Tr (K_{γ}) d γ . \end{matrix}

(A12)

We take the above integral form of this result in order to apply the reverse Fatou lemma. Denote for any

γ \geq 0

\begin{matrix} \underset{n \to \infty}{lim sup} \frac{1}{n} Tr (K_{γ}) = σ_{\sup}^{2} (γ) < \infty, \end{matrix}

(A13)

where the above is bounded again due to the assumption of an average power coonstraint on the input. We have that

\begin{matrix} \underset{n \to \infty}{lim sup} \int_{0}^{snr} |\frac{1}{n} Tr (K_{γ}) - σ_{\sup}^{2}| d γ \leq \int_{0}^{snr} \underset{n \to \infty}{lim sup} |\frac{1}{n} Tr (K_{γ}) - σ_{\sup}^{2}| d γ = 0, \end{matrix}

(A14)

where the inequality is due to the reverse Fatou lemma, where

\begin{matrix} f_{n} = |\frac{1}{n} Tr (K_{γ}) - σ_{\sup}^{2}| \leq 1 + σ_{\sup}^{2}, \forall n \geq 1, \end{matrix}

(A15)

due to the fact that

K_{γ} ⪯ E [X_{n} X_{n}^{T}]

and the power constraint asusmed on the input sequence. Due to the non-negativity of the integrand we have that

\begin{matrix} \underset{n \to \infty}{lim sup} \int_{0}^{snr} \frac{1}{n} Tr (K_{γ}) d γ = \int_{0}^{snr} σ_{\sup}^{2} (γ) d γ . \end{matrix}

(A16)

Thus, putting everything together we have that

\begin{matrix} \lim_{n \to \infty} \frac{1}{n} I (X_{n}; \sqrt{snr} X_{n} + N_{n}) & \overset{a}{=} \underset{n \to \infty}{lim sup} \frac{1}{n} I (X_{n}; \sqrt{snr} X_{n} + N_{n}) \\ \overset{b}{=} \underset{n \to \infty}{lim sup} \int_{0}^{snr} \frac{1}{n} Tr (K_{γ}) d γ \\ \overset{c}{=} \frac{1}{2} \int_{0}^{snr} σ_{\sup}^{2} (γ) d γ, \end{matrix}

(A17)

where a is due to assumption that the limit of the normalized mutual information exists; b is due to (A12); and c is due to (A16), meaning a consequence of Fatou’s lemma.

Taking the derivative with respect to

snr

of both sides we have that

\begin{matrix} \frac{d}{d snr} \lim_{n \to \infty} \frac{1}{n} I (X_{n}; \sqrt{snr} X_{n} + N_{n}) & = \frac{1}{2} σ_{\sup}^{2} (snr) . \end{matrix}

(A18)

We can again follow the arguments in the “incremental proof” of the I-MMSE in [8] using the chain rule for mutual information and data processing on the normalized mutual information in the limit to obtain that the above is equivalent to

\begin{matrix} \lim_{δ \to 0} \frac{\lim_{n \to \infty} \frac{1}{n} I (X_{n}; \sqrt{δ} X_{n} + N_{1, n} | \sqrt{snr} X_{n} + N_{2, n})}{δ} & = \frac{1}{2} σ_{\sup}^{2} (snr), \end{matrix}

(A19)

where in the second equation we again apply the same steps as in the “incremental channel” proof as in [8] on the normalized mutual information in the limit. Thus, we have that

\begin{matrix} \lim_{n \to \infty} \frac{1}{n} I (X_{n}; \sqrt{δ} X_{n} + N_{1, n} | \sqrt{snr} X_{n} + N_{2, n}) = \frac{δ}{2} σ_{\sup}^{2} (snr) + o (δ) . \end{matrix}

(A20)

Putting this together with the upper bound that we have obtained, we get that

\begin{matrix} \lim_{n \to \infty} \frac{1}{n} I (X_{n}; \sqrt{δ} X_{n} + N_{1, n} | \sqrt{snr} X_{n} + N_{2, n}) = \frac{δ}{2} σ_{\sup}^{2} (snr) + o (δ) \leq \frac{δ}{2} σ_{\inf}^{2} (snr) + o (δ) . \end{matrix}

(A21)

But since by definition

\begin{matrix} σ_{\inf}^{2} (snr) \leq σ_{\sup}^{2} (snr), \end{matrix}

(A22)

for all

snr \geq 0

we have that

\begin{matrix} \lim_{n \to \infty} \frac{1}{n} I (X_{n}; \sqrt{δ} X_{n} + N_{1, n} | \sqrt{snr} X_{n} + N_{2, n}) = \frac{δ}{2} σ_{\sup}^{2} (snr) + o (δ) = \frac{δ}{2} σ_{\inf}^{2} (snr) + o (δ), \end{matrix}

(A23)

and

\begin{matrix} σ_{\inf}^{2} (snr) = σ_{\sup}^{2} (snr) . \end{matrix}

(A24)

This concludes the proof. ☐

Appendix B. Proof of Proposition 5

Before showing the proof of Proposition 5 we present several auxiliary results and definitions.

We define the conditional MMPE as follows.

Definition A1.

For any

X

and

V

, the conditional MMPE of

X

given

V

is defined as

\begin{matrix} mmpe (X, snr, p | V) : = ∥ X - f_{p} (X | Y_{snr}, V) ∥_{p}^{p} . \end{matrix}

(A25)

The conditional MMPE in (A25) reflects the fact that the optimal estimator has been given additional information in the form of

V

. Note that when

Z

is independent of

(X, V)

we can write the conditional MMPE for

X_{u} \sim P_{X | V} (\cdot | v)

as

\begin{matrix} mmpe (X, snr, p | V) = \int mmpe (X_{v}, snr, p) d P_{V} (v) . \end{matrix}

(A26)

Since giving extra information does not increase the estimation error, we have the following result.

Proposition A1.

(Conditioning reduces the MMPE [25].) For every

snr \geq 0

, and random variable

X

, we have

\begin{matrix} mmpe (X, snr, p) \geq mmpe (X, snr, p | V) . \end{matrix}

(A27)

Finally, the following proposition generalizes [23] and states that the MMPE estimation of

X

from two observations is equivalent to estimating

X

from a single observation with a higher SNR.

Proposition A2

([25]). For every

X

and

p \geq 0

, let

V = \sqrt{Δ} \cdot X + Z_{Δ}

where

Z_{Δ} \sim N (0, I)

and where

(X, Z, Z_{Δ})

are mutually independent. Then

\begin{matrix} mmpe (X, {snr}_{0}, p | V) = mmpe (X, {snr}_{0} + Δ, p) . \end{matrix}

(A28)

Proof.

For two independent observations

Y_{{snr}_{0}} = \sqrt{{snr}_{0}} X + Z

and

Y_{Δ} = \sqrt{Δ} X + Z_{Δ}

where

Z_{Δ}

and

Z

are independent, by using maximal ratio combining, we have that

\begin{matrix} Y_{snr} & = \frac{\sqrt{Δ}}{\sqrt{{snr}_{0} + Δ}} Y_{Δ} + \frac{\sqrt{{snr}_{0}}}{\sqrt{{snr}_{0} + Δ}} Y_{{snr}_{0}} \\ = \sqrt{{snr}_{0} + Δ} X + W, \end{matrix}

where

W \sim N (0, I)

. Next by using the same argument as in [23], we have that the conditional probabilities are

\begin{matrix} p_{X | Y_{{snr}_{0}}, Y_{Δ}} (x | y_{{snr}_{0}}, y_{Δ}) = p_{X | Y} (x | y_{snr}), \end{matrix}

(A29)

for

y_{snr} = \frac{\sqrt{Δ}}{\sqrt{{snr}_{0} + Δ}} y_{Δ} + \frac{\sqrt{{snr}_{0}}}{\sqrt{{snr}_{0} + Δ}} y_{{snr}_{0}}

. The equivalence of the posterior probabilities implies that the estimation of

X

from

Y_{snr}

is as good as the estimation of

X

from

(Y_{{snr}_{0}}, Y_{Δ})

. This concludes the proof. ☐

We are now in a position to prove the SCPP bound in Proposition 5.

Proof.

Let

snr = {snr}_{0} + Δ

for

Δ \geq 0,

and let

Y_{Δ} = \sqrt{Δ} X + Z_{Δ} .

Then

\begin{matrix} Y_{snr} & = \frac{\sqrt{Δ}}{\sqrt{{snr}_{0} + Δ}} Y_{Δ} + \frac{\sqrt{{snr}_{0}}}{\sqrt{{snr}_{0} + Δ}} Y_{{snr}_{0}} \\ = \sqrt{{snr}_{0} + Δ} X + W, \end{matrix}

where

W \sim N (0, I)

. Next, let

\begin{matrix} m : = {mmpe}^{\frac{2}{p}} (X, {snr}_{0}, p) = {∥ X - f_{p} (X | Y_{{snr}_{0}}) ∥}_{p}^{2}, \end{matrix}

(A30)

and define a suboptimal estimator given

(Y_{Δ}, Y_{{snr}_{0}})

as

\begin{matrix} \hat{X} = \frac{(1 - γ)}{\sqrt{Δ}} Y_{Δ} + γ f_{p} (X | Y_{{snr}_{0}}), \end{matrix}

(A31)

for some

γ \in R

to be determined later. Then

\begin{matrix} X - \hat{X} = γ (X - f_{p} (X | Y_{{snr}_{0}})) - \frac{(1 - γ)}{\sqrt{Δ}} Z_{Δ}, \end{matrix}

and

\begin{matrix} {mmpe}^{\frac{1}{p}} (X, snr, p) & = ∥ X - f_{p} (X | Y_{snr}) ∥_{p} \\ \overset{a)}{=} {∥ X - f_{p} (X | Y_{Δ}, Y_{{snr}_{0}}) ∥}_{p} \\ \overset{b)}{\leq} {∥ X - \hat{X} ∥}_{p} = {∥γ (X - f_{p} (X | Y_{{snr}_{0}})) - \frac{(1 - γ)}{\sqrt{Δ}} Z_{Δ}∥}_{p} \\ \overset{c)}{=} \frac{{∥{∥ Z ∥}_{p}^{2} (X - f_{p} (X | Y_{{snr}_{0}})) - \sqrt{Δ} \cdot m \cdot Z_{Δ}∥}_{p}}{{∥ Z ∥}_{p}^{2} + Δ \cdot m}, \end{matrix}

(A32)

where the (in)-equalities follow from: (a) Proposition A2; (b) by using the sub-optimal estimator in (A31); and (c) by choosing

γ = \frac{{∥ Z ∥}_{p}^{2}}{{∥ Z ∥}_{p}^{2} + Δ \cdot m}

for m defined in (A30).

Next, by applying the triangle inequality to (A32) we get

\begin{matrix} {mmpe}^{\frac{1}{p}} (X, snr, p) & \leq \frac{{∥{∥ Z ∥}_{p}^{2} (X - f_{p} (X | Y_{{snr}_{0}}))∥}_{p} + {∥\sqrt{Δ} \cdot m \cdot Z_{Δ}∥}_{p}}{{∥ Z ∥}_{p}^{2} + Δ \cdot m} \\ = \frac{\sqrt{m} {∥ Z ∥}_{p} \cdot {(∥ Z ∥}_{p} + \sqrt{Δ} \cdot \sqrt{m})}{{∥ Z ∥}_{p}^{2} + Δ \cdot m} \\ \leq \sqrt{2} \frac{\sqrt{m} {∥ Z ∥}_{p}}{\sqrt{{∥ Z ∥}_{p}^{2} + Δ \cdot m}}, \end{matrix}

(A33)

where in the last step we have used

(a + b) \leq \sqrt{2} \sqrt{a^{2} + b^{2}}

.

Note that for the case

p = 2

, instead of using the triangle inequality in (A33), the term in (A32) can be expanded into a quadratic equation for which it is not hard to see that the choice of

γ = \frac{{∥ Z ∥}_{p}^{2}}{{∥ Z ∥}_{p}^{2} + Δ \cdot m}

is optimal and leads to the bound

\begin{matrix} {mmpe}^{\frac{1}{p}} (X, snr, p) \leq \frac{\sqrt{m} {∥ Z ∥}_{p}}{\sqrt{{∥ Z ∥}_{p}^{2} + Δ \cdot m}} . \end{matrix}

The proof is concluded by noting that

β = \frac{m}{{∥ Z ∥}_{p}^{2} - {snr}_{0} m}

. ☐

References

Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112. [Google Scholar] [CrossRef]
Esposito, R. On a relation between detection and estimation in decision theory. Inf. Control 1968, 12, 116–120. [Google Scholar] [CrossRef]
Hatsell, C.; Nolte, L. Some geometric properties of the likelihood ratio. IEEE Trans. Inf. Theor. 1971, 17, 616–618. [Google Scholar] [CrossRef]
Duncan, T.E. Evaluation of likelihood functions. Inf. Control 1968, 13, 62–74. [Google Scholar] [CrossRef]
Kadota, T.; Zakai, M.; Ziv, J. Mutual information of the white Gaussian channel with and without feedback. IEEE Trans. Inf. Theor. 1971, 17, 368–371. [Google Scholar] [CrossRef]
Kailath, T. The innovations approach to detection and estimation theory. Proc. IEEE 1970, 58, 680–695. [Google Scholar] [CrossRef]
Duncan, T.E. On the calculation of mutual information. SIAM J. Appl. Math. 1970, 19, 215–220. [Google Scholar] [CrossRef]
Guo, D.; Shamai (Shitz), S.; Verdú, S. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inf. Theor. 2005, 51, 1261–1282. [Google Scholar] [CrossRef]
Palomar, D.P.; Verdú, S. Gradient of mutual information in linear vector Gaussian channels. IEEE Trans. Inf. Theor. 2006, 52, 141–154. [Google Scholar] [CrossRef]
Han, G.; Song, J. Extensions of the I-MMSE Relationship to Gaussian Channels With Feedback and Memory. IEEE Trans. Inf. Theor. 2016, 62, 5422–5445. [Google Scholar] [CrossRef]
Guo, D.; Shamai (Shitz), S.; Verdú, S. Additive non-Gaussian noise channels: Mutual information and conditional mean estimation. In Proceedings of the IEEE International Symposium on Information Theory, Adelaide, Australia, 4–9 September 2005; pp. 719–723. [Google Scholar]
Guo, D.; Shamai (Shitz), S.; Verdú, S. Mutual information and conditional mean estimation in Poisson channels. IEEE Trans. Inf. Theor. 2008, 54, 1837–1849. [Google Scholar] [CrossRef]
Atar, R.; Weissman, T. Mutual information, relative entropy, and estimation in the Poisson channel. IEEE Trans. Inf. Theor. 2012, 58, 1302–1318. [Google Scholar] [CrossRef]
Jiao, J.; Venkat, K.; Weissman, T. Relation between Information and Estimation in Discrete-Time Lévy Channels. IEEE Trans. Inf. Theor. 2017, 63, 3579–3594. [Google Scholar] [CrossRef]
Taborda, C.G.; Guo, D.; Perez-Cruz, F. Information-estimation relationships over binomial and negative binomial models. IEEE Trans. Inf. Theor. 2014, 60, 2630–2646. [Google Scholar] [CrossRef]
Verdú, S. Mismatched estimation and relative entropy. IEEE Trans. Inf. Theor. 2010, 56, 3712–3720. [Google Scholar] [CrossRef]
Guo, D. Relative entropy and score function: New information-estimation relationships through arbitrary additive perturbation. In Proceedings of the IEEE International Symposium on Information Theory, Seoul, Korea, 28 June–3 July 2009; pp. 814–818. [Google Scholar]
Zakai, M. On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel. IEEE Trans. Inf. Theor. 2005, 51, 3017–3024. [Google Scholar] [CrossRef]
Duncan, T.E. Mutual information for stochastic signals and fractional Brownian motion. IEEE Trans. Inf. Theor. 2008, 54, 4432–4438. [Google Scholar] [CrossRef]
Duncan, T.E. Mutual information for stochastic signals and Lévy processes. IEEE Trans. Inf. Theor. 2010, 56, 18–24. [Google Scholar] [CrossRef]
Weissman, T.; Kim, Y.H.; Permuter, H.H. Directed information, causal estimation, and communication in continuous time. IEEE Trans. Inf. Theor. 2013, 59, 1271–1287. [Google Scholar] [CrossRef]
Venkat, K.; Weissman, T. Pointwise relations between information and estimation in Gaussian noise. IEEE Trans. Inf. Theor. 2012, 58, 6264–6281. [Google Scholar] [CrossRef]
Guo, D.; Shamai (Shitz), S.; Verdú, S. The Interplay Between Information and Estimation Measures; Now Publishers: Boston, MA, USA, 2013. [Google Scholar]
Ozarow, L.; Wyner, A. On the capacity of the Gaussian channel with a finite number of input levels. IEEE Trans. Inf. Theor. 1990, 36, 1426–1428. [Google Scholar] [CrossRef]
Dytso, A.; Bustin, R.; Tuninetti, D.; Devroye, N.; Poor, H.V.; Shamai (Shitz), S. On the minimum mean p-th error in Gaussian noise channels and its applications. arXiv 2016, arXiv:1607.01461. [Google Scholar]
Sherman, S. Non-mean-square error criteria. IRE Trans. Inf. Theor. 1958, 4, 125–126. [Google Scholar] [CrossRef]
Akyol, E.; Viswanatha, K.B.; Rose, K. On conditions for linearity of optimal estimation. IEEE Trans. Inf. Theor. 2012, 58, 3497–3508. [Google Scholar] [CrossRef]
Bustin, R.; Schaefer, R.F.; Poor, H.V.; Shamai (Shitz), S. On the SNR-evolution of the MMSE function of codes for the Gaussian broadcast and wiretap channels. IEEE Trans. Inf. Theor. 2016, 62, 2070–2091. [Google Scholar] [CrossRef]
Bustin, R.; Poor, H.V.; Shamai (Shitz), S. The effect of maximal rate codes on the interfering message rate. In Proceedings of the IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 91–95. [Google Scholar]
Guo, D.; Wu, Y.; Shamai (Shitz), S.; Verdú, S. Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans. Inf. Theor. 2011, 57, 2371–2385. [Google Scholar]
Wu, Y.; Verdú, S. Functional properties of minimum mean-square error and mutual information. IEEE Trans. Inf. Theor. 2012, 58, 1289–1301. [Google Scholar] [CrossRef]
Wu, Y.; Verdú, S. MMSE dimension. IEEE Trans. Inf. Theor. 2011, 57, 4857–4879. [Google Scholar] [CrossRef]
Bustin, R.; Payaró, M.; Palomar, D.P.; Shamai (Shitz), S. On MMSE crossing properties and implications in parallel vector Gaussian channels. IEEE Trans. Inf. Theor. 2013, 59, 818–844. [Google Scholar] [CrossRef]
Bustin, R.; Schaefer, R.F.; Poor, H.V.; Shamai (Shitz), S. On MMSE properties of optimal codes for the Gaussian wiretap channel. In Proceedings of the IEEE Information Theory Workshop, Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5. [Google Scholar]
Verdú, S.; Guo, D. A simple proof of the entropy-power inequality. IEEE Trans. Inf. Theor. 2006, 52, 2165–2166. [Google Scholar] [CrossRef]
Guo, D.; Shamai (Shitz), S.; Verdú, S. Proof of entropy power inequalities via MMSE. In Proceedings of the IEEE International Symposium on Information Theory, Seattle, WA, USA, 9–14 July 2006; pp. 1011–1015. [Google Scholar]
Tulino, A.M.; Verdú, S. Monotonic decrease of the non-Gaussianness of the sum of independent random variables: A simple proof. IEEE Trans. Inf. Theor. 2006, 52, 4295–4297. [Google Scholar] [CrossRef]
Dytso, A.; Bustin, R.; Poor, H.V.; Shamai (Shitz), S. Comment on the equality condition for the I-MMSE proof of the entropy power inequality. arXiv 2017, arXiv:1703.07442. [Google Scholar]
Cover, T.; Thomas, J. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006. [Google Scholar]
Shannon, C. A mathematical theory of communication. Available online: http://math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf (accessed on 3 August 2017).
Csiszar, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Gamal, A.E.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
Abbe, E.; Zheng, L. A coordinate system for Gaussian networks. IEEE Trans. Inf. Theor. 2012, 58, 721–733. [Google Scholar] [CrossRef] [Green Version]
Bennatan, A.; Shamai (Shitz), S.; Calderbank, A. Soft-decoding-based strategies for relay and interference channels: analysis and achievable rates using LDPC codes. IEEE Trans. Inf. Theor. 2014, 60, 1977–2009. [Google Scholar] [CrossRef]
Dytso, A.; Tuninetti, D.; Devroye, N. Interference as noise: Friend or foe? IEEE Trans. Inf. Theor. 2016, 62, 3561–3596. [Google Scholar] [CrossRef]
Moshksar, K.; Ghasemi, A.; Khandani, A. An alternative to decoding interference or treating interference as Gaussian noise. IEEE Trans. Inf. Theor. 2015, 61, 305–322. [Google Scholar] [CrossRef]
Shamai (Shitz), S. From constrained signaling to network interference alignment via an information-estimation perspective. IEEE Inf. Theor. Soc. Newslett. 2012, 62, 6–24. [Google Scholar]
Ungerboeck, G. Channel coding with multilevel/phase signals. IEEE Trans. Inf. Theor. 1982, 28, 55–67. [Google Scholar] [CrossRef]
Dytso, A.; Tuninetti, D.; Devroye, N. On the two-user interference channel with lack of knowledge of the interference codebook at one receiver. IEEE Trans. Inf. Theor. 2015, 61, 1257–1276. [Google Scholar] [CrossRef]
Dytso, A.; Bustin, R.; Tuninetti, D.; Devroye, N.; Poor, H.V.; Shamai (Shitz), S. New bounds on MMSE and applications to communication with the disturbance constraint. Available online: https://arxiv.org/pdf/1603.07628.pdf (accessed on 3 August 2017).
Dong, Y.; Farnia, F.; Özgür, A. Near optimal energy control and approximate capacity of energy harvesting communication. IEEE J. Selected Areas Commun. 2015, 33, 540–557. [Google Scholar] [CrossRef]
Shaviv, D.; Nguyen, P.M.; Özgür, A. Capacity of the energy-harvesting channel with a finite battery. IEEE Trans. Inf. Theor. 2016, 62, 6436–6458. [Google Scholar] [CrossRef]
Bloch, M.; Barros, J.; Rodrigues, M.R.; McLaughlin, S.W. Wireless information-theoretic security. IEEE Trans. Inf. Theor. 2008, 54, 2515–2534. [Google Scholar] [CrossRef]
Dytso, A.; Bustin, R.; Tuninetti, D.; Devroye, N.; Poor, H.V.; Shamai, S. On the applications of the minimum mean p-th error (MMPE) to information theoretic quantities. In Proceedings of the IEEE Information Theory Workshop, Cambridge, UK, 11–14 September 2016; pp. 66–70. [Google Scholar]
Forney, G.D., Jr. On the role of MMSE estimation in approaching the information theoretic limits of linear Gaussian channels: Shannon meets Wiener. In Proceedings of the 41st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 1–3 October 2003. [Google Scholar]
Alvarado, A.; Brannstrom, F.; Agrell, E.; Koch, T. High-SNR asymptotics of mutual information for discrete constellations with applications to BICM. IEEE Trans. Inf. Theor. 2014, 60, 1061–1076. [Google Scholar] [CrossRef]
Wu, Y.; Verdú, S. The impact of constellation cardinality on Gaussian channel capacity. In Proceedings of the 48th Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 29 September–11 October 2010; pp. 620–628. [Google Scholar]
Dytso, A.; Tuninetti, D.; Devroye, N. On discrete alphabets for the two-user Gaussian interference channel with one receiver lacking knowledge of the interfering codebook. In Proceedings of the Information Theory and Applications Workshop, San Diego, CA, USA, 9–14 February 2014; pp. 1–8. [Google Scholar]
Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai (Shitz), S. A generalized Ozarow-Wyner capacity bound with applications. In Proceedings of the IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 1058–1062. [Google Scholar]
Dytso, A.; Bustin, R.; Poor, H.V.; Shamai (Shitz), S. On additive channels with generalized Gaussian noise. In Proceedings of the IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017. [Google Scholar]
Dytso, A.; Goldenbaum, M.; Shamai (Shitz), S.; Poor, H.V. Upper and lower bounds on the capacity of amplitude-constrained MIMO channels. In Proceedings of the IEEE Global Communications Conference, Singapore, 4–8 December 2017. [Google Scholar]
Peleg, M.; Sanderovich, A.; Shamai (Shitz), S. On extrinsic information of good codes operating over memoryless channels with incremental noisiness. In Proceedings of the IEEE 24th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 15–17 November 2006; pp. 290–294. [Google Scholar]
Merhav, N.; Guo, D.; Shamai (Shitz), S. Statistical physics of signal estimation in Gaussian noise: Theory and examples of phase transitions. IEEE Trans. Inf. Theor. 2010, 56, 1400–1416. [Google Scholar] [CrossRef]
Wyner, A.D. The wire-tap channel. Bell Lab. Tech. J. 1975, 54, 1355–1387. [Google Scholar] [CrossRef]
Csiszár, I.; Korner, J. Broadcast channels with confidential messages. IEEE Trans. Inf. Theor. 1978, 24, 339–348. [Google Scholar] [CrossRef]
Leung-Yan-Cheong, S.; Hellman, M. The Gaussian wire-tap channel. IEEE Trans. Inf. Theor. 1978, 24, 451–456. [Google Scholar] [CrossRef]
Massey, J.L. A simplified treatment of Wyner’s wire-tap channel. In Proceedings of the 21st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 3–4 October 1983; pp. 268–276. [Google Scholar]
Bloch, M.; Barros, J. Physical-Layer Security: From Information Theory to Security Engineering; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Bandemer, B.; El Gamal, A. Communication with disturbance constraints. IEEE Trans. Inf. Theor. 2014, 60, 4488–4502. [Google Scholar] [CrossRef]
Bustin, R.; Shamai (Shitz), S. MMSE of ‘bad’ codes. IEEE Trans. Inf. Theor. 2013, 59, 733–743. [Google Scholar] [CrossRef]
Shang, X.; Kramer, G.; Chen, B. A new outer bound and the noisy-interference sum-rate capacity for Gaussian interference channels. IEEE Trans. Inf. Theor. 2009, 55, 689–699. [Google Scholar] [CrossRef]
Motahari, A.S.; Khandani, A.K. Capacity bounds for the Gaussian interference channel. IEEE Trans. Inf. Theor. 2009, 55, 620–643. [Google Scholar] [CrossRef]
Annapureddy, V.S.; Veeravalli, V.V. Gaussian interference networks: Sum capacity in the low-interference regime and new outer bounds on the capacity region. IEEE Trans. Inf. Theor. 2009, 55, 3032–3050. [Google Scholar] [CrossRef]
Han, T.; Kobayashi, K. A new achievable rate region for the interference channel. IEEE Trans. Inf. Theor. 1981, 27, 49–60. [Google Scholar] [CrossRef]
Sato, H. The capacity of Gaussian interference channel under strong interference. IEEE Trans. Inf. Theor. 1981, 27, 786–788. [Google Scholar] [CrossRef]
Etkin, R.; Tse, D.; Wang, H. Gaussian interference channel capacity to within one bit. IEEE Trans. Inf. Theor. 2008, 54, 5534–5562. [Google Scholar] [CrossRef]
Ahlswede, R. Multi-way communication channels. In Proceedings of the IEEE International Symposium on Information Theory, Ashkelon, Israel, 25–29 June 1973; pp. 23–52. [Google Scholar]
Gherekhloo, S.; Chaaban, A.; Sezgin, A. Expanded GDoF-optimality regime of treating interference as noise in the M×2 X-channel. IEEE Trans. Inf. Theor. 2016, 63, 355–376. [Google Scholar] [CrossRef]
Cheng, R.; Verdú, S. On limiting characterizations of memoryless multiuser capacity regions. IEEE Trans. Inf. Theor. 1993, 39, 609–612. [Google Scholar] [CrossRef]
Blasco-Serrano, R.; Thobaben, R.; Skoglund, M. Communication and interference coordination. In Proceedings of the Information Theory and Applications Workshop, San Diego, CA, USA, 9–14 February 2014; pp. 1–8. [Google Scholar]
Huleihel, W.; Merhav, N. Analysis of mismatched estimation errors using gradients of partition functions. IEEE Trans. Inf. Theor. 2014, 60, 2190–2216. [Google Scholar] [CrossRef]
Dytso, A.; Bustin, R.; Tuninetti, D.; Devroye, N.; Poor, H.V.; Shamai (Shitz), S. On communications through a Gaussian noise channel with an MMSE disturbance constraint. In Proceedings of the Information Theory and Applications Workshop, San Diego, CA, USA, 1–5 February 2016; pp. 1–8. [Google Scholar]
Cover, T. Broadcast channels. IEEE Trans. Inf. Theor. 1972, 18, 2–14. [Google Scholar] [CrossRef]
Cover, T. Comments on broadcast channels. IEEE Trans. Inf. Theor. 1998, 44, 2524–2530. [Google Scholar] [CrossRef]
Gallager, R.G. Capacity and coding for degraded broadcast channels. Problemy Peredachi Informatsii 1974, 10, 3–14. [Google Scholar]
Bergmans, P. A simple converse for broadcast channels with additive white Gaussian noise. IEEE Trans. Inf. Theor. 1974, 20, 279–280. [Google Scholar] [CrossRef]
Bustin, R.; Schaefer, R.F.; Poor, H.V.; Shamai (Shitz), S. On MMSE properties of “good” and “bad” codes for the Gaussian broadcast channel. In Proceedings of the IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 14–19. [Google Scholar]
Bustin, R.; Schaefer, R.F.; Poor, H.V.; Shamai (Shitz), S. An I-MMSE based graphical representation of rate and equivocation for the Gaussian broadcast channel. In Proceedings of the IEEE Conference on Communications and Network Security, Florence, Italy, 28–30 September 2015; pp. 53–58. [Google Scholar]
Zou, S.; Liang, Y.; Lai, L.; Shamai (Shitz), S. Degraded broadcast channel: Secrecy outside of a bounded range. In Proceedings of the IEEE Information Theory Workshop, Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5. [Google Scholar]
Wu, Y.; Shamai (Shitz), S.; Verdú, S. Information dimension and the degrees of freedom of the interference channel. IEEE Trans. Inf. Theor. 2015, 61, 256–279. [Google Scholar] [CrossRef]
Carleial, A. A case where interference does not reduce capacity. IEEE Trans. Inf. Theor. 1975, 21, 569–570. [Google Scholar] [CrossRef]
Costa, M.H.; El Gamal, A. The capacity region of the discrete memoryless interference channel with strong interference. IEEE Trans. Inf. Theor. 1987, 33, 710–711. [Google Scholar] [CrossRef]
El Gamal, A.; Costa, M. The capacity region of a class of deterministic interference channels. IEEE Trans. Inf. Theor. 1982, 28, 343–346. [Google Scholar] [CrossRef]
Bresler, G.; Tse, D. The two-user Gaussian interference channel: a deterministic view. Trans. Emerg. Telecommun. Technol. 2008, 19, 333–354. [Google Scholar] [CrossRef]
Telatar, E.; Tse, D. Bounds on the capacity region of a class of interference channels. In Proceedings of the IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008; pp. 2871–2874. [Google Scholar]
Nair, C.; Xia, L.; Yazdanpanah, M. Sub-optimality of Han-Kobayashi achievable region for interference channels. In Proceedings of the IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 2416–2420. [Google Scholar]
Tse, D. It’s easier to approximate. IEEE Infor. Theor. Soc. Newslett. 2010, 60, 6–11. [Google Scholar]
Cadambe, V.R.; Jafar, S.A. Interference alignment and degrees of freedom of the K-user interference channel. IEEE Trans. Inf. Theor. 2008, 54, 3425–3441. [Google Scholar] [CrossRef]
Huang, C.; Cadambe, V.; Jafar, S. On the capacity and generalized degrees of freedom of the X channel. arXiv 2008, arXiv:0810.4741. [Google Scholar]
Jafar, S.A. Interference alignment—A new look at signal dimensions in a communication network. Found. Trend. Commun. Inf. Theor. 2011, 7, 1–134. [Google Scholar] [CrossRef]
Bustin, R.; Liu, R.; Poor, H.V.; Shamai (Shitz), S. An MMSE approach to the secrecy capacity of the MIMO Gaussian wiretap channel. EURASIP J. Wirel. Commun. Netw. 2009, 1, 370970. [Google Scholar] [CrossRef]
Liu, R.; Liu, T.; Poor, H.V.; Shamai (Shitz), S. A vector generalization of Costa’s entropy-power inequality with applications. IEEE Trans. Inf. Theor. 2010, 56, 1865–1879. [Google Scholar]
Pérez-Cruz, F.; Rodrigues, M.R.; Verdú, S. MIMO Gaussian channels with arbitrary inputs: Optimal precoding and power allocation. IEEE Trans. Inf. Theor. 2010, 56, 1070–1084. [Google Scholar] [CrossRef]
Wu, Y.; Verdú, S. Optimal phase transitions in compressed sensing. IEEE Trans. Inf. Theor. 2012, 58, 6241–6263. [Google Scholar] [CrossRef]
Chechik, G.; Globerson, A.; Tishby, N.; Weiss, Y. Information bottleneck for Gaussian variables. J. Mach. Learn. Res. 2005, 6, 165–188. [Google Scholar]
Liu, T.; Viswanath, P. An extremal inequality motivated by multiterminal information-theoretic problems. IEEE Trans. Inf. Theor. 2007, 53, 1839–1851. [Google Scholar] [CrossRef]
Weingarten, H.; Steinberg, Y.; Shamai (Shitz), S. The capacity region of the Gaussian multiple-input multiple-output broadcast channel. IEEE Trans. Inf. Theor. 2006, 52, 3936–3964. [Google Scholar] [CrossRef]
Liu, T.; Shamai (Shitz), S. A note on the secrecy capacity of the multiple-antenna wiretap channel. IEEE Trans. Inf. Theor. 2009, 55, 2547–2553. [Google Scholar] [CrossRef]
Rini, S.; Tuninetti, D.; Devroye, N. On the capacity of the Gaussian cognitive interference channel: New inner and outer bounds and capacity to within 1 bit. IEEE Trans. Inf. Theor. 2012, 58, 820–848. [Google Scholar] [CrossRef]
Wu, Y.; Guo, D.; Verdú, S. Derivative of mutual information at zero SNR: The Gaussian-noise case. IEEE Trans. Inf. Theor. 2011, 57, 7307–7312. [Google Scholar] [CrossRef]

Figure 1. A point-to-point communication system. (a) A memoryless point-to-point channel with a transition probability

P_{Y | X}

; (b) A Gaussian point-to-point channel.

Figure 1. A point-to-point communication system. (a) A memoryless point-to-point channel with a transition probability

P_{Y | X}

; (b) A Gaussian point-to-point channel.

Figure 2. Gaps in Equations (33a) and (35) vs.

snr

.

Figure 2. Gaps in Equations (33a) and (35) vs.

snr

.

Figure 3. SNR evolution of the MMSE for

snr = 3

.

Figure 3. SNR evolution of the MMSE for

snr = 3

.

Figure 4. Wiretap Channels. (a) The Wiretap Channel; (b) The Gaussian Wiretap Channel.

Figure 5. The above figure depicts the behavior of

{mmse}_{\infty} (X; γ)

as a function of

γ

assuming

d_{\max}

(in dotted blue), the behavior

{mmse}_{\infty} (X; γ | W^{s})

assuming complete secrecy (in dashed red) and the behavior of

{mmse}_{\infty} (X; γ | W)

for some arbitrary code of rate above secrecy capacity and below point-to-point capacity (in dash-dot black). We mark twice the rate as the area between

{mmse}_{\infty} (X; γ)

and

{mmse}_{\infty} (X; γ | W)

(in magenta). Parameters are

{snr}_{0} = 2

and

snr = 2.5

.

Figure 5. The above figure depicts the behavior of

{mmse}_{\infty} (X; γ)

as a function of

γ

assuming

d_{\max}

(in dotted blue), the behavior

{mmse}_{\infty} (X; γ | W^{s})

assuming complete secrecy (in dashed red) and the behavior of

{mmse}_{\infty} (X; γ | W)

for some arbitrary code of rate above secrecy capacity and below point-to-point capacity (in dash-dot black). We mark twice the rate as the area between

{mmse}_{\infty} (X; γ)

and

{mmse}_{\infty} (X; γ | W)

(in magenta). Parameters are

{snr}_{0} = 2

and

snr = 2.5

.

Figure 6. Channels with disturbance constraints. (a) A point-to-point channel with a disturbance constraint; (b) A Gaussian point-to-point channel with the disturbance constraint.

Figure 7. Plot of

\frac{C_{\infty} (snr, {snr}_{0}, β)}{\frac{1}{2} \log (1 + snr)}

vs.

snr

in dB, for

β = 0.01

,

{snr}_{0} = 5 = 6.989

dB.

Figure 7. Plot of

\frac{C_{\infty} (snr, {snr}_{0}, β)}{\frac{1}{2} \log (1 + snr)}

vs.

snr

in dB, for

β = 0.01

,

{snr}_{0} = 5 = 6.989

dB.

Figure 8. Upper bounds on

M_{n} (snr, {snr}_{0}, β)

vs.

snr

. (a) For

{snr}_{0} = 5

and

β = 0.01

. Here

n = 1

; (b) For

{snr}_{0} = 5

and

β = 0.05

. Several values of n.

Figure 8. Upper bounds on

M_{n} (snr, {snr}_{0}, β)

vs.

snr

. (a) For

{snr}_{0} = 5

and

β = 0.01

. Here

n = 1

; (b) For

{snr}_{0} = 5

and

β = 0.05

. Several values of n.

Figure 9. Upper and lower bounds on

M_{1} (snr, {snr}_{0}, β)

vs.

snr

, for

β = 0.01

,

{snr}_{0} = 10

.

Figure 9. Upper and lower bounds on

M_{1} (snr, {snr}_{0}, β)

vs.

snr

, for

β = 0.01

,

{snr}_{0} = 10

.

Figure 10. Upper and lower bounds on

C_{n = 1} (snr, {snr}_{0}, β)

vs.

snr

, for

β = 0.001

and

{snr}_{0} = 60 = 17.6815

dB.

Figure 10. Upper and lower bounds on

C_{n = 1} (snr, {snr}_{0}, β)

vs.

snr

, for

β = 0.001

and

{snr}_{0} = 60 = 17.6815

dB.

Figure 11. Two-receiver broadcast channel. (a) A general BC; (b) A Gaussian BC.

Figure 12. In the above figure we consider the SNR-evolution of

{mmse}_{\infty} (X; γ)

(in dashed blue) and

{mmse}_{\infty} (X; γ | W_{2})

(in solid red) required from an asymptotically capacity achieving code sequence for the Gaussian BC (rate on the boundary of the capacity region). Twice

R_{2}

is marked as the area between these two functions (in magenta). The parameters are

{snr}_{1} = 2.5

,

{snr}_{2} = 2

, and

α = 0.4

.

Figure 12. In the above figure we consider the SNR-evolution of

{mmse}_{\infty} (X; γ)

(in dashed blue) and

{mmse}_{\infty} (X; γ | W_{2})

(in solid red) required from an asymptotically capacity achieving code sequence for the Gaussian BC (rate on the boundary of the capacity region). Twice

R_{2}

is marked as the area between these two functions (in magenta). The parameters are

{snr}_{1} = 2.5

,

{snr}_{2} = 2

, and

α = 0.4

.

Figure 13. The above figure depicts a general transmission of

(W_{1}, W_{2}, W_{3})

independent messages, each required to be reliably decoded at the respective SNR

({snr}_{1}, {snr}_{2}, {snr}_{3}) = (1 / 2, 1, 3 / 2)

. The rates are defined by the areas. (a) We observe that due to reliable decoding, the respective conditional MMSE converges to the MMSE; (b) we examine the same transmission as in (a), however here we observe the respective rates. The rates are defined by the areas. As an example we mark

2 R_{2}

- twice the rate of message

W_{2}

. Similarly one can mark the other rates

2 R_{1}

and

2 R_{3}

.

Figure 13. The above figure depicts a general transmission of

(W_{1}, W_{2}, W_{3})

independent messages, each required to be reliably decoded at the respective SNR

({snr}_{1}, {snr}_{2}, {snr}_{3}) = (1 / 2, 1, 3 / 2)

. The rates are defined by the areas. (a) We observe that due to reliable decoding, the respective conditional MMSE converges to the MMSE; (b) we examine the same transmission as in (a), however here we observe the respective rates. The rates are defined by the areas. As an example we mark

2 R_{2}

- twice the rate of message

W_{2}

. Similarly one can mark the other rates

2 R_{1}

and

2 R_{3}

.

Figure 14. The above figure depicts a general transmission of independent messages

(W_{1}, W_{2}, W_{3})

, each required to be reliably decoded at the respective SNR

({snr}_{1}, {snr}_{2}, {snr}_{3}) = (1 / 2, 1, 3 / 2)

. Here we denote two equivocation measures

2 H (W_{2} | Y ({snr}_{1}))

and

2 H (W_{3} | Y ({snr}_{2}))

according to Theorem 15.

Figure 14. The above figure depicts a general transmission of independent messages

(W_{1}, W_{2}, W_{3})

, each required to be reliably decoded at the respective SNR

({snr}_{1}, {snr}_{2}, {snr}_{3}) = (1 / 2, 1, 3 / 2)

. Here we denote two equivocation measures

2 H (W_{2} | Y ({snr}_{1}))

and

2 H (W_{3} | Y ({snr}_{2}))

according to Theorem 15.

Figure 15. Two user interference channels. (a) A general interference channel; (b) The Gaussian interference channel.

Figure 16. gDoF of the G-IC.

Table 1. Parameters of the mixed input in (71) used in the proof of Proposition 11.

Regime	Input Parameters
Weak Interference ( $snr \geq {snr}_{0}$ )	$N = ⌊\sqrt{1 + c_{1} \frac{(1 - δ) {snr}_{0}}{1 + δ {snr}_{0}}}⌋,$ $c_{1} = \frac{3}{2 \log (\frac{12 (1 - δ) (1 + β {snr}_{0})}{(1 + {snr}_{0} δ) (β - δ)})}$ , $δ = β \frac{{snr}_{0}}{1 + {snr}_{0}}$
Strong Interference ( $snr \leq {snr}_{0}$ )	$N = ⌊\sqrt{1 + c_{2} snr}⌋$ , $c_{2} = \frac{3}{2 \log (\frac{12 (1 + β {snr}_{0})}{β})}$ , $δ = 0$

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dytso, A.; Bustin, R.; Poor, H.V.; Shamai, S. A View of Information-Estimation Relations in Gaussian Networks. Entropy 2017, 19, 409. https://doi.org/10.3390/e19080409

AMA Style

Dytso A, Bustin R, Poor HV, Shamai S. A View of Information-Estimation Relations in Gaussian Networks. Entropy. 2017; 19(8):409. https://doi.org/10.3390/e19080409

Chicago/Turabian Style

Dytso, Alex, Ronit Bustin, H. Vincent Poor, and Shlomo Shamai (Shitz). 2017. "A View of Information-Estimation Relations in Gaussian Networks" Entropy 19, no. 8: 409. https://doi.org/10.3390/e19080409

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A View of Information-Estimation Relations in Gaussian Networks

Abstract

1. Introduction

1.1. Notation

2. Estimation and Information Theoretic Tools

2.1. Estimation Theoretic Measures

2.2. Mutual Information and the I-MMSE

2.3. Single Crossing Point Property

2.4. Complementary SCPP Bounds

2.5. Bounds on Differential Entropy

3. Point-to-Point Channels

3.1. A Gaussian Point-to-Point Channel

3.2. Generalized Ozarow-Wyner Bound

3.3. SNR Evolution of Optimal Codes

4. Applications to the Wiretap Channel

4.1. Converse of the Gaussian Wiretap Channel

4.2. SNR Evolution of Optimal Wiretap Codes

5. Communication with a Disturbance Constraint

5.1. Max-I Problem

5.2. Characterization of C n ( snr , snr 0 , β ) as n → ∞

5.3. Proof of the Disturbance Constraint Problem with a Mutual Information Constraint

5.4. Max-MMSE Problem

5.5. Mixed Inputs

6. Applications to the Broadcast Channel

6.1. Converse for the Gaussian Broadcast Channel

6.2. SNR Evolution of Optimal BC Codes

7. Multi-Receiver SNR-Evolution

8. Interference Channels

8.1. Gaussian Interference Channel

8.2. Generalized Degrees of Freedom

8.3. Treating Interference as Noise

9. Concluding Remarks and Future Directions

Acknowledgments

Author Contributions

Conflicts of Interest

Appendix A. Proof of Proposition 3

Appendix B. Proof of Proposition 5

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

5.2. Characterization of $C_{n} (snr, {snr}_{0}, β)$ as $n \to \infty$