Article

Source Coding with a Causal Helper

Faculty of Engineering, Bar-Ilan University, Ramat-Gan 52900, Israel
Information 2020, 11(12), 553; https://doi.org/10.3390/info11120553
Submission received: 9 November 2020 / Revised: 23 November 2020 / Accepted: 25 November 2020 / Published: 27 November 2020
(This article belongs to the Special Issue Statistical Communication and Information Theory)

Abstract

A multi-terminal network, in which an encoder, assisted by a side-information-aided helper, describes a memoryless identically distributed source to a receiver, is considered. The encoder provides a non-causal one-shot description of the source to both the helper and the receiver. The helper, which has access to causal side-information, describes the source to the receiver sequentially by sending a sequence of causal descriptions depending on the message conveyed by the encoder and the side-information subsequence it has observed so far. The receiver reconstructs the source causally by producing on each time unit an estimate of the current source symbol based on what it has received so far. Given a reconstruction fidelity measure and a maximal allowed distortion, we derive the rates-distortion region for this setting and express it in terms of an auxiliary random variable. When the source and side-information are drawn from an independent identically distributed Gaussian law and the fidelity measure is the squared-error distortion, we show that for the evaluation of the rates-distortion region it suffices to choose the auxiliary random variable to be jointly Gaussian with the source and side-information pair.

1. Introduction

In the classical source coding with decoder side information problem, the source and side information are generated by independent drawings $(X_k, Y_k)$ of the pair $(X, Y) \sim P_{XY}$. The encoder forms a description of the source sequence $X^n = (X_1, \ldots, X_n)$ using a map $f^{(n)}: \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}$, while the decoder forms its reconstruction $\hat{X}^n$ depending on both the side-information sequence $Y^n$ and the index $T \in \{1, \ldots, 2^{nR}\}$ conveyed by the encoder. In their seminal work [1], Wyner and Ziv derived the rate distortion function for this setting, given a fidelity measure, when $\hat{X}^n$ can depend on $Y^n$ in an arbitrary manner. Yet, Wyner–Ziv source coding with non-causal decoder side-information involves binning, the implementation of which is complex.
A successive refinement for the Wyner–Ziv problem with side-information $(Y, Z)$ is a variant of the Wyner–Ziv model, in which the encoder provides a two-layer description of the source sequence $X^n$ to a pair of decoders. Decoder 1, which obtains just the coarse description layer, has available as non-causal side-information the memoryless vector $Z^n$, while Decoder 2, which obtains both description layers, has available as non-causal side-information the memoryless vector $Y^n$. It is further assumed that the reconstruction formed by Decoder 2 should be of smaller distortion compared to that formed by Decoder 1. Such a model has been considered in [2], wherein a complete characterization of the rates-distortion region has been obtained for the case where Z is a stochastically degraded version of Y with respect to X, i.e., $X - Y - Z$ forms a Markov chain.
The work [3] studies an extension of the model of [2] where a conference link of given capacity allows the unidirectional cooperation between Decoder 1 and Decoder 2—i.e., Decoder 1 functions also as a helper. The results in [3] are partially tight in the sense that the characterization of the encoder rates is conclusive, the remaining gap being in the characterization of the helper’s rate. Thus, with non-causal side-information at both decoders, the successive refinement for the Wyner–Ziv problem with a helper is yet unsolved.
Motivated by practical delay-constrained sequential source coding with decoder side information, Weissman and El Gamal considered in [4] a scheme with causal side information at the decoder, where the sequence of reconstructions $\hat{X}^n = (\hat{X}_1, \ldots, \hat{X}_n)$ is formed sequentially in a causal manner according to $\hat{X}_k = \hat{X}_k(T, Y^k)$, and derived the corresponding rate distortion function. Similarly to [1], the rate distortion function in [4] is expressed in terms of an auxiliary random variable, so the optimal choice of this auxiliary remains an open issue that depends on the specifics of the model. For the Gaussian setting where $(X, Y)$ are Gaussian, the authors in [4] compute an upper bound on the rate distortion function by choosing the auxiliary random variable to be jointly Gaussian with X, while leaving the question of whether this choice is optimal as an open problem.
With the vision that modern network design will support the use of cooperation links in favor of reduced encoding/decoding complexity and relaxed network deployment constraints, this work considers an extension of the model of [4], involving a causal helper and causal side information at the decoder, which is described as follows. The components of a trivariate independent identically distributed (IID) finite-alphabet source $\{X_k, Y_k, Z_k\}_{k=1}^{\infty}$ are observed by three terminals. The source component $\{X_k\}_{k=1}^{\infty}$ is observed by the encoder, while the source component $\{Y_k\}_{k=1}^{\infty}$ is observed by the helper. Both the encoder and the helper describe the length-n source sequence $X^n$ to the decoder, according to a given fidelity, in two steps. First, the encoder, when given $X^n$, sends a rate-R description of it to both the helper and the decoder. Then, the helper sends to the decoder, per each source symbol $Y_k$, a causal description depending on the message it had received from the encoder and the source subsequence $Y^k$ it had observed so far, with the aggregate rate not exceeding $R_h$. The decoder, which observes the source component $\{Z_k\}_{k=1}^{\infty}$ in a causal manner, per channel use k, uses the descriptions it had received so far and the source subsequence $Z^k$ to form its reconstruction $\hat{X}_k$ for the source symbol $X_k$. Given a fidelity measure and a maximal allowed distortion, the goal is to determine the set of all rate pairs $(R, R_h)$ that satisfy the distortion constraint.
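To make the causal structure concrete, the following minimal Python sketch traces which variables each terminal may access at time k. All maps below (the encoder, helper, and decoder bodies and the toy source law) are hypothetical placeholders for illustration only; they are not the coding scheme analyzed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# A toy trivariate IID source; the joint law here is an arbitrary placeholder.
X = rng.integers(0, 2, size=n)
Y = (X + rng.integers(0, 2, size=n)) % 2
Z = (Y + rng.integers(0, 2, size=n)) % 2

def encoder(x_full):
    # Non-causal, one-shot: sees the whole block X^n. Placeholder description.
    return int("".join(map(str, x_full)), 2)

def helper(T, y_past):
    # Causal: at time k the helper knows the message T and Y_1..Y_k only.
    return (T + int(np.sum(y_past))) % 4            # placeholder causal description

def decoder(T, v_past, z_past):
    # Causal: at time k the decoder knows T, V_1..V_k and Z_1..Z_k only.
    return (T + v_past[-1] + int(z_past[-1])) % 2   # placeholder reconstruction

T = encoder(X)                          # rate-R message, available before time 1
X_hat, V = [], []
for k in range(1, n + 1):
    V.append(helper(T, Y[:k]))              # depends on (T, Y_1..Y_k)
    X_hat.append(decoder(T, V, Z[:k]))      # depends on (T, V_1..V_k, Z_1..Z_k)

distortion = np.mean((X - np.array(X_hat)) ** 2)
print(f"empirical distortion = {distortion:.3f}")
```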
Causal decoder side information has also been considered in the context of successive refinement. With the aim of reducing encoder/decoder complexity, a two-layer description model with successive refinement has been considered in [5], under the setting that the side information is available causally at each of the decoders. A single-letter characterization of the rates-distortion region is obtained in [5], irrespective of the relative ordering of the side information quality at the decoders. Furthermore, the direct part in [5] demonstrates that, similarly to [4], with causal side-information at the decoders the optimal code avoids binning, hence its implementation is practically appealing. The extension of the model of [5] with a causal helper has recently been studied in [6].

2. Problem Formulation

Formally, our problem can be stated as follows. A discrete memoryless source (DMS) $(\mathcal{X}, P_X)$ is an infinite sequence $\{X_i\}_{i=1}^{\infty}$ of independent copies of a random variable X taking values in the finite set $\mathcal{X}$ with generic law $P_X$. Similarly, a triple source $(\mathcal{X}\mathcal{Y}\mathcal{Z}, P_{XYZ})$ is an infinite sequence of independent copies of the triplet of random variables $(X, Y, Z)$, taking values in the finite sets $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$, respectively, with generic joint law $P_{XYZ}$. Since our goal is to encode the source X, let $\hat{\mathcal{X}}$ denote any finite reconstruction alphabet and let $d: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be a single-letter distortion measure. The vector distortion measure is defined as
$$d(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i), \qquad \mathbf{x} \in \mathcal{X}^n,\ \hat{\mathbf{x}} \in \hat{\mathcal{X}}^n.$$
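For instance, under Hamming or squared-error distortion the vector distortion above is just the per-symbol average; a minimal sketch (the sequences below are arbitrary illustrative examples):

```python
import numpy as np

def block_distortion(x, x_hat, d):
    """Average single-letter distortion d over a length-n block."""
    return np.mean([d(a, b) for a, b in zip(x, x_hat)])

hamming = lambda a, b: float(a != b)
squared = lambda a, b: (a - b) ** 2

x     = np.array([0, 1, 1, 0, 1])
x_hat = np.array([0, 1, 0, 0, 0])
print(block_distortion(x, x_hat, hamming))  # 0.4 = 2 mismatches out of 5
print(block_distortion(x, x_hat, squared))  # 0.4 as well, since the symbols are binary
```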
A system diagram appears in Figure 1.
Definition 1.
An $\big(n, M^{(n)}, \prod_{k=1}^{n} L_k^{(n)}, D\big)$ code for the source X with causal side-information (SI) $(Y, Z)$ and causal helper consists of:
1. 
An encoder mapping
$$f^{(n)}: \mathcal{X}^n \to \{1, \ldots, M^{(n)}\}. \tag{1}$$
2. 
A unidirectional conference between the helper and the decoder consisting of a sequence of causal descriptions
$$f_{1,k}^{(n)}: \{1, \ldots, M^{(n)}\} \times \mathcal{Y}^k \to \{1, \ldots, L_k^{(n)}\}, \qquad k = 1, \ldots, n$$
and
3. 
A sequence g 1 ( n ) , , g n ( n ) of decoder reconstructions
$$g_k^{(n)}: \{1, \ldots, M^{(n)}\} \times \{1, \ldots, L_1^{(n)}\} \times \cdots \times \{1, \ldots, L_k^{(n)}\} \times \mathcal{Z}^k \to \hat{\mathcal{X}}, \qquad k = 1, \ldots, n$$
such that
$$\mathbb{E}\, d\Big[X^n, \Big(g_1^{(n)}\big(f^{(n)}(X^n), f_{1,1}^{(n)}(f^{(n)}(X^n), Y_1), Z_1\big), \ldots, g_k^{(n)}\big(f^{(n)}(X^n), \{f_{1,j}^{(n)}(f^{(n)}(X^n), Y^j)\}_{j=1}^{k}, Z^k\big), \ldots, g_n^{(n)}\big(f^{(n)}(X^n), \{f_{1,j}^{(n)}(f^{(n)}(X^n), Y^j)\}_{j=1}^{n}, Z^n\big)\Big)\Big] \le D. \tag{4}$$
The rate tuple ( R , R h ) of the code is
$$R = \frac{1}{n}\log M^{(n)}, \qquad R_h = \frac{1}{n}\sum_{k=1}^{n}\log L_k^{(n)}.$$
Given a non-negative distortion D, the tuple $(R, R_h)$ is said to be D-achievable for X with causal SI $(Y, Z)$ if, for any $\delta > 0$, $\epsilon > 0$, and sufficiently large n, there exists an $\big(n, \exp[n(R+\delta)], \exp[n(R_h+\delta)], D+\epsilon\big)$ code for the source X with causal SI $(Y, Z)$ and causal helper. The collection of all D-achievable rate tuples is the achievable source-coding region and is denoted by $\mathcal{R}(D)$.
In this work, we provide a single-letter characterization for $\mathcal{R}(D)$. In contrast to [5], a consequence of Definition 1 is that $\mathcal{R}(D)$ may depend on the joint law of the triple $P_{XYZ}$, not only through the marginal laws $P_{XY}$ and $P_{XZ}$. This is due to the fact that the decoder acquires, in addition to its private side information Z, some additional side information on Y via the conference link. As a result, the expectation in (4), which takes into account the mapping $g_k^{(n)}(\cdot)$, is taken over the joint law $P_{XYZ}$ and not just over the marginal law $P_{XZ}$, as is the case in [5].
Finally, although not a finite-alphabet source, of particular interest to us is the Gaussian source. This is a memoryless source where $(X, Y, Z)$ are centered and jointly Gaussian, with each pair $(X_i, Y_i)$ drawn such that $P_{XY}$ satisfies
$$Y_i = \rho X_i + W_i \tag{6}$$
where $\rho > 0$ is a fixed constant, and $X_i \sim \mathcal{N}(0,1)$ and $W_i \sim \mathcal{N}(0,1)$ are mutually independent. Moreover, $Z_i$ is drawn according to
$$Z_i = a X_i + b Y_i + T_i$$
where a and b are real numbers and $T_i \sim \mathcal{N}(0,1)$ is independent of $(X_i, Y_i)$. Furthermore, in this case, the fidelity measure will be
$$\mathbb{E}\, d\big[X^n, \hat{X}^n\big] \triangleq \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big(X_i - \hat{X}_i\big)^2,$$
in which case we may restrict the reproduction functions $g_k^{(n)}$ to be the MMSE estimates of $X_k$, $k = 1, \ldots, n$, given $f^{(n)}(X^n)$, the sequence of causal descriptions $\{f_{1,j}^{(n)}(f^{(n)}(X^n), Y^j)\}_{j=1}^{k}$, and the side-information $Z^k$. That is,
$$\hat{X}_k = \mathbb{E}\Big[X_k \,\Big|\, f^{(n)}(X^n), \{f_{1,j}^{(n)}(f^{(n)}(X^n), Y^j)\}_{j=1}^{k}, Z^k\Big].$$
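As a sanity check on this Gaussian model, the sketch below (an illustrative simulation, not part of the coding scheme) estimates X from the decoder side-information Z alone via the linear MMSE rule $\mathbb{E}[X|Z] = \frac{\mathbb{E}[XZ]}{\mathrm{Var}(Z)}\, Z$, which coincides with the conditional expectation since $(X, Z)$ are jointly Gaussian, and verifies that the resulting distortion matches the conditional variance $\sigma^2_{X|Z}$; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, a, b = 0.8, 0.5, 0.3
N = 200_000

X = rng.standard_normal(N)
W = rng.standard_normal(N)
T = rng.standard_normal(N)
Y = rho * X + W
Z = a * X + b * Y + T

# Linear MMSE estimate of X from Z (optimal here since (X, Z) are jointly Gaussian).
cov_xz = a + b * rho                       # E[XZ]
var_z  = (a + b * rho) ** 2 + b**2 + 1     # Var(Z) = (a+b*rho)^2 + b^2 + 1
X_hat = (cov_xz / var_z) * Z

empirical   = np.mean((X - X_hat) ** 2)
theoretical = 1 - cov_xz ** 2 / var_z      # conditional variance sigma^2_{X|Z}
print(f"empirical MSE   = {empirical:.4f}")
print(f"sigma^2_{{X|Z}}   = {theoretical:.4f}")
```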
In the Gaussian network setting, our focus will be on determining the optimal choice of the auxiliary random variable by means of which the rates-distortion region is defined. Specifically, we will show that choosing it to be jointly Gaussian with X is optimal.

3. Main Results

Given a maximal allowed distortion D, define $\mathcal{R}^*(D)$ to be the set of all rate pairs $(R, R_h)$ for which there exist random variables $(U, V)$, taking values in finite alphabets $\mathcal{U}$, $\mathcal{V}$, respectively, such that
  • $U - X - (Y, Z)$ forms a Markov chain.
  • Conditioned on U, $X - Y - V$ forms a Markov chain.
  • There exist deterministic maps
    $$\tilde{f}_1: \mathcal{U} \times \mathcal{Y} \to \mathcal{V}, \qquad g: \mathcal{U} \times \mathcal{V} \times \mathcal{Z} \to \hat{\mathcal{X}}$$
    such that, with $V \triangleq \tilde{f}_1(U, Y)$,
    $$\mathbb{E}\, d\big(X, g(U, V, Z)\big) \le D. \tag{10}$$
  • The alphabets $\mathcal{U}$, $\mathcal{V}$ satisfy
    $$|\mathcal{U}| \le |\mathcal{X}| + 2, \qquad |\mathcal{V}| \le |\mathcal{X}|\big(|\mathcal{X}| + 2\big) + 1.$$
  • The rates R and $R_h$ satisfy
    $$R \ge I(X; U) \tag{12a}$$
    $$R_h \ge H(V|U) = I(Y; V|U). \tag{12b}$$
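As a purely numerical illustration of these conditions, the sketch below builds a toy joint law that satisfies the two Markov constraints and the functional form $V = \tilde{f}_1(U, Y)$, and evaluates the two rate lower bounds. The pmfs and the map are arbitrary choices, not optimized for any distortion target.

```python
import numpy as np
from itertools import product

# Toy alphabets: X = Y = Z = U = {0,1}; V = f1(U, Y).
p_x = np.array([0.5, 0.5])
p_u_given_x = np.array([[0.9, 0.1],      # P(U|X): rows indexed by x
                        [0.2, 0.8]])
p_yz_given_x = np.array([[[0.4, 0.1],    # P(Y,Z|X=0)
                          [0.3, 0.2]],
                         [[0.1, 0.2],    # P(Y,Z|X=1)
                          [0.2, 0.5]]])
f1 = lambda u, y: u ^ y                  # illustrative helper map V = f1(U, Y)

# Joint law P(u,x,y,z,v) built so that U - X - (Y,Z) holds and V = f1(U,Y).
joint = {}
for u, x, y, z in product(range(2), repeat=4):
    p = p_x[x] * p_u_given_x[x, u] * p_yz_given_x[x, y, z]
    key = (u, x, y, z, f1(u, y))
    joint[key] = joint.get(key, 0.0) + p

def marg(keep):
    out = {}
    for k, p in joint.items():
        kk = tuple(k[i] for i in keep)
        out[kk] = out.get(kk, 0.0) + p
    return out

def H(pmf):
    return -sum(p * np.log2(p) for p in pmf.values() if p > 0)

# R >= I(X;U) and R_h >= H(V|U) = I(Y;V|U).
I_XU = H(marg([1])) + H(marg([0])) - H(marg([0, 1]))
H_V_given_U = H(marg([0, 4])) - H(marg([0]))
print(f"I(X;U) = {I_XU:.3f} bits  (lower bound on R)")
print(f"H(V|U) = {H_V_given_U:.3f} bits  (lower bound on R_h)")
```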
Our first result is a single-letter characterization of the rates-distortion region.
Theorem 1.
$$\mathcal{R}(D) = \mathcal{R}^*(D).$$
Proof. 
See Section 4.1. □
Remark 1.
The converse holds as well for the setting where the causal helper and the reconstructor benefit from causal disclosure, i.e., are cognizant of the past realizations of the source sequence, and hence are allowed to depend also on $X^{k-1}$ when forming $f_{1,k}^{(n)}$ and $\hat{X}_k$, respectively. That is,
$$f_{1,k}^{(n)} = f_{1,k}^{(n)}\big(f^{(n)}(X^n), Y^k, X^{k-1}\big), \qquad \hat{X}_k = g_k^{(n)}\Big(f^{(n)}(X^n), \{f_{1,j}^{(n)}(f^{(n)}(X^n), Y^j, X^{j-1})\}_{j=1}^{k}, Z^k, X^{k-1}\Big).$$

3.1. The Gaussian Setting with $Z = \emptyset$

To simplify the presentation, we consider first the Gaussian setting with $Z = \emptyset$. In this case, the region $\mathcal{R}(D)$ is defined as the set of rate pairs $(R, R_h)$ for which there exist random variables $(U, V)$, taking real values, such that
  • $U - X - Y$ forms a Markov chain.
  • Conditioned on U, $X - Y - V$ forms a Markov chain.
  • There exist deterministic maps
    $$\tilde{f}_1: \mathcal{U} \times \mathcal{Y} \to \mathcal{V}, \qquad g: \mathcal{U} \times \mathcal{V} \to \hat{\mathcal{X}}$$
    such that, with $V \triangleq \tilde{f}_1(U, Y)$,
    $$D \ge \mathbb{E}\big(X - g(U, V)\big)^2 \ge \sigma^2_{X|UV}. \tag{15}$$
  • The rates R and R h satisfy (12a) and (12b), respectively.
Our second result characterizes the optimal choice of P X U in the Gaussian setting.
Theorem 2.
For the evaluation of $\mathcal{R}(D)$ when $(X, Y)$ are Gaussian, it suffices to assume that $(X, U)$ are jointly Gaussian.
Proof. 
For the treatment to follow, let us define the Gaussian channel $Y = \varrho X + W$, where X has an arbitrary law with $\mathbb{E}[X^2] \le 1$, $W \sim \mathcal{N}(0,1)$ is independent of X, and $\varrho > 0$ is the channel signal-to-noise ratio. We shall denote by $G_{Y|X}(\varrho)$ the conditional law of Y given X for this additive Gaussian model. Furthermore, the notation $(U_n, X_n, Y_n) \sim P_{U_n X_n} G_{Y_n|X_n}(\varrho)$ would imply that $(U_n, X_n) \sim P_{U_n X_n}$ and $Y_n = \varrho X_n + W_n$ with $W_n \sim \mathcal{N}(0,1)$ independent of $(U_n, X_n)$. Using this notation, the law $P_{UXYV}$ defining $\mathcal{R}(D)$ (see also (45) ahead) may equivalently be expressed as
$$P_{XUYV} = P_{XU}\, G_{Y|X}(\varrho)\, P_{V|YU}. \tag{16}$$
Henceforth, we denote a law which factors as in (16) by $X - Y - V \,|\, U$, i.e., conditioned on U, $X - Y - V$ forms a Markov chain and, furthermore, $P_{Y|X} = G_{Y|X}(\varrho)$ independently of the rest.
Define the region given by (12a) and (12b) subject to constraint (15) by $\mathcal{O}_K$, where the subscript K denotes the covariance constraint (15). The region $\mathcal{O}_K$ is a closed convex set.
In line with [7,8], we define a $\lambda$-parametrized family of functions which are related to the sum rate associated with $\mathcal{R}(D)$.
Fix some $\lambda > 1$, and consider the minimization of the $\lambda$-sum rate defined as
$$\min_{(R, R_h) \in \mathcal{O}_K} R + \lambda R_h. \tag{17}$$
Observe that
$$\min_{(R, R_h) \in \mathcal{O}_K} R + \lambda R_h \overset{(a)}{\ge} \inf_{\substack{P_{XUYV}:\ X - Y - V | U \\ \sigma^2_{X|UV} \le D}} I(X; U) + \lambda\, I(Y; V|U), \tag{18}$$
where (a) follows using the lower bounds (12a) and (12b). Since the marginal law of X in our model (6) is Gaussian, the differential entropy $h(X)$ is fixed; thus, for the minimization of (18) over a law of the form (16), we define the following functional of $P_{XU}$ (i.e., of the conditional law $P_{X|U}$)
$$s_\lambda(X, \varrho\,|\,U) \triangleq -h(X|U) + \inf_{\substack{V:\ X - Y - V | U \\ \sigma^2_{X|UV} \le D}} \lambda\, I(Y; V|U). \tag{19}$$
Thus, the minimum in (17) may be expressed as
$$V_\lambda(\varrho) \triangleq \inf_{P_{XU}:\ \mathbb{E}[X^2] \le 1} s_\lambda(X, \varrho\,|\,U). \tag{20}$$
Henceforth, the set of laws $P_{UXYV}$ defined by (16) which satisfy $\sigma^2_{X|UV} \le D$ will be denoted by $\mathcal{Q}$ and will be referred to as the feasible set.
As shown below, with a proper choice of λ , the functional (19) exhibits a “pair grouping” property with respect to the input X in the sense that the value of s λ does not increase under this operation. Having established that, we follow the same steps as in the proof of ([9] Theorem 9) to establish that the objective (20) is attained when P X | U is Gaussian. More specifically,
  • Lemma 1 shows that the value of s λ “improves” under the pair grouping operation.
  • With the proper time-sharing of two distributions attaining the infimum in (20) and satisfying the extremal property defined in (23) ahead, Lemma 2 proves that the pair grouping operation exhibits Gaussianity in the sense of Bernstein’s characterization.
Lemma 1.
Let $P_{XU}$ be a law on $\mathcal{X} \times \mathcal{U}$, let $P_{UXY} = P_{XU}\, G_{Y|X}(\varrho)$, and let $(U_1, X_1, Y_1)$ and $(U_2, X_2, Y_2)$ be two independent copies of $(U, X, Y)$. Define
$$X^+ \triangleq \frac{X_1 + X_2}{\sqrt{2}}, \qquad X^- \triangleq \frac{X_1 - X_2}{\sqrt{2}} \tag{21}$$
and similarly
$$Y^+ \triangleq \frac{Y_1 + Y_2}{\sqrt{2}}, \qquad Y^- \triangleq \frac{Y_1 - Y_2}{\sqrt{2}}.$$
With $\mathbf{U} = (U_1, U_2)$, there exists some $\lambda^* > 1$ such that, for any $\lambda \ge \lambda^*$, the following inequalities hold
$$2\, s_\lambda(X, \varrho\,|\,U) \ge s_\lambda(X^+, \varrho\,|\,X^-, \mathbf{U}) + s_\lambda(X^-, \varrho\,|\,Y^+, \mathbf{U}) \tag{22a}$$
$$2\, s_\lambda(X, \varrho\,|\,U) \ge s_\lambda(X^+, \varrho\,|\,Y^-, \mathbf{U}) + s_\lambda(X^-, \varrho\,|\,X^+, \mathbf{U}). \tag{22b}$$
Proof. 
See Section 4.2. □
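A quick simulation (illustrative only; arbitrary parameter values) confirms the claim behind the pair-grouping operation: the rotation (21) maps two independent uses of the Gaussian channel $Y = \varrho X + W$ into two channels of exactly the same form, with the rotated noises again independent and of unit variance.

```python
import numpy as np

rng = np.random.default_rng(2)
rho, N = 0.7, 500_000

X1, X2 = rng.standard_normal((2, N))
W1, W2 = rng.standard_normal((2, N))
Y1, Y2 = rho * X1 + W1, rho * X2 + W2

# The "pair grouping" rotation of (21).
Xp, Xm = (X1 + X2) / np.sqrt(2), (X1 - X2) / np.sqrt(2)
Yp, Ym = (Y1 + Y2) / np.sqrt(2), (Y1 - Y2) / np.sqrt(2)
Wp, Wm = Yp - rho * Xp, Ym - rho * Xm    # residual noises of the rotated channels

print("Var(W+), Var(W-):", Wp.var().round(3), Wm.var().round(3))   # ~1, ~1
print("corr(W+, W-):    ", np.corrcoef(Wp, Wm)[0, 1].round(4))     # ~0
print("corr(W+, X+):    ", np.corrcoef(Wp, Xp)[0, 1].round(4))     # ~0
```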
Assume that the infimum in (20) is attained, and let $\mathcal{P}$ denote the subset of laws $P_{XU}$ achieving the minimum. Suppose further that there exists a law $P_{XU} \in \mathcal{P}$ such that, for $(Y, X, U) \sim P_{XU}\, G_{Y|X}(\varrho)$ and for any other $P_{X'U'} \in \mathcal{P}$ with $(Y', X', U') \sim P_{X'U'}\, G_{Y'|X'}(\varrho)$,
$$h(Y|U) - h(X|U) \le h(Y'|U') - h(X'|U'). \tag{23}$$
Denote the value of the LHS of (23) by $g^*(\varrho)$.
Lemma 2.
Fix $\epsilon > 0$. Let $P_{XU}$ be an admissible law such that
$$s_\lambda(X, \varrho\,|\,U) \le V_\lambda(\varrho) + \epsilon \tag{24a}$$
$$h(Y|U) - h(X|U) \le g^*(\varrho) + \epsilon. \tag{24b}$$
There exists a law $P_{X'U'}$, with $(Y', X', U') \sim P_{X'U'}\, G_{Y'|X'}(\varrho)$, satisfying
$$s_\lambda(X', \varrho\,|\,U') \le V_\lambda(\varrho) + 2\epsilon \tag{25}$$
and
$$h(Y'|U') - h(X'|U') + \frac{1}{2}\, I\big(X_1 + X_2;\, X_1 - X_2 \,\big|\, Y_1, Y_2, U_1, U_2\big) \le g^*(\varrho) + \epsilon, \tag{26}$$
where $(U_1, X_1, Y_1)$ and $(U_2, X_2, Y_2)$ denote independent copies of $(U, X, Y)$.
Proof. 
See Section 4.3. □
Inequality (26), combined with assumption (23) and Lemma 6 in ([9] Section VI.B), establishes the following.
Lemma 3.
There exists a sequence $\{X_n, U_n\}$ such that, for each $n \ge 1$, $(X_n, U_n)$ is feasible,
$$\lim_{n \to \infty} s_\lambda(X_n, \varrho\,|\,U_n) = V_\lambda(\varrho) \tag{27}$$
and there exists a feasible law $P_{X_* U_*}$ on $\mathbb{R} \times \mathcal{U}$ such that
$$(X_n, U_n) \xrightarrow{\ \mathcal{D}\ } (X_*, U_*) \sim P_{X_* U_*}$$
and, with $\mathbf{U}_n \triangleq (U_{1,n}, U_{2,n})$ and $\mathbf{Y}_n \triangleq (Y_{1,n}, Y_{2,n})$, where the tuple $(U_{1,n}, X_{1,n}, Y_{1,n})$ and the tuple $(U_{2,n}, X_{2,n}, Y_{2,n})$ are two independent copies of $(U_n, X_n, Y_n) \sim P_{U_n X_n} G_{Y_n|X_n}(\varrho)$, we have
$$\liminf_{n \to \infty} I\big(X_{1,n} + X_{2,n};\, X_{1,n} - X_{2,n} \,\big|\, \mathbf{Y}_n, \mathbf{U}_n = \mathbf{u}\big) = 0 \qquad \text{for } P_{U_*} \times P_{U_*}\text{-a.e. } \mathbf{u}. \tag{28}$$
Finally, we apply to identity (28) the extended form of Bernstein's theorem (see [10,11]), which is proved in ([9] Appendix A), to conclude that $X_{1,*} \,|\, \{U_{1,*} = u_1\}$ (i.e., the conditional random variable $X_{1,*}$ conditioned on $U_{1,*} = u_1$) and $X_{2,*} \,|\, \{U_{2,*} = u_2\}$ are independent Gaussian random variables of equal variance.
Since the marginal of X is Gaussian, with no loss in generality, we may choose ( X , U ) to be jointly Gaussian, which establishes our claim in Theorem 2. □
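The role of Bernstein's characterization can be illustrated numerically: for two independent variables of equal variance, the sum and the difference are always uncorrelated, but they are statistically independent essentially only in the Gaussian case. The sketch below (illustrative only) uses the correlation between the squared sum and the squared difference as a crude dependence indicator.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500_000

def dependence_proxy(x1, x2):
    """Correlation of (x1+x2)^2 with (x1-x2)^2: zero when sum and difference are independent."""
    s, d = x1 + x2, x1 - x2
    return np.corrcoef(s ** 2, d ** 2)[0, 1]

gauss = rng.standard_normal((2, N))
unif  = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, N))   # also unit variance

print("Gaussian copies :", round(dependence_proxy(*gauss), 4))  # ~0: sum and difference independent
print("Uniform copies  :", round(dependence_proxy(*unif), 4))   # clearly nonzero: dependent
```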

3.2. The Gaussian Setting with Decoder Side-Information Z

In the Gaussian setting where the decoder side-information Z is non-void, using our previous definition $Y = \rho X + W$, let
$$Z = aX + bY + T$$
for a pair of real numbers $(a, b)$, where $T \sim \mathcal{N}(0,1)$ is independent of $(X, Y)$ and of U. For arbitrary X with $\mathbb{E}[X^2] \le 1$, let us denote by $\tilde{G}_{YZ|X}(\varrho, a, b)$ the conditional law obtained by forming the pair $(Y, Z)$, when given X, via the additive independent Gaussian pair $(W, T)$ as described above.
The rates-distortion region $\mathcal{R}(D)$ is defined as in Theorem 1 with
$$D \ge \mathbb{E}\, d\big(X, g(U, V, Z)\big) = \mathbb{E}\big(X - g(U, V, Z)\big)^2 \ge \sigma^2_{X|UVZ}$$
and it is evaluated over a law of the form
$$P_{XUYZV} = P_{XU}\, \tilde{G}_{YZ|X}(\varrho, a, b)\, P_{V|YU}. \tag{31}$$
Consequently, for the minimization of (20), we consider the functional
$$s_\lambda(X, \varrho\,|\,U) \triangleq -h(X|U) + \inf_{\substack{V:\ X - Y - V | U \\ \sigma^2_{X|UVZ} \le D}} \lambda\, I(Y; V|U)$$
under a law of the form (31). Next, in Lemma 1, for $P_{XU}$ a law on $\mathcal{X} \times \mathcal{U}$ and $P_{UXYZ} = P_{XU}\, \tilde{G}_{YZ|X}(\varrho, a, b)$, we let $(U_1, X_1, Y_1, Z_1)$ and $(U_2, X_2, Y_2, Z_2)$ be two independent copies of $(U, X, Y, Z)$.
Upon defining $(X^-, X^+)$ and $(Y^-, Y^+)$ as before, we also define
$$Z^+ \triangleq \frac{Z_1 + Z_2}{\sqrt{2}}, \qquad Z^- \triangleq \frac{Z_1 - Z_2}{\sqrt{2}}.$$
It can be verified that
$$Z^+ = aX^+ + bY^+ + \tfrac{1}{\sqrt{2}}(T_1 + T_2) \triangleq aX^+ + bY^+ + T^+, \qquad Z^- = aX^- + bY^- + \tfrac{1}{\sqrt{2}}(T_1 - T_2) \triangleq aX^- + bY^- + T^-,$$
where $T^-$ and $T^+$ are independent. Furthermore, the pair $(T^-, T^+)$ is independent of $(\mathbf{U}, X^-, X^+, Y^-, Y^+)$ and is equal in distribution to the pair $(T_1, T_2)$. Thus, the simultaneous unitary transformations $(Y_1, Y_2) \to (Y^-, Y^+)$ and $(Z_1, Z_2) \to (Z^-, Z^+)$ preserve the Gaussian nature of the channel, and the law factors according to
$$P_{\mathbf{U} X^- X^+ Y^- Y^+ Z^- Z^+ V} = P_{X^- X^+ \mathbf{U}}\, \tilde{G}_{Y^- Z^-|X^-}(\varrho, a, b)\, \tilde{G}_{Y^+ Z^+|X^+}(\varrho, a, b)\, P_{V|\mathbf{U} Y^- Y^+}.$$
Observe that with the choice of $\mathbf{U} = (U_1, U_2)$ and $V = (V_1, V_2)$, where $(U_1, X_1, Y_1, Z_1, V_1)$ and $(U_2, X_2, Y_2, Z_2, V_2)$ are two independent copies of $(\tilde{U}, X, Y, Z, \tilde{V})$ such that $X - Y - \tilde{V} \,|\, \tilde{U}$ and $\sigma^2_{X|\tilde{U}\tilde{V}Z} \le D$, i.e., $P_{\tilde{U}XYZ\tilde{V}}$ is feasible for the minimization of (20), we have
$$\sigma^2_{X^+|\mathbf{U} Y^- Z^+ V} \le \sigma^2_{X^+|\mathbf{U} V Z^+} \le D, \qquad \sigma^2_{X^-|\mathbf{U} Y^+ Z^- V} \le \sigma^2_{X^-|\mathbf{U} V Z^-} \le D.$$
Thus, Lemma 1 holds for this setting with decoder side-information Z as well.

4. Proofs

4.1. Proof of Theorem 1

Converse: Assume that the pair $(R, R_h)$ is D-achievable. For $j = 1, \ldots, n$, with the convention $Z^0 = \emptyset$, define the RVs
$$T \triangleq f^{(n)}(X^n) \tag{36a}$$
$$V_j \triangleq f_{1,j}^{(n)}\big(f^{(n)}(X^n), Y^j\big), \qquad j = 1, \ldots, n \tag{36b}$$
$$U_j \triangleq \big(T, X^{j-1}, Y^{j-1}, Z^{j-1}\big), \qquad j = 1, \ldots, n. \tag{36c}$$
The rate R is lower bounded as follows
$$nR \ge \log M^{(n)} \ge H(T) = I(X^n; T) = \sum_{k=1}^{n} I(X_k; T | X^{k-1}) \overset{(a)}{=} \sum_{k=1}^{n} I(X_k; T, X^{k-1}) \overset{(b)}{=} \sum_{k=1}^{n} I(X_k; T, X^{k-1}, Y^{k-1}, Z^{k-1}) = \sum_{k=1}^{n} I(X_k; U_k). \tag{37}$$
Here, (a) follows since $X^n$ is memoryless, and (b) follows since $X_k - (T, X^{k-1}) - (Y^{k-1}, Z^{k-1})$ forms a Markov chain.
We may now lower bound $R_h$ as follows
$$nR_h \ge \sum_{k=1}^{n} \log L_k^{(n)} \ge H(V_1, V_2, \ldots, V_n) \ge H(V_1, V_2, \ldots, V_n | T) = \sum_{k=1}^{n} H(V_k | T, V^{k-1}) \ge \sum_{k=1}^{n} H(V_k | T, V^{k-1}, X^{k-1}, Y^{k-1}, Z^{k-1}) \overset{(c)}{=} \sum_{k=1}^{n} H(V_k | T, X^{k-1}, Y^{k-1}, Z^{k-1}) \overset{(c)}{=} \sum_{k=1}^{n} I(Y_k; V_k | T, X^{k-1}, Y^{k-1}, Z^{k-1}) = \sum_{k=1}^{n} I(Y_k; V_k | U_k), \tag{38}$$
where (c) follows since $V_j$ is a deterministic function of $(T, Y^j)$.
The sum-rate can be lower bounded as follows
$$n(R + R_h) \ge \log M^{(n)} + \sum_{k=1}^{n} \log L_k^{(n)} \ge H(T) + H(V_1, \ldots, V_n) \ge H(T) + H(V_1, \ldots, V_n | T) = I(T; X^n) + H(V_1, \ldots, V_n | T) = \sum_{k=1}^{n} I(X_k; T | X^{k-1}) + H(V_1, \ldots, V_n | T) \overset{(d)}{\ge} \sum_{k=1}^{n} I(X_k; U_k) + I(Y_k; V_k | U_k), \tag{39}$$
where (d) follows by equality (37) and inequality (38).
Draw J uniformly from $\{1, \ldots, n\}$, independently of $\{(X_k, Y_k, Z_k, V_k, U_k)\}_{k=1}^{n}$, and define the RVs $U = (U_J, J)$, $V = V_J$, $Z = Z_J$, $Y = Y_J$, and $X = X_J$. Using J, we may express (37) as follows
$$R \ge \frac{1}{n}\sum_{k=1}^{n} I(X_k; U_k) = I(X_J; U_J | J) = I(X_J; U_J, J) - I(X_J; J) = I(X_J; U_J, J) = I(X; U), \tag{40}$$
and we may express (38) as follows
$$R_h \ge \frac{1}{n}\sum_{k=1}^{n} I(Y_k; V_k | U_k) = I(Y_J; V_J | U_J, J) = I(Y; V | U) = H(V | U). \tag{41}$$
With regard to the expected distortion, we may write
$$D \ge \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[d(X_i, \hat{X}_i)] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\big[d\big(X_i, g_i^{(n)}(T, V_1, \ldots, V_i, Z^i)\big)\big] \overset{(e)}{\ge} \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\big[d\big(X_i, g_i^*(T, X^{i-1}, Y^{i-1}, Z^{i-1}, V_i, Z_i)\big)\big] = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}\big[d\big(X_i, g_i^*(U_i, V_i, Z_i)\big)\big] = \mathbb{E}\big[d\big(X_J, g(U_J, J, V_J, Z_J)\big)\big] = \mathbb{E}\big[d\big(X, g(U, V, Z)\big)\big]. \tag{42}$$
Step (e) is justified as follows. Since $V_1, \ldots, V_{i-1}$ are deterministic functions of $(T, Y^{i-1})$,
$$\big(X^{i-1}, Z^{i-1}, Z_i\big) - \big(T, Y^{i-1}\big) - \big(V_1, \ldots, V_{i-1}\big)$$
is a Markov chain and, given $(U_k, V_k, Z_k)$, the tuple $(T, V_1, \ldots, V_k, Z^k)$ is conditionally independent of $X_k$. As a consequence, Lemma 1 in ([12] Section II.B) guarantees the existence of a reconstruction $\hat{X}_k^*(U_k, V_k, Z_k)$ which dominates $\hat{X}_k$ in the sense that
$$\mathbb{E}\, d\big(X_k, \hat{X}_k^*(U_k, V_k, Z_k)\big) \le \mathbb{E}\, d\big(X_k, \hat{X}_k(T, V_1, \ldots, V_k, Z^k)\big).$$
This observation, interpreted as a “data processing inequality” for estimation, has already been made in ([12] Lemma 1).
Furthermore,
$$V_J = f_{1,J}^{(n)}\big(f^{(n)}(X^n), Y^J\big) = f_{1,J}^{(n)}\big(f^{(n)}(X^n), Y^{J-1}, Y_J\big) = \tilde{f}_{1,J}^{(n)}\big(T, X^{J-1}, Y^{J-1}, Z^{J-1}, Y_J\big) = \tilde{f}_{1,J}^{(n)}\big(U_J, J, Y_J\big).$$
By (1) and the memoryless property of the sequence $(X_k, Y_k, Z_k)$, $k = 1, \ldots, n$, one can verify the Markov relation $U_k - X_k - (Y_k, Z_k)$, which implies the Markov relation $U - X - (Y, Z)$. Similarly, the definitions of $U_k$ and $V_k$ and the memoryless property of $(X_k, Y_k)$, $k = 1, \ldots, n$, imply that, conditioned on $U_k$, $X_k - Y_k - V_k$ forms a Markov chain.
Thus, conditioned on U, $X - Y - V$ forms a Markov chain, hence
$$P_{XUYV} = P_{XU}\, P_{Y|X}\, P_{V|YU},$$
where $P_{Y|X}$ denotes the conditional law induced by the marginal law $P_{XY}$. The combination of (40)–(42) and (44), together with the latter Markov relations, establishes the converse.
We shall now obtain an alternative characterization for the lower bound (38). For a law
$$P_{UXZYV} = P_U\, P_{X|U}\, P_{ZY|X}\, P_{V|YU},$$
and its induced conditional law $P_{XZYV|U}$, let
$$Q\big(P_{XZYV|U}, \tilde{D}\big) \triangleq \inf_{\substack{g^*: \mathcal{U} \times \mathcal{V} \times \mathcal{Z} \to \hat{\mathcal{X}}:\\ \mathbb{E}[d(X, g^*(U,V,Z)) \,|\, U = u] \le \tilde{D}}} H(V \,|\, U = u),$$
and let $\bar{Q}(P_{XZYV|U}, \cdot)$ denote the lower convex envelope of $Q(P_{XZYV|U}, \cdot)$.
Define
$$Q_s\big(P_{UXZYV}, D\big) \triangleq \inf_{\rho(u):\ \int \rho(u)\, dP_U(u) \le D}\ \int_{\mathcal{U}} \bar{Q}\big(P_{XZYV|U}, \rho(u)\big)\, dP_U(u);$$
then, by ([13] Section III.C, Lemma 1), $Q_s(P_{UXZYV}, \cdot)$ is convex.
Note that
$$H\big(V_k \,\big|\, T, X^{k-1}, Y^{k-1}, Z^{k-1}\big) = H\Big(f_{1,k}^{(n)}\big(T, Y^{k-1}, Y_k\big) \,\Big|\, U_k\Big) = \int H\Big(f_{1,k}^{(n)}\big(t, y^{k-1}, Y_k\big) \,\Big|\, U_k = u_k\Big)\, d\mu(u_k). \tag{46}$$
The integrand on the RHS of (46) is the entropy of the scalar quantizer $V_k \triangleq f_{1,k}^{(n)}(T, Y^{k-1}, Y_k)$, conditioned on $U_k = u_k$, where $U_k$ is defined in (36c). Now, conditioned on $U_k = u_k$, Lemma 1 in ([12] Section II.B) ensures that
$$\mathbb{E}\Big[d\Big(X_k, \hat{X}_k\big(t, y^{k-1}, f_{1,k}^{(n)}(t, y^{k-1}, Y_k), Z^k\big)\Big) \,\Big|\, U_k = u_k\Big] \ge \mathbb{E}\Big[d\big(X_k, g_k^*(U_k, V_k, Z_k)\big) \,\Big|\, U_k = u_k\Big].$$
Consequently, we may lower bound the RHS of (46) as follows
$$\int H\Big(f_{1,k}^{(n)}\big(t, y^{k-1}, Y_k\big) \,\Big|\, U_k = u_k\Big)\, d\mu(u_k) \overset{(a)}{\ge} \int \bar{Q}\Big(P_{XZYV|U},\ \mathbb{E}\big[d(X_k, \hat{X}_k) \,\big|\, U_k = u_k\big]\Big)\, d\mu(u_k) \ge \int\!\!\int \bar{Q}\Big(P_{XZYV|U},\ \mathbb{E}\big[d\big(X_k, g(u, V, z)\big) \,\big|\, U_k = u\big]\Big)\, d\mu(z|u)\, d\mu(u) \overset{(b)}{\ge} \int \bar{Q}\Big(P_{XZYV|U},\ \mathbb{E}\big[d\big(X_k, g(u, V, Z)\big) \,\big|\, U_k = u\big]\Big)\, d\mu(u) \ge Q_s\big(P_{UXZYV}, D\big), \tag{47}$$
where in (a) we used the definition of Q and in (b) its convexity. The lower bound (47) may be interpreted as follows. Fix $\rho: \mathcal{U} \to \mathbb{R}_+$ and consider, for each $u \in \mathcal{U}$, time sharing of at most two scalar quantizers for the “source” $P_{V|U=u}$ attaining a distortion level $\rho(u)$. The optimal helper time-shares side-information-dependent scalar quantizers of $V_k$ (at most two per each side-information symbol $U_k$), while the reconstruction at the decoder is a function of $(U, V, Z)$.
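The time-sharing interpretation can be made concrete with a toy sketch: to hit an intermediate operating point for a scalar Gaussian observation, one may alternate between two fixed scalar quantizers, achieving the convex combination of their (entropy, distortion) pairs. The quantizers and the time-sharing fraction below are arbitrary illustrative choices, not the optimized quantizers of the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 400_000
y = rng.standard_normal(N)

def quantize(samples, levels):
    """Nearest-neighbour scalar quantizer with the given reproduction levels."""
    idx = np.argmin(np.abs(samples[:, None] - levels[None, :]), axis=1)
    return idx, levels[idx]

def entropy(idx, k):
    p = np.bincount(idx, minlength=k) / idx.size
    return -np.sum(p[p > 0] * np.log2(p[p > 0]))

q_coarse = np.array([-0.8, 0.8])                # 1-bit quantizer
q_fine   = np.array([-1.5, -0.5, 0.5, 1.5])     # 2-bit quantizer

stats = []
for levels in (q_coarse, q_fine):
    idx, rec = quantize(y, levels)
    stats.append((entropy(idx, len(levels)), np.mean((y - rec) ** 2)))

alpha = 0.6  # fraction of time the coarse quantizer is used
H_mix = alpha * stats[0][0] + (1 - alpha) * stats[1][0]
D_mix = alpha * stats[0][1] + (1 - alpha) * stats[1][1]
print("coarse (H, D):", np.round(stats[0], 3))
print("fine   (H, D):", np.round(stats[1], 3))
print("time-shared  :", round(H_mix, 3), round(D_mix, 3))
```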
Direct: To establish the achievability of $\mathcal{R}^*(D)$, consider the following codebook construction. The codebook $\mathcal{A} = \{\mathbf{u}_1, \ldots, \mathbf{u}_M\}$, $\mathbf{u}_k \in \mathcal{U}^n$, is obtained by drawing the n-length sequences $\mathbf{u}_k$ independently from $\mathcal{T}_{P_U}^{\delta}$. (For the definition of $\mathcal{T}_{P_U}^{\delta}$, the set of $\delta$-strongly typical n-sequences corresponding to a marginal law $P_U$, and a few properties of these sequences, see [14,15,16].)
Given the source sequence $\mathbf{x}$, $f^{(n)}(\mathbf{x})$ is defined as follows.
  • If $\mathbf{x} \in \mathcal{T}_{P_X}^{\delta}$, the encoder searches for the first sequence $\mathbf{u}_k = \mathbf{u}$ in $\mathcal{A}$ such that (s.t.) $(\mathbf{x}, \mathbf{u}) \in \mathcal{T}_{P_{XU}}^{2\delta}$ and sets $f^{(n)}(\mathbf{x}) = k$.
  • If $\mathbf{x} \notin \mathcal{T}_{P_X}^{\delta}$, or if no $\mathbf{u}_k \in \mathcal{A}$ s.t. $(\mathbf{x}, \mathbf{u}_k) \in \mathcal{T}_{P_{XU}}^{2\delta}$ exists, an encoding error is declared.
Given $f^{(n)}(\mathbf{x}) = k$, the helper forms the sequence of descriptions $V_i = \tilde{f}_1(u_{k,i}, Y_i)$, $V_i \in \{1, \ldots, L_i\}$, that is sent causally to the decoder.
Decoding: Given $f^{(n)}(\mathbf{x}) = k$ as well as the sub-sequence $V_1, \ldots, V_i$, the decoder forms the reconstruction sequence $\hat{X}_i = g(u_{k,i}, V_i, Z_i)$, $i \in \{1, \ldots, n\}$.
Given that $(\mathbf{u}, \mathbf{x})$ are jointly typical, since $(X, Y, Z)$ is memoryless, the Markov lemma guarantees that, for large n, with high probability $(\mathbf{u}, \mathbf{x}, \mathbf{y}, \mathbf{z})$ are also jointly typical. Since $X - (U, Y) - V$ forms a Markov chain, by the Markov lemma, with high probability, $(\mathbf{x}, \mathbf{v})$ as well as $(\mathbf{x}, \mathbf{v}, \mathbf{z})$ are jointly typical. Thus, with high probability, $(\mathbf{x}, \hat{\mathbf{x}})$ are jointly typical, hence the distortion constraint (10) is fulfilled for large n. That the sequence $V_1, \ldots, V_n$ can be described at a conditional entropy rate satisfying (12b) and (47) can be established along lines similar to the proof of the direct part of Theorem 2 in ([13] Section III.C). Finally, standard error probability analysis verifies that, with high probability, $(\mathbf{u}, \mathbf{x})$ are jointly typical as long as (12a) holds.
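The random-coding step of the direct part can be mimicked numerically for a toy binary source. The sketch below is illustrative only: strong typicality is replaced by a crude empirical-distribution check, the test channel $P_{U|X}$ is an arbitrary choice, and the tiny block length keeps the codebook size manageable.

```python
import numpy as np

rng = np.random.default_rng(5)
n, R, delta = 20, 0.6, 0.1
M = int(2 ** (n * R))            # codebook size ~ 2^{nR}; tiny n keeps this tractable

p_x = 0.5
p_u_given_x = np.array([[0.9, 0.1],   # a test channel P(U|X); illustrative choice
                        [0.1, 0.9]])
p_u = np.array([0.5, 0.5])            # induced marginal of U

x = (rng.random(n) < p_x).astype(int)                  # source block X^n
codebook = (rng.random((M, n)) < p_u[1]).astype(int)   # u_1, ..., u_M drawn i.i.d. ~ P_U

def joint_type_ok(x, u, tol):
    """Check that the empirical joint distribution of (x, u) is close to P_X P_{U|X}."""
    for a in (0, 1):
        for b in (0, 1):
            target = 0.5 * p_u_given_x[a, b]
            emp = np.mean((x == a) & (u == b))
            if abs(emp - target) > tol:
                return False
    return True

# First codeword jointly "typical" with x; None models a declared encoding error.
chosen = next((k for k in range(M) if joint_type_ok(x, codebook[k], delta)), None)
print("encoder output:", "error (no typical codeword)" if chosen is None else f"index {chosen}")
```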

4.2. Proof of Lemma 1

If $Y_i = \rho X_i + W_i$, $i = 1, 2$, then
$$Y^+ = \frac{\rho}{\sqrt{2}}(X_1 + X_2) + \frac{1}{\sqrt{2}}(W_1 + W_2) \triangleq \rho X^+ + W^+$$
$$Y^- = \frac{\rho}{\sqrt{2}}(X_1 - X_2) + \frac{1}{\sqrt{2}}(W_1 - W_2) \triangleq \rho X^- + W^-$$
where $W^-$ and $W^+$ are independent and the pair $(W^-, W^+)$ is equal in distribution to the pair $(W_1, W_2)$. Thus, the unitary transformation $(Y_1, Y_2) \to (Y^-, Y^+)$ preserves the Gaussian nature of the channel and factors according to (see (53) ahead)
$$P_{\mathbf{U} X^- X^+ Y^- Y^+ V} = P_{X^- X^+ \mathbf{U}}\, G_{Y^-|X^-}(\varrho)\, G_{Y^+|X^+}(\varrho)\, P_{V|\mathbf{U} Y^- Y^+}. \tag{49}$$
To show (22a), consider the sequence of identities
$$\lambda I(Y^+ Y^-; V | \mathbf{U}) - h(X^+ X^- | \mathbf{U}) = \lambda I(Y^+; V | \mathbf{U}) + \lambda I(Y^-; V | \mathbf{U} Y^+) - h(X^- | \mathbf{U} Y^+) - h(X^+ | \mathbf{U} X^-) - I(X^-; Y^+ | \mathbf{U}) = \lambda I(Y^+; V | \mathbf{U} X^-) + \lambda I(Y^-; V | \mathbf{U} Y^+) - h(X^+ | \mathbf{U} X^-) - h(X^- | \mathbf{U} Y^+) + (\lambda - 1)\big[I(Y^+; X^- | \mathbf{U}) - I(Y^+; X^- | \mathbf{U} V)\big] - I(Y^+; X^- | \mathbf{U} V). \tag{50}$$
Moreover, to show (22b), consider the sequence of identities
$$\lambda I(Y^+ Y^-; V | \mathbf{U}) - h(X^+ X^- | \mathbf{U}) = \lambda I(Y^-; V | \mathbf{U}) + \lambda I(Y^+; V | \mathbf{U} Y^-) - h(X^+ | \mathbf{U} Y^-) - h(X^- | \mathbf{U} X^+) - I(X^+; Y^- | \mathbf{U}) = \lambda I(Y^-; V | \mathbf{U} X^+) + \lambda I(Y^+; V | \mathbf{U} Y^-) - h(X^+ | \mathbf{U} Y^-) - h(X^- | \mathbf{U} X^+) + (\lambda - 1)\big[I(Y^-; X^+ | \mathbf{U}) - I(Y^-; X^+ | \mathbf{U} V)\big] - I(Y^-; X^+ | \mathbf{U} V). \tag{51}$$
Starting with (50), consider the difference
$$I(Y^+; X^- | \mathbf{U}) - I(Y^+; X^- | \mathbf{U} V) = I(X^-; V | \mathbf{U}) - I(X^-; V | \mathbf{U} Y^+) \tag{52}$$
under a law of the form
$$P_{\mathbf{U} X^- X^+ Y^- Y^+ V} = P_{X^- X^+ \mathbf{U}}\, P_{Y^-|X^-}\, P_{Y^+|X^+}\, P_{V|\mathbf{U} Y^- Y^+}, \tag{53}$$
i.e., such that $X^- - Y^- - V \,|\, (\mathbf{U}, Y^+)$ forms a Markov chain (see also ([9] Section VI.A, Lemma 4)).
We distinguish between the two cases:
(1)
In case $I(X^-; Y^+ | \mathbf{U} V) = 0$, the non-negativity of mutual information implies that the expression (52) is non-negative, hence the inequalities (63) ahead hold for any $\lambda > 1$.
(2)
In case $I(X^-; Y^+ | \mathbf{U} V) > 0$, we prove first that, for the set of laws which are feasible for the optimization problem (20), the expression (52) is strictly positive.
Observe that with the choice of $\mathbf{U} = (U_1, U_2)$ and $V = (V_1, V_2)$, where $(U_1, X_1, Y_1, V_1)$ and $(U_2, X_2, Y_2, V_2)$ are two independent copies of $(\tilde{U}, X, Y, \tilde{V})$ such that $X - Y - \tilde{V} \,|\, \tilde{U}$ and $\sigma^2_{X|\tilde{U}\tilde{V}} \le D$, i.e., $P_{\tilde{U}XY\tilde{V}} \in \mathcal{Q}$, we have
$$\sigma^2_{X^+|\mathbf{U} Y^- V} \le \sigma^2_{X^+|\mathbf{U} V} = \tfrac{1}{2}\sigma^2_{X_1|U_1 V_1} + \tfrac{1}{2}\sigma^2_{X_2|U_2 V_2} \le D, \qquad \sigma^2_{X^-|\mathbf{U} Y^+ V} \le \sigma^2_{X^-|\mathbf{U} V} = \tfrac{1}{2}\sigma^2_{X_1|U_1 V_1} + \tfrac{1}{2}\sigma^2_{X_2|U_2 V_2} \le D. \tag{54}$$
Thus, the unitary transformation $(Y_1, Y_2) \to (Y^-, Y^+)$ picks a pair of independent copies of a law $P_{\tilde{U}XY\tilde{V}} \in \mathcal{Q}$ and “creates” a pair of laws, $X^- - Y^- - V \,|\, (\mathbf{U}, Y^+)$ and $X^+ - Y^+ - V \,|\, (\mathbf{U}, Y^-)$, which factor jointly according to (53) with
$$P_{Y^-|X^-} = G_{Y^-|X^-}(\varrho), \qquad P_{Y^+|X^+} = G_{Y^+|X^+}(\varrho),$$
hence are symmetric w.r.t. the inputs $X^-$ and $X^+$. We shall denote the latter set of laws by $\mathcal{P}^*$ and, as shown above, $\mathcal{P}^* \subseteq \mathcal{Q}$.
Remark 2.
Suppose that $(\tilde{U}, X, Y) \sim P_{X\tilde{U}}\, G_{Y|X}(\varrho)$ with $\tilde{U} \in \mathcal{U}$; then, for both laws $X^- - Y^- - V \,|\, (\mathbf{U}, Y^+)$ and $X^+ - Y^+ - V \,|\, (\mathbf{U}, Y^-)$, we have $(\mathbf{U}, Y^+) \in \mathcal{U} \times \mathcal{U} \times \mathbb{R}$ and $(\mathbf{U}, Y^-) \in \mathcal{U} \times \mathcal{U} \times \mathbb{R}$. Since the image of the map
$$P_{X\tilde{U}} \mapsto \big(\mathbb{E}[X^2],\ \sigma^2_{X|\tilde{U}V},\ s_\lambda(X, \varrho\,|\,\tilde{U})\big) \tag{55}$$
is a convex set, a standard dimensionality-reduction argument can be used to establish the existence of a law $P_{XU}$, where $P_U$ is supported on a finite set, that achieves any point in the image of the map (55) (see ([9] Section IV, Remark 2)).
By rate-distortion theory, the constraint $\sigma^2_{X|U,V} \le D$, with both U and V non-void, implies that
$$I(X; U) < I(X; UV), \qquad \text{hence} \qquad I(X; V | U) > 0. \tag{56}$$
Now, for any $P_{\mathbf{U} X^- X^+ Y^- Y^+ V}$ as per (49), conditioned on $(\mathbf{U}, Y^+)$, the random variable V is dependent on $Y^-$, i.e., $I(Y^-; V | \mathbf{U} Y^+) > 0$. As a consequence of that, since the mutual information $I(X^-; Y^-)$ is strictly positive and, conditioned on $(\mathbf{U}, Y^+)$, $X^- - Y^- - V$ forms a Markov chain,
$$I(X^-; V | \mathbf{U} Y^+) > 0. \tag{57}$$
Since a law of the form (53) dictates
$$I(X^-; V | \mathbf{U} Y^- Y^+) = 0, \tag{58}$$
the combination of (56)–(58) yields that
$$I(X^-; V | \mathbf{U}) \ge I(X^-; V | \mathbf{U} Y^+) > I(X^-; V | \mathbf{U} Y^- Y^+) = 0. \tag{59}$$
Thus, the conditional mutual information $I(X^-; V | \mathbf{U})$ is non-increasing under the conditioning on $(Y^-, Y^+)$.
By the symmetry of the pair of laws $X^- - Y^- - V \,|\, (\mathbf{U}, Y^+)$ and $X^+ - Y^+ - V \,|\, (\mathbf{U}, Y^-)$ induced by $P_{\mathbf{U} X^- X^+ Y^- Y^+ V}$, conditioned on $(\mathbf{U}, Y^-)$, the random variable V is dependent on $Y^+$, hence
$$I(Y^+; V | \mathbf{U} Y^-) > 0.$$
Now
$$P_{\mathbf{U} X^- X^+ Y^- Y^+ V} = P_{X^- X^+ \mathbf{U}}\, P_{Y^-|X^-}\, P_{Y^+|X^+}\, P_{V|\mathbf{U} Y^- Y^+} = P_{X^- X^+ \mathbf{U}}\, P_{Y^+|X^+}\, P_{V Y^-|\mathbf{U} X^- Y^+},$$
hence
$$P_{\mathbf{U} X^- X^+ Y^+ V} = \int P_{\mathbf{U} X^- X^+ Y^- Y^+ V}\, dy^- = \int P_{X^- X^+ \mathbf{U}}\, P_{Y^+|X^+}\, P_{V Y^-|\mathbf{U} X^- Y^+}\, dy^- = P_{X^- X^+ \mathbf{U}}\, P_{Y^+|X^+}\, P_{V|\mathbf{U} X^- Y^+}.$$
However, with the latter factorization, if indeed $I(X^-; V | \mathbf{U}) = I(X^-; V | \mathbf{U} Y^+)$, i.e.,
$$I(X^-; V | \mathbf{U}) = I(X^-; V | \mathbf{U} Y^+) > I(X^-; V | \mathbf{U} Y^- Y^+) = 0,$$
then, since the conditional mutual information $I(X^-; V | \mathbf{U})$ is non-increasing under the conditioning on $(Y^-, Y^+)$, it follows that $P_{V|\mathbf{U} X^- Y^+} = P_{V|\mathbf{U} X^-}$, which is in contradiction with (59). Consequently, for $P_{\mathbf{U} X^- X^+ Y^- Y^+ V} \in \mathcal{P}^*$, $I(X^-; V | \mathbf{U}) - I(X^-; V | \mathbf{U} Y^+) \triangleq \Delta > 0$, thus establishing that the expression (52) is positive.
Consequently, there exists some $\lambda^* > 1$ such that, for any $\lambda \ge \lambda^* \triangleq \frac{I(Y^+; X^- | \mathbf{U} V)}{\Delta} + 1$,
$$(\lambda - 1)\big[I(Y^+; X^- | \mathbf{U}) - I(Y^+; X^- | \mathbf{U} V)\big] \ge I(Y^+; X^- | \mathbf{U} V) > 0. \tag{62}$$
The combination of (50) and (62) implies that, for any $\lambda \ge \lambda^*$,
$$\lambda I(Y^+ Y^-; V | \mathbf{U}) - h(X^+ X^- | \mathbf{U}) \overset{(a)}{\ge} -\big[h(X^+ | \mathbf{U} X^-) + h(X^- | \mathbf{U} Y^+)\big] + \lambda\big[I(Y^+; V | \mathbf{U} X^-) + I(Y^-; V | \mathbf{U} Y^+)\big] \overset{(b)}{\ge} s_\lambda(X^+, \varrho\,|\,X^-, \mathbf{U}) + s_\lambda(X^-, \varrho\,|\,Y^+, \mathbf{U}). \tag{63}$$
Here, (a) follows by (50) and (62), and (b) is true since the set of laws over which the infimum on the RHS of (63) is evaluated is not empty. Indeed, the choice of $\mathbf{U} = (U_1, U_2)$ and $V = (V_1, V_2)$, where $(U_1, X_1, Y_1, V_1)$ and $(U_2, X_2, Y_2, V_2)$ are two independent copies of $(\tilde{U}, X, Y, \tilde{V})$ such that $X - Y - \tilde{V} \,|\, \tilde{U}$ and $\sigma^2_{X|\tilde{U}\tilde{V}} \le D$, i.e., $P_{\tilde{U}XY\tilde{V}} \in \mathcal{Q}$ (hence it is feasible for $s_\lambda(X, \varrho\,|\,U)$ in (19)), satisfies (54), hence it belongs to the feasible set.
On the other hand, since both mappings $(X_1, X_2) \to (X^+, X^-)$ and $(Y_1, Y_2) \to (Y^+, Y^-)$ are invertible, then with $\mathbf{U} = (U_1, U_2)$
$$\lambda I(Y^+ Y^-; V | \mathbf{U}) - h(X^+ X^- | \mathbf{U}) = \lambda I(Y_1 Y_2; V | \mathbf{U}) - h(X_1 X_2 | \mathbf{U}) = \lambda I(Y_1; V | \mathbf{U}) + \lambda I(Y_2; V | \mathbf{U} Y_1) - h(X_1 | \mathbf{U}) - h(X_2 | \mathbf{U} X_1) \overset{(a)}{=} \lambda I(Y_1; V | \mathbf{U}) + \lambda I(Y_2; V | \mathbf{U} Y_1) - h(X_1 | \mathbf{U}) - h(X_2 | \mathbf{U}) = \lambda I(Y_1; V | \mathbf{U}) + \lambda I(Y_2; V | \mathbf{U}) - h(X_1 | \mathbf{U}) - h(X_2 | \mathbf{U}) - \lambda\big[I(Y_2; V | \mathbf{U}) - I(Y_2; V | \mathbf{U} Y_1)\big] = \lambda I(Y_1; V | \mathbf{U}) + \lambda I(Y_2; V | \mathbf{U}) - h(X_1 | \mathbf{U}) - h(X_2 | \mathbf{U}) - \lambda\big[I(Y_2; V | \mathbf{U}) - h(Y_2 | \mathbf{U} Y_1) + h(Y_2 | \mathbf{U} V Y_1)\big] \overset{(b)}{=} \lambda I(Y_1; V | \mathbf{U}) + \lambda I(Y_2; V | \mathbf{U}) - h(X_1 | \mathbf{U}) - h(X_2 | \mathbf{U}) - \lambda\big[I(Y_2; V | \mathbf{U}) - h(Y_2 | \mathbf{U}) + h(Y_2 | \mathbf{U} V Y_1)\big] = \lambda I(Y_1; V | \mathbf{U}) + \lambda I(Y_2; V | \mathbf{U}) - h(X_1 | \mathbf{U}) - h(X_2 | \mathbf{U}) + \lambda I(Y_2; Y_1 | \mathbf{U} V). \tag{64}$$
Here, (a) follows since, conditioned on $\mathbf{U}$, $X_1$ and $X_2$ are independent, and (b) follows since, conditioned on $\mathbf{U}$, $Y_1$ and $Y_2$ are independent.
When $\mathbf{U} = (U_1, U_2)$ and $V = (V_1, V_2)$, where $(U_1, X_1, Y_1, V_1)$ and $(U_2, X_2, Y_2, V_2)$ are two independent copies of $(\tilde{U}, X, Y, \tilde{V})$ such that $X - Y - \tilde{V} \,|\, \tilde{U}$ and $\sigma^2_{X|\tilde{U}\tilde{V}} \le D$, then $Y_2 - (\mathbf{U}, V) - Y_1$ forms a Markov chain. Thus, $I(Y_1; Y_2 | \mathbf{U} V) = 0$ and the RHS of (64) becomes $\lambda I(Y_1; V_1 | U_1) + \lambda I(Y_2; V_2 | U_2) - h(X_1 | U_1) - h(X_2 | U_2)$, thus establishing that
$$\inf_{\substack{V:\ X - Y - V | \mathbf{U} \\ \sigma^2_{X|\mathbf{U} V} \le D}} -h(X_1, X_2 | \mathbf{U}) + \lambda I(Y_1, Y_2; V | \mathbf{U}) \le \sum_{i=1}^{2}\ \inf_{\substack{V:\ X_i - Y_i - V | U_i \\ \sigma^2_{X_i|U_i V} \le D}} -h(X_i | U_i) + \lambda I(Y_i; V | U_i) = 2\, s_\lambda(X, \varrho\,|\,U), \tag{65}$$
where the inequality follows since the infimum on the LHS is taken over a larger set than that on the RHS. The combination of (63) and (65) establishes (22a) for $\lambda \ge \lambda^*$.
Now, returning to (51), consider the difference
$$I(Y^-; X^+ | \mathbf{U}) - I(Y^-; X^+ | \mathbf{U} V) = I(X^+; V | \mathbf{U}) - I(X^+; V | \mathbf{U} Y^-) \tag{66}$$
under the law (53), i.e., such that $X^+ - Y^+ - V \,|\, (\mathbf{U}, Y^-)$ forms a Markov chain. By rate-distortion theory, the constraint $\sigma^2_{X^+|\mathbf{U},V} \le D$, with both $\mathbf{U}$ and V non-void, implies that
$$I(X^+; \mathbf{U}) < I(X^+; \mathbf{U} V), \qquad \text{hence} \qquad I(X^+; V | \mathbf{U}) > 0. \tag{67}$$
Moreover, an argument similar to that leading to the conclusion that (52) is non-negative establishes that (66) is non-negative.
Consequently, there exists some $\lambda^* > 1$ such that, for any $\lambda \ge \lambda^*$,
$$(\lambda - 1)\big[I(Y^-; X^+ | \mathbf{U}) - I(Y^-; X^+ | \mathbf{U} V)\big] \ge I(Y^-; X^+ | \mathbf{U} V) > 0. \tag{68}$$
In addition, the combination of (51) and (68) implies that, for any $\lambda \ge \lambda^*$,
$$\lambda I(Y^+ Y^-; V | \mathbf{U}) - h(X^+ X^- | \mathbf{U}) \ge s_\lambda(X^-, \varrho\,|\,X^+, \mathbf{U}) + s_\lambda(X^+, \varrho\,|\,Y^-, \mathbf{U}). \tag{69}$$
The combination of (69) and (65) establishes (22b) for $\lambda \ge \lambda^*$.

4.3. Proof of Lemma 2

Consider the tuple $(X^+, X^-, Y^+, Y^-, \mathbf{U})$ as defined in Lemma 1, constructed from independent copies of $(U, X, Y) \sim P_{XU}\, G_{Y|X}(\varrho)$. Under the transformation (21), $X^+$ and $X^-$ preserve the variance of $X_i$, $i = 1, 2$, and $W^+$ and $W^-$ preserve the variance of $W_i$, $i = 1, 2$. Furthermore, as shown in the proof of Lemma 1, the unitary transformation $(Y_1, Y_2) \to (Y^-, Y^+)$ picks a pair of independent copies of a law $P_{UXYV} \in \mathcal{Q}$ and “creates” a pair of laws, $X^- - Y^- - V \,|\, (\mathbf{U}, Y^+)$ and $X^+ - Y^+ - V \,|\, (\mathbf{U}, Y^-)$, which factor jointly according to (53) with
$$P_{Y^-|X^-} = G_{Y^-|X^-}(\varrho), \qquad P_{Y^+|X^+} = G_{Y^+|X^+}(\varrho),$$
hence are symmetric w.r.t. the inputs $X^-$ and $X^+$, and both are in the feasible set. Similarly, the unitary transformation $(Y_1, Y_2) \to (Y^-, Y^+)$ picks a pair of independent copies of a law $P_{UXYV} \in \mathcal{Q}$ and “creates” a pair of laws, $X^- - Y^- - V \,|\, (\mathbf{U}, X^+)$ and $X^+ - Y^+ - V \,|\, (\mathbf{U}, X^-)$, which are both in the feasible set (see the factorization (61)).
Consequently, by (20), each of the quantities $s_\lambda(X^+, \varrho\,|\,X^-, \mathbf{U})$, $s_\lambda(X^-, \varrho\,|\,Y^+, \mathbf{U})$, $s_\lambda(X^-, \varrho\,|\,X^+, \mathbf{U})$ and $s_\lambda(X^+, \varrho\,|\,Y^-, \mathbf{U})$ is at least $V_\lambda(\varrho)$. The assumption $P_{XU} \in \mathcal{P}$ leads by (22) of Lemma 1 to the opposite conclusion. Therefore, all four quantities are equal and coincide with the minimum, i.e.,
$$s_\lambda(X^-, \varrho\,|\,Y^+, \mathbf{U}) = s_\lambda(X^+, \varrho\,|\,Y^-, \mathbf{U}) = V_\lambda(\varrho)$$
$$s_\lambda(X^-, \varrho\,|\,X^+, \mathbf{U}) = s_\lambda(X^+, \varrho\,|\,X^-, \mathbf{U}) = V_\lambda(\varrho).$$
Let $B \sim \mathrm{Ber}(1/2)$ be a Bernoulli random variable with values in the set $\{+, -\}$, independent of $(X^-, X^+, Y^-, Y^+, \mathbf{U})$. Let $\bar{B}$ be the complement of B in the set $\{+, -\}$ and define $(\tilde{X}, \tilde{U})$ by $\tilde{X} \triangleq X^B$ and $\tilde{U} \triangleq (B, Y^{\bar{B}}, \mathbf{U})$. Then
$$s_\lambda(\tilde{X}, \varrho\,|\,\tilde{U}) = \tfrac{1}{2}\, s_\lambda(X^-, \varrho\,|\,Y^+, \mathbf{U}) + \tfrac{1}{2}\, s_\lambda(X^+, \varrho\,|\,Y^-, \mathbf{U}) \le V_\lambda(\varrho) + 2\epsilon,$$
where the inequality follows by assumption (24a).
Furthermore, $\mathbb{E}[(\tilde{X})^2] = \mathbb{E}[X^2]$ and, by (54),
$$\sigma^2_{\tilde{X}|\tilde{U}V} \le \tfrac{1}{2}\sigma^2_{X^+|\mathbf{U}V} + \tfrac{1}{2}\sigma^2_{X^-|\mathbf{U}V} \le D.$$
Next,
$$h(\tilde{Y}|\tilde{U}) = h\big(Y^B \,\big|\, B, Y^{\bar{B}}, \mathbf{U}\big) = \tfrac{1}{2}\, h(Y^+ | Y^- \mathbf{U}) + \tfrac{1}{2}\, h(Y^- | Y^+ \mathbf{U})$$
$$h(\tilde{X}|\tilde{U}) = h\big(X^B \,\big|\, B, Y^{\bar{B}}, \mathbf{U}\big) = \tfrac{1}{2}\, h(X^+ | Y^- \mathbf{U}) + \tfrac{1}{2}\, h(X^- | Y^+ \mathbf{U}).$$
Thus
$$h(\tilde{Y}|\tilde{U}) - h(\tilde{X}|\tilde{U}) = \tfrac{1}{2}\big[h(Y^+ | Y^- \mathbf{U}) + h(Y^- | Y^+ \mathbf{U})\big] - \tfrac{1}{2}\big[h(X^+ | Y^- \mathbf{U}) + h(X^- | Y^+ \mathbf{U})\big] = \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(Y^- | \mathbf{U}) + h(Y^- | Y^+ \mathbf{U})\big] - \tfrac{1}{2}\big[h(X^+ | Y^- \mathbf{U}) + h(X^- | Y^+ \mathbf{U})\big] = \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(X^+ X^- | \mathbf{U})\big] - \tfrac{1}{2} I(Y^-; Y^+ | \mathbf{U}) + \tfrac{1}{2}\big[h(X^+ | \mathbf{U}) + h(X^- | X^+ \mathbf{U})\big] - \tfrac{1}{2}\big[h(X^+ | Y^- \mathbf{U}) + h(X^- | Y^+ \mathbf{U})\big] \overset{(a)}{=} \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(X^+ X^- | \mathbf{U})\big] - \tfrac{1}{2} I(Y^-; Y^+ | \mathbf{U}) + \tfrac{1}{2} I(X^+; Y^- | \mathbf{U}) + \tfrac{1}{2}\big[h(X^- | X^+ Y^+ \mathbf{U}) - h(X^- | Y^+ \mathbf{U})\big] \overset{(b)}{=} \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(X^+ X^- | \mathbf{U})\big] + \tfrac{1}{2}\big[h(Y^- | Y^+ \mathbf{U}) - h(Y^- | X^+ Y^+ \mathbf{U})\big] - \tfrac{1}{2} I(X^-; X^+ | Y^+ \mathbf{U}) = \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(X^+ X^- | \mathbf{U})\big] + \tfrac{1}{2} I(Y^-; X^+ | Y^+ \mathbf{U}) - \tfrac{1}{2} I(X^-; X^+ | Y^+ \mathbf{U}) \overset{(c)}{=} \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(X^+ X^- | \mathbf{U})\big] + \tfrac{1}{2} I(Y^-; X^+ | Y^+ \mathbf{U}) - \tfrac{1}{2} I(X^- Y^-; X^+ | Y^+ \mathbf{U}) = \tfrac{1}{2}\big[h(Y^+ Y^- | \mathbf{U}) - h(X^+ X^- | \mathbf{U})\big] - \tfrac{1}{2} I(X^-; X^+ | Y^- Y^+ \mathbf{U}) = h(Y|U) - h(X|U) - \tfrac{1}{2} I(X^-; X^+ | Y^- Y^+ \mathbf{U}). \tag{73}$$
Here,
(a)
follows since $X^- - (X^+, \mathbf{U}) - Y^+$ forms a Markov chain,
(b)
follows since $Y^- - (X^+, \mathbf{U}) - Y^+$ forms a Markov chain, and
(c)
follows since $Y^- - (X^-, Y^+, \mathbf{U}) - X^+$ forms a Markov chain.
The combination of identity (73) with inequality (24b) proves (26).
Since $\tilde{X} \in \mathbb{R}$ while $\tilde{U} \in \{+, -\} \times \mathbb{R} \times \mathcal{U} \times \mathcal{U}$, a dimensionality reduction argument as in Remark 2 establishes the existence of a law $P_{X'U'}$, where $U'$ has finite support, such that (25) and (26) are fulfilled, hence $(U', X')$ are in the feasible set.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Wyner, A.D.; Ziv, J. The rate-distortion function for source coding with side information at the receiver. IEEE Trans. Inform. Theory 1976, IT-22, 1–11.
  2. Steinberg, Y.; Merhav, N. On successive refinement for the Wyner-Ziv problem. IEEE Trans. Inform. Theory 2004, 50, 1636–1654.
  3. Bross, S.I.; Weissman, T. On successive refinement for the Wyner-Ziv problem with partially cooperating decoders. In Proceedings of the 2008 IEEE International Symposium on Information Theory (ISIT), Toronto, ON, Canada, 6–11 July 2008.
  4. Weissman, T.; El Gamal, A. Source coding with limited-look-ahead side information at the decoder. IEEE Trans. Inform. Theory 2006, 52, 5218–5239.
  5. Maor, A.; Merhav, N. On successive refinement with causal side information at the decoders. IEEE Trans. Inform. Theory 2008, 54, 332–343.
  6. Bross, S.I. Scalable source coding with causal side information and a causal helper. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020.
  7. Geng, Y.; Gohari, A.; Nair, C.; Yu, Y. The capacity region of classes of product broadcast channels. arXiv 2011, arXiv:1105.5438.
  8. Geng, Y.; Nair, C. The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages. IEEE Trans. Inform. Theory 2014, IT-60, 2087–2104.
  9. Courtade, T.A. A strong entropy power inequality. IEEE Trans. Inform. Theory 2018, IT-64, 2173–2192.
  10. Bernstein, S.N. On a property characteristic of the normal law. Trudy Leningrad. Polytech. Inst. 1941, 3, 21–22.
  11. Bryc, W. The Normal Distribution: Characterizations with Applications; Springer: New York, NY, USA, 2012; Volume 100.
  12. Choudhuri, C.; Kim, Y.-H.; Mitra, U. Causal state communication. IEEE Trans. Inform. Theory 2013, 59, 3709–3719.
  13. Weissman, T.; Merhav, N. On causal source codes with side information. IEEE Trans. Inform. Theory 2005, 51, 4003–4013.
  14. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons: New York, NY, USA, 1991.
  15. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Academic: New York, NY, USA, 1981.
  16. El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2012.
Figure 1. The causal listening-helper model.