4.1. Direct Part
Pick a distribution as in (11) in which the conditional laws of $T$ given $(S,V)$ and of $X$ given $(U,T)$ are 0–1 laws, so that $T=f(S,V)$ and $X=h(U,T)$ for some deterministic functions $f$ and $h$. Extend these functions to act on $n$-tuples componentwise, so that if $\mathbf{s}$ and $\mathbf{v}$ are $n$-tuples in $\mathcal{S}^{n}$ and $\mathcal{V}^{n}$, then $\mathbf{t}=f(\mathbf{s},\mathbf{v})$ indicates that $\mathbf{t}$ is an $n$-tuple in $\mathcal{T}^{n}$ whose $i$-th component $t_{i}$ equals $f(s_{i},v_{i})$, where $s_{i}$ and $v_{i}$ are the corresponding components of $\mathbf{s}$ and $\mathbf{v}$. Likewise, we write $\mathbf{x}=h(\mathbf{u},\mathbf{t})$.
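To make the componentwise convention concrete, here is a minimal Python sketch; the particular maps f and h and the binary alphabets are illustrative assumptions, since the text only requires that some deterministic maps exist.

# Componentwise extension of single-letter deterministic maps to n-tuples.
def f(s, v):
    # placeholder deterministic map: help symbol from (state, cloud-center symbol)
    return (s + v) % 2

def h(u, t):
    # placeholder deterministic map: channel-input symbol from (satellite, help symbol)
    return (u * t) % 2

def componentwise(func, a, b):
    # the i-th output symbol depends only on the i-th symbols of a and b
    return tuple(func(ai, bi) for ai, bi in zip(a, b))

s = (0, 1, 1, 0)
v = (1, 1, 0, 0)
t = componentwise(f, s, v)   # t_i = f(s_i, v_i)
u = (1, 0, 1, 1)
x = componentwise(h, u, t)   # x_i = h(u_i, t_i)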
To prove achievability, we propose a block Markov coding scheme with the receiver performing backward decoding. Although only the receiver is required to decode the message, in our scheme, the helper does too (but not with backward decoding, which would violate causality).
The transmission comprises $B$ sub-blocks, each of length $n$, for a total of $Bn$ channel uses. The transmitted message $m$ is represented by sub-messages $m_{1},\dots,m_{B-1}$, with each of the sub-messages taking values in the set $\{1,\dots,2^{nR}\}$. The overall transmission rate is thus $\frac{B-1}{B}R$, which can be made arbitrarily close to $R$ by choosing $B$ very large. The sub-messages are transmitted in the first $B-1$ sub-blocks, with $m_{b}$ transmitted in sub-block $b$ (for $b=1,\dots,B-1$). Hereafter, we use $\mathbf{s}_{b}$ to denote the state $n$-tuple affecting the channel in sub-block $b$ and use $s_{b,i}$ to denote its $i$-th component (with $i=1,\dots,n$). Similar notation holds for $\mathbf{t}_{b}$, $\mathbf{x}_{b}$, etc.
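For completeness, the standard block-Markov rate accounting behind this statement (assuming, as reconstructed above, that each sub-message carries $nR$ bits over the $Bn$ channel uses) is
\[
\frac{(B-1)\,nR}{Bn} \;=\; \Bigl(1-\frac{1}{B}\Bigr)R \;\longrightarrow\; R \qquad (B\to\infty).
\]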
We begin with an overview of the scheme, where we focus on the transmission in sub-blocks 2 through $B-1$: the first and last sub-blocks must account for some edge effects that we shall discuss later. Let $b$ be in this range. The coding we use in sub-block $b$ is superposition coding, with the cloud center determined by $m_{b-1}$ and the satellite by $m_{b}$.
Unlike the receiver, the helper, which must be causal, cannot employ backward decoding: it decodes each sub-message at the end of the sub-block in which it is transmitted. Consequently, when sub-block $b$ begins, it already has a reliable guess $\hat{m}_{b-1}$ of $m_{b-1}$ (based on the previous channel inputs it cribbed). The encoder, of course, knows $m_{b-1}$, so the two can agree on the cloud center indexed by $m_{b-1}$. (We ignore for now the fact that $\hat{m}_{b-1}$ may, with small probability, differ from $m_{b-1}$.) The satellite, indexed by $m_{b}$ within this cloud, is computed by the encoder; it is unknown to the helper. The helper produces the sub-block-$b$ assistance $\mathbf{t}_{b}=f(\mathbf{s}_{b},\mathbf{v}_{b})$ based on the state sequence and the cloud center $\mathbf{v}_{b}$. (Since $f$ acts componentwise, this help is causal, with the $i$-th component of $\mathbf{t}_{b}$ being a function of the corresponding component $s_{b,i}$ of the state sequence and of $v_{b,i}$; it does not require knowledge of future states.) For its part, the encoder produces the $n$-tuple $\mathbf{x}_{b}=h(\mathbf{u}_{b},\mathbf{t}_{b})$, with causality preserved because $\mathbf{v}_{b}$ and $\mathbf{u}_{b}$ can be computed from $m_{b-1}$ and $m_{b}$ ahead of time, and because $\mathbf{t}_{b}$ is presented to the encoder causally and $h$ operates componentwise.
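As an illustration of the causality just described, the following schematic Python sketch (with hypothetical arguments standing in for the codewords and for the maps f and h above) produces the help and the channel input of a sub-block one symbol at a time.

# Schematic symbol-by-symbol operation within one sub-block.
# u and v are fixed before the sub-block starts (they depend only on the
# current and previous sub-messages); the state arrives causally.
def run_subblock(f, h, u, v, state_stream):
    t, x = [], []
    for i, s_i in enumerate(state_stream):  # states revealed one at a time
        t_i = f(s_i, v[i])                  # helper uses only the current state symbol
        x_i = h(u[i], t_i)                  # encoder uses only the current help symbol
        t.append(t_i)
        x.append(x_i)
    return tuple(t), tuple(x)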
As to the first and last sub-blocks: in the first, the cloud-center index $m_{0}$ is set to a constant (e.g., $m_{0}=1$), so we have only one cloud center. In sub-block $B$, we send no fresh information, so each cloud center has only one satellite.
We now proceed to a more formal exposition. For this, we will need some notation. Given a joint distribution $P$, we denote by $\mathcal{T}(P)$ the set of all jointly typical sequences, where the length $n$ is understood from the context, and we adopt the $\delta$-convention of [8]. Similarly, given a sequence $\mathbf{z}$, $\mathcal{T}(P\,|\,\mathbf{z})$ stands for the set of all pairs of sequences that are jointly typical with the given sequence $\mathbf{z}$.
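For intuition only, a generic empirical-type check of joint typicality might look as follows in Python; this is a schematic stand-in, not the exact $\delta$-convention of [8].

from collections import Counter

def jointly_typical(x, y, p_xy, eps):
    # Schematic check: every pair outside the support of p_xy is absent, and the
    # empirical frequency of every pair in the support is within eps of p_xy.
    n = len(x)
    emp = Counter(zip(x, y))
    if any(pair not in p_xy for pair in emp):
        return False
    return all(abs(emp.get(pair, 0) / n - prob) <= eps for pair, prob in p_xy.items())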
To describe the first and last sub-blocks, we define the default values $m_{0}$ and $m_{B}$, respectively, both set a priori (say, to 1). The proof of the direct part is based on random coding and joint typicality decoding.
4.1.1. Code Construction
We construct $B$ codebooks $\mathcal{C}_{1},\dots,\mathcal{C}_{B}$, each of length $n$. Each codebook is generated randomly and independently of the other codebooks as follows:
For every possible value $j$ of the previous sub-message, generate a length-$n$ cloud center $\mathbf{v}_{b}(j)$, independently across $j$, with IID components drawn according to the law of $V$.
For every $j$ and every possible value $k$ of the current sub-message, generate a length-$n$ satellite $\mathbf{u}_{b}(k\,|\,j)$, conditionally independently given $\mathbf{v}_{b}(j)$, with components drawn independently according to the conditional law of $U$ given the corresponding component of $\mathbf{v}_{b}(j)$.
The codebook $\mathcal{C}_{b}$ is the collection of all these cloud centers and satellites.
Reveal the codebooks to the encoder, decoder, and helper.
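A minimal random-coding sketch of this construction is given below; the names p_v and p_u_given_v are hypothetical placeholders for the marginal and conditional laws induced by the distribution chosen in (11).

import random

def generate_codebook(n, n_msgs, v_alphabet, u_alphabet, p_v, p_u_given_v):
    # Cloud centers: one length-n sequence per previous sub-message, IID ~ p_v.
    clouds = {
        j: tuple(random.choices(v_alphabet,
                                weights=[p_v[a] for a in v_alphabet], k=n))
        for j in range(n_msgs)
    }
    # Satellites: for each cloud center, one length-n sequence per current
    # sub-message, drawn conditionally IID given the cloud-center symbols.
    sats = {}
    for j, v_seq in clouds.items():
        for k in range(n_msgs):
            sats[(k, j)] = tuple(
                random.choices(u_alphabet,
                               weights=[p_u_given_v[v_i][a] for a in u_alphabet])[0]
                for v_i in v_seq)
    return clouds, sats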
4.1.2. Operation of the Code
We first describe the operation of the helper and encoder in the first sub-block.
Helper. In the first sub-block, $b=1$, the helper produces the assistance $\mathbf{t}_{1}=f\bigl(\mathbf{s}_{1},\mathbf{v}_{1}(m_{0})\bigr)$, i.e., $t_{1,i}=f\bigl(s_{1,i},v_{1,i}(m_{0})\bigr)$ for $i=1,\dots,n$. Note that $\mathbf{t}_{1}$ is causal in $\mathbf{s}_{1}$.
Encoder. Set $\mathbf{v}_{1}=\mathbf{v}_{1}(m_{0})$ and $\mathbf{u}_{1}=\mathbf{u}_{1}(m_{1}\,|\,m_{0})$. The input to the channel is $\mathbf{x}_{1}=h(\mathbf{u}_{1},\mathbf{t}_{1})$, i.e., $x_{1,i}=h(u_{1,i},t_{1,i})$ for $i=1,\dots,n$. Note that $\mathbf{x}_{1}$ is causal in $\mathbf{t}_{1}$.
Helper at the end of the sub-block. Thanks to its cribbing, at the end of sub-block 1 the helper is cognizant of $\mathbf{x}_{1}$. In addition, it knows $\mathbf{v}_{1}$ (since it is determined by $m_{0}$, which was set a priori) and $\mathbf{t}_{1}$ (since it was produced by itself). The helper now decodes the message $m_{1}$ by looking for an index $j$ such that the satellite $\mathbf{u}_{1}(j\,|\,m_{0})$ is jointly typical with the sequences at its disposal. If such an index $j$ exists and is unique, the helper sets its estimate of $m_{1}$ equal to $j$. Otherwise, an error is declared. By standard results, the probability of error is vanishingly small provided that the rate condition (76) holds.
Denote by $\hat{m}_{1}$ the message decoded by the helper at the end of sub-block 1. We proceed to describe the operation of the helper and encoder in sub-block $b$, when $2\le b\le B-1$.
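Schematically, the helper's end-of-sub-block decision is a unique-candidate search; in the sketch below, passes_test is a hypothetical predicate encapsulating the joint-typicality check against the sequences available to the helper.

def decode_unique(candidates, passes_test):
    # candidates maps an index j to the corresponding satellite sequence.
    # Return j if exactly one candidate passes the test; otherwise declare
    # an error by returning None.
    hits = [j for j, cand in candidates.items() if passes_test(cand)]
    return hits[0] if len(hits) == 1 else None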
Helper, $2\le b\le B-1$. Denote by $\hat{m}_{b-1}$ the message decoded by the helper at the end of sub-block $b-1$. In sub-block $b$, the helper produces the assistance $\mathbf{t}_{b}=f\bigl(\mathbf{s}_{b},\mathbf{v}_{b}(\hat{m}_{b-1})\bigr)$, i.e., $t_{b,i}=f\bigl(s_{b,i},v_{b,i}(\hat{m}_{b-1})\bigr)$.
Encoder, $2\le b\le B-1$. Set $\mathbf{v}_{b}=\mathbf{v}_{b}(m_{b-1})$ and $\mathbf{u}_{b}=\mathbf{u}_{b}(m_{b}\,|\,m_{b-1})$. The input to the channel is $\mathbf{x}_{b}=h(\mathbf{u}_{b},\mathbf{t}_{b})$, i.e., $x_{b,i}=h(u_{b,i},t_{b,i})$. Note that $\mathbf{t}_{b}$ and $\mathbf{x}_{b}$ are causal in $\mathbf{s}_{b}$ and $\mathbf{t}_{b}$, respectively.
Helper at the end of the sub-block, $2\le b\le B-1$. At the end of sub-block $b$ the helper has $\mathbf{x}_{b}$ at hand. In addition, it has $\mathbf{v}_{b}(\hat{m}_{b-1})$ (since $\hat{m}_{b-1}$ was decoded at the end of the previous sub-block) and $\mathbf{t}_{b}$ (since it was produced by itself). The helper now decodes the message $m_{b}$. Assuming that $m_{b-1}$ was decoded correctly, this can be done with a low probability of error if (37) is satisfied.
We proceed to the last sub-block, where no fresh information is sent. Here $m_{B}$ is set to its a priori default value, and the operations of the helper and encoder proceed exactly as in (77)–(80), with $b=B$. Note that in sub-block $B$, the helper need not decode $m_{B}$, since it is set a priori and known to all.
4.1.3. Decoding
At the destination, we employ backward decoding. Starting at sub-block $B$, in which the satellite index is the a priori default $m_{B}$, the decoder looks for an index $j$ such that the cloud center indexed by $j$, together with the corresponding satellite, passes the joint-typicality test (81) with the sub-block-$B$ channel output. If such an index exists and is unique, the decoder sets its estimate of $m_{B-1}$ equal to $j$. Otherwise, an error is declared. By standard results, the decoding is correct with probability approaching 1 provided that (82) holds.
In the subsequent (backward) decoding of sub-blocks $B-1,\dots,2$, the decoding proceeds as in (81), with the exception that the estimate of $m_{b}$ obtained in the previous decoding stage replaces the default value $m_{B}$ in (81). Thus, in sub-block $b$, the decoder has at hand the estimate of $m_{b}$ and the channel output $\mathbf{y}_{b}$, and it looks for an index $j$ for which the corresponding joint-typicality test is satisfied. If such an index $j$ exists and is unique, the decoder sets its estimate of $m_{b-1}$ equal to $j$. Otherwise, an error is declared. Assuming that $m_{b}$ was decoded correctly in the previous decoding stage, the decoding of $m_{b-1}$ in sub-block $b$ is correct with probability close to 1 provided that (82) holds. Note that $m_{1}$ is decoded in sub-block 2; that is, the sub-block-1 channel output is not used at the destination. However, the transmission in sub-block 1 is not superfluous, as it is used by the helper to decode $m_{1}$ at the end of the first sub-block. Since (76) and (82) are the two terms in (10), this concludes the proof of the direct part.
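The backward-decoding schedule can be summarized by the sketch below, in which decode_subblock is a hypothetical placeholder for the joint-typicality search of (81).

def backward_decode(B, outputs, decode_subblock, default_last_msg=1):
    # outputs[b] is the channel-output n-tuple of sub-block b (b = 1, ..., B).
    # In sub-block b the decoder already knows m_b and recovers m_{b-1};
    # the loop runs b = B, B-1, ..., 2, so outputs[1] is never used.
    estimates = {B: default_last_msg}
    for b in range(B, 1, -1):
        estimates[b - 1] = decode_subblock(outputs[b], known_msg=estimates[b])
    return [estimates[b] for b in range(1, B)]   # estimates of m_1, ..., m_{B-1}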
4.2. Converse Part
Fix a rate $R$, and consider a sequence of blocklength-$n$ codes of rate $R$ whose probability of error vanishes as $n$ tends to infinity. For each $n$, feed the encoder a random message $M$ drawn equiprobably from the message set. By the channel model, (85) holds.
Fano's inequality and the fact that the probability of error vanishes imply the existence of a sequence $\epsilon_{n}\to 0$ for which the chain of (in)equalities (86) holds, where the first step follows from (85); the next holds because, by (8), the help is a function of the conditioning variables; and the last holds because the channel input is a function of the message and the help, and hence of the conditioning variables, so the corresponding conditional entropy must be zero.
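For orientation, the generic Fano step underlying (86) has the form below; the actual chain in (86) contains further, model-specific manipulations.
\[
nR \;=\; H(M) \;=\; I(M;Y^{n}) + H(M\,|\,Y^{n}) \;\le\; I(M;Y^{n}) + n\epsilon_{n}.
\]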
We proceed to derive the second bound. Starting again with Fano's inequality, we obtain (87).
Defining the auxiliary random variables $U_{i}$ and $V_{i}$ appropriately, we can rewrite (86) and (87) as (90) and (91). Moreover, with $U_{i}$ and $V_{i}$ defined as above, the auxiliaries and the state are independent, and the functional relations (93) and (94) hold, where the mappings appearing there are (blocklength-dependent) deterministic functions. Indeed, the auxiliaries determine the message $M$ and the past channel inputs, from which the help can be computed using (5).
We next do away with the sums by conditioning on a time-sharing random variable. Let $Q$ be a random variable uniformly distributed over $\{1,\dots,n\}$, independently of the channel and the state. Using $Q$, we can express the bounds (90) and (91) as (95) and (96), where we define the single-letter variables $S$, $T$, $X$, $Y$ accordingly and the auxiliaries $U$ and $V$ so as to include $Q$. Note that the conditional law of $Y$ given the inputs to the channel is that of the channel, and that $S$ is distributed like the channel state. Moreover, since $U$ and $V$ contain the time-sharing random variable $Q$, (93) and (94) imply that $X=h(U,T)$ and $T=f(S,V)$ for some deterministic functions $h$ and $f$. Therefore, the joint distribution under which the RHS of (95) and the RHS of (96) are computed is of the form (105), in which the conditional laws determining $T$ and $X$ are zero-one laws.
The form (105) and the inequalities (95) and (96) establish the converse.
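For orientation, the time-sharing step used above is an instance of the standard identity below, shown with a generic auxiliary; the paper's bounds (95) and (96) involve the model-specific variables.
\[
\frac{1}{n}\sum_{i=1}^{n} I(U_{i};Y_{i})
\;=\; I(U_{Q};Y_{Q}\,|\,Q)
\;\le\; I(U_{Q},Q;\,Y_{Q})
\;=\; I(U;Y),
\qquad U \triangleq (U_{Q},Q),\quad Y \triangleq Y_{Q}.
\]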
4.3. Cardinality Bounds
We next proceed to bound the alphabet sizes of the auxiliaries $U$ and $V$ in two steps. In the first, we do so while relaxing the zero-one-law requirements. In the second, we enlarge the alphabets to fulfill said requirements. Let $L$ denote the number of quantities to be preserved. Fix a conditional distribution of the remaining variables given $U$, and define the corresponding $L$ functions of it (with the first of these functions corresponding to all but one of the tuples of the remaining variables). By the support lemma [5,8], there exists a random variable with an alphabet of at most $L$ letters such that the prescribed distributions and the two mutual information expressions are preserved. Denote the resulting random variable again by $U$; its alphabet size is thus at most $L$.
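For reference, the support lemma [5,8] is used here in its standard form, stated below with generic symbols $Z$, $U$, and $g_{\ell}$ of our choosing (they stand for the variables and the $L$ functionals listed above): for real-valued continuous functions $g_{1},\dots,g_{L}$ on the set of distributions on a finite set $\mathcal{Z}$, and for any auxiliary $U$ jointly distributed with $Z$, there exists a random variable $U'$ taking at most $L$ values such that
\[
\mathbb{E}\Bigl[g_{\ell}\bigl(P_{Z|U}(\cdot\,|\,U)\bigr)\Bigr]
\;=\;
\mathbb{E}\Bigl[g_{\ell}\bigl(P_{Z|U'}(\cdot\,|\,U')\bigr)\Bigr],
\qquad \ell=1,\dots,L.
\]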
We next bound the alphabet size of $V$. For each value $u$ of $U$, we define $L$ functions analogously. Applying again the support lemma, for every such $u$ there exists a random variable with an alphabet of at most $L$ letters such that (109)–(111) are preserved. Multiplying the alphabets out over the values of $u$, we can, with proper labeling of the elements, retain a Markov structure like (101). Now the alphabet sizes are fixed and independent of $n$. Thus, substituting the new auxiliaries in (95) and (96) and taking the limit $n\to\infty$, we obtain the upper bound of the theorem, in which the auxiliaries have the stated alphabet sizes and the corresponding Markov chain holds.
Note, however, that the conditional laws defining $T$ and $X$ are no longer zero-one laws. We remedy this using the Functional Representation Lemma (FRL) [5], at the cost of increasing the alphabet sizes: a standard convexity argument will not do because, although each of the two mutual information expressions is a convex function of each of these conditional laws, the minimum of two convex functions need not be convex.
The Functional Representation Lemma implies that, without altering the conditional law of $T$ given its conditioning variables or that of $X$ given its conditioning variables, the random variables $T$ and $X$ can be represented as deterministic functions of those conditioning variables and of two auxiliary random variables; these two random variables are independent of each other and of the remaining variables, and their alphabet sizes are bounded as the lemma prescribes. At the expense of increased alphabet sizes, we now append one of these random variables to $U$ and the other to $V$ to form the new auxiliary random variables, whose alphabet sizes are bounded accordingly. We then define the new conditional laws of $T$ and of $X$ as zero-one laws via indicator functions, where an indicator equals 1 if the statement it tests is true and equals 0 otherwise.
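For reference, in its commonly stated form (quoted here with a generic pair $(A,B)$ of our choosing, not the paper's notation), the FRL asserts that for any jointly distributed pair $(A,B)$ over finite alphabets there exist a random variable $W$ independent of $A$ and a deterministic function $g$ such that
\[
B \;=\; g(A,W), \qquad |\mathcal{W}| \;\le\; |\mathcal{A}|\,\bigl(|\mathcal{B}|-1\bigr)+1 .
\]
It is this cardinality bound that accounts for the alphabet-size increase mentioned above.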
As we next argue, these auxiliary random variables and the above zero–one laws do not decrease the relevant mutual information expressions.
Beginning with the first mutual information expression, we note that it is unchanged, because we have preserved the relevant joint law and because the appended random variable does not influence the mapping (54) to $X$. This establishes the first of the two bounds. Likewise, our new auxiliary random variables and zero-one laws do not alter the corresponding terms of the second expression, and the remaining term can only change in the favorable direction, so the second bound holds as well. This completes the proof of Theorem 1.