Article

The State-Dependent Channel with a Rate-Limited Cribbing Helper

Amos Lapidoth and Yossef Steinberg
1 Signal and Information Processing Laboratory, ETH Zurich, 8092 Zurich, Switzerland
2 Department of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Haifa 3200003, Israel
* Author to whom correspondence should be addressed.
Entropy 2024, 26(7), 570; https://doi.org/10.3390/e26070570
Submission received: 29 April 2024 / Revised: 21 June 2024 / Accepted: 25 June 2024 / Published: 30 June 2024
(This article belongs to the Collection Feature Papers in Information Theory)

Abstract

The capacity of a memoryless state-dependent channel is derived for a setting in which the encoder is provided with rate-limited assistance from a cribbing helper that observes the state sequence causally and the past channel inputs strictly causally. Said cribbing may increase capacity but not to the level achievable by a message-cognizant helper.

1. Introduction

An encoder for a state-dependent channel is said to have causal state information if the channel input $X_i$ it produces at time $i$ may depend not only on the message $m$ it wishes to transmit, but also on the present and past channel states $S_i$ and $S^{i-1}$ (where $S^{i-1}$ stands for the states $S_1, \ldots, S_{i-1}$). Its state information is noncausal if, in addition to depending on the message, its time-$i$ input may depend on all the channel states: past $S^{i-1}$, present $S_i$, and future $S_{i+1}^{n}$ (where $n$ denotes the blocklength, and $S_{i+1}^{n}$ stands for $S_{i+1}, \ldots, S_n$).
The former case was studied by Shannon [1], who showed that capacity can be achieved by what-we-now-call Shannon strategies. The latter was studied by Gel’fand and Pinsker [2], who showed that the capacity, in this case, can be achieved using binning [3].
Recently, there has been renewed interest in the causal case, but where the state information must be quantized before it is provided to the encoder [4]. While still causal, the encoder is now not provided with the state sequence $\{S_i\}$ directly, but rather with some “assistance sequence” $\{T_i\}$ describing it. Its time-$i$ output $X_i$ is now determined by the message $m$ and by the present and past assistances $T^i$. The assistance sequence is produced by a helper, which observes the state sequence causally and produces the time-$i$ assistance $T_i$ based on the present and past states $S^i$, subject to the additional constraint that $T_i$ take values in a given finite set $\mathcal T$ whose cardinality is presumably smaller than that of the state alphabet $\mathcal S$. (If the cardinality of $\mathcal T$ is one, the problem reduces to the case of no assistance; if it exceeds or equals the cardinality of $\mathcal S$, the problem reduces to Shannon’s original problem because, in this case, $T_i$ can describe $S_i$ unambiguously.) We refer to the base-2 logarithm of the cardinality of $\mathcal T$ as the “help rate” and denote it $R_{\mathsf h}$:
$$R_{\mathsf h} = \log_2 |\mathcal T|.$$
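As a small worked instance (our own illustrative numbers, not a setting considered in the paper): a quaternary state alphabet described through a binary assistance alphabet gives
$$|\mathcal S| = 4, \quad |\mathcal T| = 2 \quad\Longrightarrow\quad R_{\mathsf h} = \log_2 2 = 1 \text{ bit per channel use},$$
so the helper must summarize a state that can carry up to $\log_2 4 = 2$ bits using a single bit of assistance per symbol.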
Three observations in [4] inspired the present paper:
  • Symbol-by-symbol quantizers are suboptimal: restricting $T_i$ to be a function of $S_i$ may reduce capacity.
  • Allowing $T_i$ to depend not only on $S^i$ but also on the message $m$ may increase capacity.
  • If $T_i$ is allowed to depend on $S^i$, as well as on the transmitted message, then message-cognizant symbol-by-symbol helpers achieve capacity: there is no loss in capacity in restricting $T_i$ to be a function of $(m, S_i)$.
Sandwiched between the message-oblivious helper and the message-cognizant helper is the cribbing helper, whose time-$i$ assistance $T_i$ depends on $S^i$ and on the past symbols produced by the encoder:
$$T_i = T_i\big(S^i, X^{i-1}\big).$$
Such a helper is depicted in Figure 1.
Since the channel inputs can be computed from the message and the states, a message-cognizant helper can mimic a cribbing helper; the cribbing helper therefore cannot outperform the message-cognizant helper. And since a cribbing helper can ignore the past channel inputs, the cribbing capacity must be at least as high as that of the message-oblivious helper.
Here, we shall characterize the capacity with a cribbing helper and show that the above inequalities can be strict: the message-cognizant helper may outperform the cribbing helper, and the latter may outperform the message-oblivious helper (presumably because, thanks to the cribbing, it can learn something about the message). We further show that the capacity of the cribbing helper can be achieved using a block Markov coding scheme with backward decoding.
It is important to note that allowing the helper to crib does not render it a relay [5], because the helper does not communicate with the receiver. Our results therefore have no bearing on the relay channel.
Message-cognizant helpers are also advantageous in the noncausal case. For such helpers, the capacity was recently computed in [6,7]. Cribbing, however, is somewhat less natural in that setting.

2. Problem Statement and Main Result

We are given a state-dependent discrete memoryless channel $W_{Y|XS}$ of finite input, output, and state alphabets $\mathcal X$, $\mathcal Y$, and $\mathcal S$. When its input is $x \in \mathcal X$ and its state is $s \in \mathcal S$, the probability of its output being $y \in \mathcal Y$ is $W_{Y|XS}(y|x,s)$. The states $\{S_i\}$ are drawn IID $\sim P_S$, where $P_S$ is some given probability mass function (PMF) on the state alphabet $\mathcal S$. Also given is some finite set $\mathcal T$ that we call the description alphabet. We shall assume throughout that its cardinality is at least 2,
$$|\mathcal T| \ge 2, \qquad (3)$$
because otherwise the helper cannot provide any assistance.
Given some blocklength $n$, a rate-$R$ message set is a set $\mathcal M$ whose cardinality is $2^{nR}$ (where we ignore the fact that the latter need not be an integer). A blocklength-$n$ encoder for our channel comprises $n$ mappings
$$f_i \colon \mathcal M \times \mathcal T^i \to \mathcal X, \qquad i = 1, \ldots, n,$$
with the understanding that if the message to be transmitted is $m \in \mathcal M$, and if the assistance sequence produced by the helper is $t^n \in \mathcal T^n$, then the time-$i$ channel input produced by the encoder is
$$x_i = f_i(m, t^i), \qquad (5)$$
which we also denote $x_i(m, t^i)$. Here, $\mathcal T^i$ denotes the $i$-fold Cartesian product
$$\mathcal T^i = \underbrace{\mathcal T \times \mathcal T \times \cdots \times \mathcal T}_{i \text{ times}},$$
and $t^j$ denotes $t_1, \ldots, t_j$. A blocklength-$n$ cribbing helper comprises $n$ mappings
$$h_i \colon \mathcal X^{i-1} \times \mathcal S^i \to \mathcal T, \qquad i = 1, \ldots, n,$$
with the understanding that—after observing the channel inputs $x_1, \ldots, x_{i-1}$ and the states $s_1, \ldots, s_i$—the helper produces the time-$i$ assistance
$$t_i = h_i\big(x^{i-1}, s^i\big), \qquad (8)$$
which we also denote $t_i\big(x^{i-1}, s^i\big)$.
Communication proceeds as follows: the helper produces the time-1 assistance $t_1$, which is given by $h_1(s_1)$, and the encoder then produces the first channel input $x_1 = f_1(m, t_1)$. The helper then produces the time-2 assistance $t_2$, which is given by $h_2(x_1, s^2)$, and the encoder then produces the second channel input $x_2 = f_2(m, t^2)$, and so on.
The decoder is cognizant neither of the state sequence $s^n$ nor of the assistance sequence $t^n$: it is thus a mapping of the form
$$\phi \colon \mathcal Y^n \to \mathcal M,$$
with the understanding that, upon observing the output sequence $Y^n$, the decoder guesses that the transmitted message is $\phi(Y^n)$.
Let $P_e = \Pr\big[\phi(Y^n) \neq M\big]$ denote the probability of a decoding error when the message $M$ is drawn uniformly from $\mathcal M$. If $P_e < \epsilon$, then we say that the coding scheme is of parameters $(n, 2^{nR}, |\mathcal T|, \epsilon)$ or that it is an $(n, 2^{nR}, |\mathcal T|, \epsilon)$-scheme. A rate $R$ is said to be achievable if, for every $\epsilon > 0$, there exist, for all sufficiently large $n$, schemes as above with $P_e < \epsilon$. The capacity of the channel is defined as the supremum of all achievable rates $R$ and is denoted $C$.
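To make the order of operations concrete, the following Python sketch simulates one block of the protocol just described (our own illustration: the functions helper_fn and encoder_fn and the toy binary instantiation at the bottom are placeholders, not a capacity-achieving scheme). At each time i the helper sees the cribbed inputs x^{i-1} and the states s^i and emits t_i; the encoder then sees (m, t^i) and emits x_i; finally the memoryless channel produces y_i from (x_i, s_i).

import random

def simulate_block(n, m, state_pmf, channel, helper_fn, encoder_fn):
    """Run one blocklength-n transmission of message m.

    helper_fn(past_x, past_and_present_s) -> t_i   (cribbing, causal in the states)
    encoder_fn(m, assistance_so_far)      -> x_i   (causal in the assistance)
    channel(x, s) -> y                              (one memoryless use of W_{Y|XS})
    """
    states = [state_pmf() for _ in range(n)]      # S_1, ..., S_n drawn IID
    x_seq, t_seq, y_seq = [], [], []
    for i in range(n):
        # Helper sees the past inputs x^{i-1} (cribbing) and the states s^i (causally).
        t_i = helper_fn(x_seq[:], states[:i + 1])
        t_seq.append(t_i)
        # Encoder sees the message and the assistance t^i received so far.
        x_i = encoder_fn(m, t_seq[:])
        x_seq.append(x_i)
        # One memoryless channel use.
        y_seq.append(channel(x_i, states[i]))
    return x_seq, t_seq, y_seq

# Toy instantiation (placeholders only): binary state, one-bit help that forwards
# the current state, and an encoder that XORs a message bit with the help.
if __name__ == "__main__":
    msg_bits = [1, 0, 1, 1]
    x, t, y = simulate_block(
        n=4,
        m=msg_bits,
        state_pmf=lambda: random.randint(0, 1),
        channel=lambda x_i, s_i: x_i ^ s_i,
        helper_fn=lambda past_x, s_causal: s_causal[-1],
        encoder_fn=lambda m, t_seq: m[len(t_seq) - 1] ^ t_seq[-1],
    )
    print(x, t, y)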
Define
$$C^{(\mathrm I)} = \max \min\big\{ I(UV; Y),\; I(U; X | VT) \big\}, \qquad (10)$$
where the maximum is over all finite sets $\mathcal U$ and $\mathcal V$ and over all joint distributions of the form
$$P_S\, P_{UV}\, P_{T|VS}\, P_{X|UVT}\, W_{Y|XS} \qquad (11)$$
with $T$ taking values in the assistance alphabet $\mathcal T$. (When writing Markov conditions and information-theoretic quantities such as entropy and mutual information, we do not separate the variables with commas. We thus write $H(XY)$, and not $H(X, Y)$, for the joint entropy of $X$ and $Y$. We do, however, introduce commas when this convention can lead to ambiguities; see, for example, (62).)
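The sketch below (our own illustration; the toy binary PMFs are arbitrary) shows how the two terms in the definition of $C^{(\mathrm I)}$ can be evaluated numerically for one fixed joint distribution of the form above, by building the joint PMF of $(S,U,V,T,X,Y)$ and computing $I(UV;Y)$ and $I(U;X|VT)$. It evaluates a single feasible point, not the maximum over $\mathcal U$, $\mathcal V$, and the joint law.

import itertools
from collections import defaultdict
from math import log2

def marginal(joint, idx):
    """Marginal PMF over the coordinates listed in idx (a tuple of positions)."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def mutual_information(joint, a_idx, b_idx, c_idx=()):
    """I(A;B|C) in bits, computed from a joint PMF {outcome-tuple: probability}."""
    pabc = marginal(joint, a_idx + b_idx + c_idx)
    pac = marginal(joint, a_idx + c_idx)
    pbc = marginal(joint, b_idx + c_idx)
    pc = marginal(joint, c_idx)
    mi = 0.0
    for key, p in pabc.items():
        if p <= 0.0:
            continue
        a = key[:len(a_idx)]
        b = key[len(a_idx):len(a_idx) + len(b_idx)]
        c = key[len(a_idx) + len(b_idx):]
        mi += p * log2(p * pc[c] / (pac[a + c] * pbc[b + c]))
    return mi

# Toy joint distribution of (S, U, V, T, X, Y) factored as
# P_S * P_{UV} * P_{T|VS} * P_{X|UVT} * W_{Y|XS}; all choices below are arbitrary.
P_S = {0: 0.5, 1: 0.5}
P_UV = {(u, v): 0.25 for u in (0, 1) for v in (0, 1)}
P_T_VS = {(v, s): {v ^ s: 1.0} for v in (0, 1) for s in (0, 1)}                      # T = V xor S
P_X_UVT = {(u, v, t): {u ^ t: 1.0} for u in (0, 1) for v in (0, 1) for t in (0, 1)}  # X = U xor T
W_Y_XS = {(x, s): {x ^ s: 0.9, 1 - (x ^ s): 0.1} for x in (0, 1) for s in (0, 1)}    # noisy X xor S

joint = defaultdict(float)
for (s, u, v) in itertools.product((0, 1), repeat=3):
    for t, pt in P_T_VS[(v, s)].items():
        for x, px in P_X_UVT[(u, v, t)].items():
            for y, py in W_Y_XS[(x, s)].items():
                joint[(s, u, v, t, x, y)] += P_S[s] * P_UV[(u, v)] * pt * px * py

# Coordinates: S=0, U=1, V=2, T=3, X=4, Y=5.
term1 = mutual_information(joint, (1, 2), (5,))        # I(UV; Y)
term2 = mutual_information(joint, (1,), (4,), (2, 3))  # I(U; X | VT)
print("I(UV;Y) =", round(term1, 4), " I(U;X|VT) =", round(term2, 4),
      " min =", round(min(term1, term2), 4))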
Our main result is stated next.
Theorem 1.
The capacity $C$ of the memoryless state-dependent channel with a rate-limited cribbing helper equals $C^{(\mathrm I)}$:
$$C = C^{(\mathrm I)}.$$
Moreover, the maximum in (10) can be achieved when:
1. $P_{T|VS}$ and $P_{X|UVT}$ are both zero–one laws.
2. The alphabet sizes of $\mathcal U$ and $\mathcal V$ are restricted to
$$|\mathcal V| \le L^2\,|\mathcal S|\,\big(|\mathcal T| - 1\big) + L, \qquad |\mathcal U| \le L^3\,|\mathcal T|\,\big(|\mathcal X| - 1\big) + L,$$
where $L = |\mathcal X|\,|\mathcal T|\,|\mathcal S| + 1$.
3. The chain
$$V - U - (XTS) - Y$$
is a Markov chain.
(Henceforth, we use $A - B - C$ to indicate that $A$ and $C$ are conditionally independent given $B$ and, more generally, $A - B - C - D$ to indicate that $A, B, C, D$ forms a Markov chain.)
The proof is given in Section 4.
Remark 1.
The assumption of (3) notwithstanding, the theorem also holds in the case $|\mathcal T| = 1$, which corresponds to no help.
Proof of Remark 1. 
When $T$ is deterministic, $P_{X|UVT}$ equals $P_{X|UV}$, and the data processing inequality implies that $I(UV;Y) \le I(X;Y)$, thus establishing that, in this case, $C^{(\mathrm I)}$ is upper-bounded by the capacity without state information, i.e.,
$$C^{(\mathrm I)} \le \max_{P_X} I(X;Y).$$
Equality can be established by choosing $V$ to be null and $U$ to equal $X$, a choice that results in $I(UV;Y)$ being $I(X;Y)$ and in $I(U;X|VT)$ being $H(X)$. □
Remark 2.
As is to be expected, when $|\mathcal T| \ge |\mathcal S|$, i.e., when $T$ can describe $S$ precisely, $C^{(\mathrm I)}$ reduces to the Shannon-strategies capacity $C^{(\mathrm{Sh})}$ of the channel with perfect causal state information at the transmitter:
$$C^{(\mathrm{Sh})} = \max I(U;Y), \qquad (15)$$
where the maximization is over all the joint PMFs of the form $P_S\, P_U\, P_{X|US}\, W_{Y|XS}$ (and where, without altering the result of the maximization, we can restrict $P_{X|US}$ to be zero–one valued).
Proof of Remark 2. 
We first establish that $C^{(\mathrm I)} \ge C^{(\mathrm{Sh})}$. To that end, we set $T$ to equal $S$ and $V$ to be null and argue that, with this choice, the minimum of the two terms in (10) is the first, i.e., $I(UV;Y)$ (which, because $V$ is null, equals $I(U;Y)$). Indeed,
$$\begin{aligned}
I(U;X|VT) &= I(U;X|T) && (16)\\
&= I(U;X|S) && (17)\\
&= I(U;XS) && (18)\\
&\ge I(U;Y), && (19)
\end{aligned}$$
where the first equality holds because $V$ is null, the second because $T$ equals $S$, the third because $U$ is independent of $S$, and the final inequality follows from the data processing inequality.
It remains to prove that
$$C^{(\mathrm I)} \le C^{(\mathrm{Sh})}, \qquad (20)$$
which always holds. To simplify our analysis, we assume the Markov condition (13), and we then upper-bound $I(UV;Y)$ (which is an upper bound on the minimum in the definition (10) of $C^{(\mathrm I)}$). Under this Markov condition, the maximum of $I(UV;Y)$ can be achieved with $V$ null, which we proceed to assume. The joint PMF of the remaining variables is then of the form
$$P_S\, P_U\, P_{T|S}\, P_{X|UT}\, W_{Y|XS}. \qquad (21)$$
We will show that—for every fixed $P_U$—to any choice of $P_{T|S}$ and $P_{X|UT}$ there corresponds a choice of $P_{X|US}$ that is feasible for the maximization defining $C^{(\mathrm{Sh})}$ in (15) and that induces the same $I(U;Y)$; this proves that $C^{(\mathrm{Sh})} \ge I(U;Y)$.
To this end, we begin by expressing the channel from $U$ to $Y$ using (21) as
$$\begin{aligned}
P_{Y|U}(y|u) &= \sum_{s}\sum_{x} P_S(s)\, P_{X|US}(x|u,s)\, W_{Y|XS}(y|x,s) && (22)\\
&= \sum_{s}\sum_{x} P_S(s) \sum_{t} P_{T|US}(t|u,s)\, P_{X|UST}(x|u,s,t)\, W_{Y|XS}(y|x,s) && (23)\\
&= \sum_{s}\sum_{x} P_S(s) \sum_{t} P_{T|S}(t|s)\, P_{X|UT}(x|u,t)\, W_{Y|XS}(y|x,s). && (24)
\end{aligned}$$
We then note that, for a fixed $P_U$, the mutual information $I(U;Y)$ is thus determined by the $|\mathcal S| \cdot |\mathcal U|$ conditional PMFs of $X$ given $(S,U) = (s,u)$,
$$\sum_{t} P_{T|US}(t|u,s)\, P_{X|UT}(x|u,t), \qquad (s,u) \in \mathcal S \times \mathcal U. \qquad (25)$$
These conditional PMFs are feasible for the maximization defining $C^{(\mathrm{Sh})}$ in (15), thus demonstrating that $C^{(\mathrm{Sh})} \ge I(U;Y)$. □

3. Example

We next present an example where the message-cognizant helper outperforms the cribbing helper and the latter outperforms the plain-vanilla causal helper. It is trivial to find cases where the three perform identically, e.g., when the state does not affect the channel. The example is borrowed from ([4], Example 7) (from which we also lift the notation).
The channel inputs, states, and outputs are binary tuples,
$$\mathcal X = \mathcal S = \mathcal Y = \{0,1\} \times \{0,1\},$$
and are denoted $(A, B)$, $\big(S^{(0)}, S^{(1)}\big)$, and $\big(Y^{(0)}, Y^{(1)}\big)$, respectively. The two components of the state are IID, each taking on the values 0 and 1 equiprobably. Given the state and input, the channel output is deterministically given by
$$Y = \big(A,\; B \oplus S^{(A)}\big).$$
The assistance is one bit, so $\mathcal T = \{0,1\}$.
As shown in ([4], Claim 8), the capacity with a message-cognizant helper is 2 bits, and with a message-oblivious helper it is $\log 3$. Here, we show that the capacity with a cribbing helper is strictly smaller than 2 bits and strictly larger than $\log 3$. All logarithms in this section are base-2 logarithms, and all rates are in bits.
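As a quick sanity check of the channel law (a short sketch of ours, not part of the original example): the first input bit A is received cleanly, while the second input bit is XORed with whichever state component A selects.

import random

def example_channel(a, b, s0, s1):
    """One use of the example channel: Y = (A, B xor S^(A))."""
    s = (s0, s1)
    return (a, b ^ s[a])

random.seed(0)
for _ in range(4):
    s0, s1 = random.randint(0, 1), random.randint(0, 1)   # IID equiprobable state bits
    a, b = random.randint(0, 1), random.randint(0, 1)     # channel input (A, B)
    print("state", (s0, s1), "input", (a, b), "output", example_channel(a, b, s0, s1))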
We begin by showing the former. Recall that if $R$ is achievable, then it must satisfy the constraints
$$R \le I(UV;Y) \qquad (27)$$
$$R \le I(U;X|VT). \qquad (28)$$
Recall also the form of the joint PMF
$$P_S\, P_V\, P_{T|VS}\, P_{U|V}\, P_{X|UVT}\, W_{Y|XS} \qquad (29)$$
and that we may assume that $P_{X|UVT}(x|u,v,t)$ is zero–one valued. Note that (29) implies
$$ST - V - U$$
and consequently
$$S - TV - U.$$
We will show that the above constraints cannot both be satisfied if $R = 2$. To that end, we assume that
$$I(U;X|VT) = 2 \qquad (32)$$
(it cannot be larger because $|\mathcal X| = 4$) and prove that
$$I(UV;Y) < 2.$$
Since $Y$ is of cardinality 4, it suffices to show that
$$H(Y|UV) > 0.$$
In fact, it suffices to show that
$$H(Y|UVT) > 0,$$
i.e., that there exist $u, v, t$ of positive probability for which
$$H(Y|U = u, V = v, T = t) > 0. \qquad (36)$$
This is what we proceed to do. We first show the existence of $v$ and $t$ for which $H(S|V = v, T = t) \ge 1$. Once this is established, we proceed to pick $u$.
Since $|\mathcal X| = 4$, (32) implies that
$$P_{X|V = v, T = t} \ \text{is uniform} \quad \forall (v,t). \qquad (37)$$
Fix any $v$ (of positive probability). As we next argue, there must exist some $t$ for which $P_{S|V = v, T = t}$ is not zero–one valued. Indeed, by (29), $V$ is independent of $S$, so $H(S|V = v) = H(S) = 2$ and
$$\begin{aligned}
H(S|T, V = v) &= H(S|V = v) - I(S;T|V = v) && (38)\\
&= H(S) - I(S;T|V = v) && (39)\\
&\ge H(S) - H(T|V = v) && (40)\\
&\ge 2 - \log|\mathcal T| && (41)\\
&= 1, && (42)
\end{aligned}$$
so there must exist some $t$ for which
$$H(S|V = v, T = t) \ge 1. \qquad (43)$$
We next choose $u$ as follows. Conditional on $V = v, T = t$, the chance variable $U$ has some PMF $P_{U|V = v, T = t}$ (equal to $P_{U|V = v}$ by (29)) under which $X(U, v, t)$ is uniform; see (37). It follows that there exist $u_0$ and $u_1$ (both of positive conditional probability) such that
$$A(u_0, v, t) = 0$$
$$A(u_1, v, t) = 1,$$
where we introduced the notation
$$X(u, v, t) = \big(A(u, v, t),\, B(u, v, t)\big).$$
Returning to (43), we note that it implies that
$$H\big(S^{(0)} \,\big|\, V = v, T = t\big) > 0 \quad \text{or} \quad H\big(S^{(1)} \,\big|\, V = v, T = t\big) > 0.$$
In the former case, $H(Y|U = u_0, V = v, T = t)$ is positive, and in the latter, $H(Y|U = u_1, V = v, T = t)$ is positive. This establishes the existence of a triple $(u, v, t)$ for which (36) holds and thus concludes the proof that the capacity with a cribbing helper is smaller than 2. We next show that it exceeds $\log 3$.
To that end, we consider choosing
$$U = (A, \tilde U)$$
to be uniform over $\{0,1\} \times \{0,1\}$, and we let $\sigma$ be a Bernoulli-$\alpha$ random variable that is independent of $U$ and of the channel, for some $\alpha \in [0,1]$ to be specified later. We further define
$$\tilde V = \begin{cases} A & \text{if } \sigma = 1\\ 0\ (\text{null}) & \text{if } \sigma = 0 \end{cases} \qquad (49)$$
and
$$V = (\tilde V, \sigma). \qquad (50)$$
We choose the helper function $h(s, v)$—which can also be written as $h\big((s^{(0)}, s^{(1)}), (\tilde v, \sigma)\big)$—to equal $s^{(\tilde v)}$, so
$$T = S^{(\tilde V)} \qquad (51)$$
and
$$T = \begin{cases} S^{(A)} & \text{w.p. } \alpha\\ S^{(0)} & \text{w.p. } 1 - \alpha. \end{cases} \qquad (52)$$
Our encoder function $f(u, v, t)$ ignores $v$ and results in
$$X^{(0)} = A, \qquad X^{(1)} = \tilde U \oplus T, \qquad (53)$$
where $X = (X^{(0)}, X^{(1)})$. That is,
$$f\big((A, \tilde U), T\big) = \big(A,\, \tilde U \oplus T\big). \qquad (54)$$
Note that with the variables defined in (49)–(53), the Markov relations in item 3 of Theorem 1 hold.
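The deterministic relations above are easy to verify empirically. The sketch below (our own check; the function name run_once is ours) draws $U = (A, \tilde U)$, $\sigma$, and the state, forms $\tilde V$, $T = S^{(\tilde V)}$, and $X = (A, \tilde U \oplus T)$ as in (49)–(53), and confirms that whenever $\sigma = 1$ the channel output equals $(A, \tilde U)$, while whenever $\sigma = 0$ it equals $(A, \tilde U \oplus S^{(0)} \oplus S^{(A)})$.

import random

def run_once(alpha):
    """One channel use of the cribbing-helper scheme for the example."""
    a, u_tilde = random.randint(0, 1), random.randint(0, 1)   # U = (A, U~), uniform
    sigma = 1 if random.random() < alpha else 0                # sigma ~ Bernoulli(alpha)
    s = (random.randint(0, 1), random.randint(0, 1))           # state (S^(0), S^(1))
    v_tilde = a if sigma == 1 else 0                           # V~ as in (49)
    t = s[v_tilde]                                             # helper: T = S^(V~)
    x = (a, u_tilde ^ t)                                       # encoder: X = (A, U~ xor T)
    y = (x[0], x[1] ^ s[x[0]])                                 # channel: Y = (A, B xor S^(A))
    return sigma, a, u_tilde, s, y

random.seed(1)
for _ in range(10_000):
    sigma, a, u_tilde, s, y = run_once(alpha=1 / 3)
    if sigma == 1:
        assert y == (a, u_tilde)                               # two clean bits
    else:
        assert y == (a, u_tilde ^ s[0] ^ s[a])                 # residual state noise when A = 1
print("checks passed")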
We now proceed to calculate the rate bounds. For the RHS of (27), we have
$$\begin{aligned}
I(UV; Y) &= I(U \tilde V \sigma; Y)\\
&\ge I(U \tilde V; Y | \sigma)\\
&= \alpha\, I(U \tilde V; Y | \sigma = 1) + (1 - \alpha)\, I(U \tilde V; Y | \sigma = 0)\\
&= \alpha\, I(A \tilde U; Y | \sigma = 1) + (1 - \alpha)\, I(A \tilde U; Y | \sigma = 0), && (55)
\end{aligned}$$
where the last equality holds because if $\sigma = 0$, then $\tilde V$ is null.
We next evaluate each of the terms on the RHS of (55) separately. When $\sigma = 1$,
$$T = S^{(A)},$$
$$X^{(1)} = \tilde U \oplus T = \tilde U \oplus S^{(A)},$$
$$Y^{(1)} = X^{(1)} \oplus S^{(A)} = \tilde U \oplus S^{(A)} \oplus S^{(A)} = \tilde U,$$
so
$$Y = \big(Y^{(0)}, Y^{(1)}\big) = (A, \tilde U)$$
and
$$I(A \tilde U; Y | \sigma = 1) = H(U | \sigma = 1) = H(U) = 2,$$
where the second equality holds because $\sigma$ is independent of $U$.
When $\sigma = 0$,
$$T = S^{(0)},$$
$$X = \big(A,\, \tilde U \oplus S^{(0)}\big),$$
$$Y = \big(A,\, \tilde U \oplus S^{(0)} \oplus S^{(A)}\big),$$
so
$$\begin{aligned}
I(A \tilde U; Y | \sigma = 0) &= I\big(A \tilde U; Y^{(0)} Y^{(1)} \,\big|\, \sigma = 0\big)\\
&= I\big(A \tilde U;\, A,\, \tilde U \oplus S^{(0)} \oplus S^{(A)}\big)\\
&= I(A \tilde U; A) + I\big(A \tilde U;\, \tilde U \oplus S^{(0)} \oplus S^{(A)} \,\big|\, A\big)\\
&= H(A) + \tfrac{1}{2}\, I\big(\tilde U;\, \tilde U \oplus S^{(0)} \oplus S^{(0)} \,\big|\, A = 0\big) + \tfrac{1}{2}\, I\big(\tilde U;\, \tilde U \oplus S^{(0)} \oplus S^{(1)} \,\big|\, A = 1\big)\\
&= H(A) + \tfrac{1}{2} H(\tilde U) + 0\\
&= \tfrac{3}{2}.
\end{aligned}$$
From (58), (60), and (55), we obtain that the RHS of (27) satisfies
$$I(UV; Y) \ge 2\alpha + (1 - \alpha)\,\tfrac{3}{2} = (\alpha + 3)/2. \qquad (61)$$
Next, we evaluate the RHS of (28):
$$\begin{aligned}
I(U; X | VT) &= I(U; X | \tilde V, \sigma, T)\\
&= \alpha\, I(U; X | \tilde V, \sigma = 1, T) + (1 - \alpha)\, I(U; X | \tilde V, \sigma = 0, T)\\
&= \alpha\, I\big(A \tilde U;\, X \,\big|\, A, S^{(A)}, \sigma = 1\big) + (1 - \alpha)\, I\big(A \tilde U;\, A, \tilde U \oplus S^{(0)} \,\big|\, S^{(0)}, \sigma = 0\big)\\
&= \alpha\, I\big(\tilde U;\, A, \tilde U \oplus T \,\big|\, A S^{(A)}, \sigma = 1\big) + (1 - \alpha)\, I\big(A \tilde U;\, A, \tilde U \oplus S^{(0)} \,\big|\, S^{(0)}, \sigma = 0\big)\\
&= \alpha\, I\big(\tilde U;\, \tilde U \oplus T \,\big|\, A S^{(A)}, \sigma = 1\big) + (1 - \alpha)\, H(A, \tilde U)\\
&= \alpha\, H(\tilde U) + (1 - \alpha)\, H(A, \tilde U)\\
&= \alpha + (1 - \alpha)\, 2\\
&= 2 - \alpha. && (62)
\end{aligned}$$
In view of (61) and (62), any rate $R$ satisfying
$$R \le \min\big\{(\alpha + 3)/2,\; 2 - \alpha\big\} \qquad (63)$$
is achievable. Choosing $\alpha = 1/3$ (which maximizes the RHS of (63)) demonstrates the achievability of
$$R = 5/3,$$
which exceeds $\log 3$.
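For completeness, here is the short calculation behind that choice: the first bound in (63) is increasing in $\alpha$ and the second is decreasing, so the minimum is maximized where the two meet,
$$\frac{\alpha + 3}{2} = 2 - \alpha \;\Longleftrightarrow\; 3\alpha = 1 \;\Longleftrightarrow\; \alpha = \tfrac{1}{3}, \qquad R = 2 - \tfrac{1}{3} = \tfrac{5}{3} \approx 1.667 > \log_2 3 \approx 1.585.$$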

4. Proof of Theorem 1

4.1. Direct Part

Pick a distribution as in (11), where $P_{T|SV}$ and $P_{X|UVT}$ are zero–one laws, so
$$x = f(u, v, t)$$
$$t = h(s, v)$$
for some deterministic functions $f$ and $h$. Extend these functions to act on $n$-tuples componentwise, so that if $s, v$ are $n$-tuples in $\mathcal S^n$ and $\mathcal V^n$, then $t = h(s, v)$ indicates that $t$ is an $n$-tuple in $\mathcal T^n$ whose $i$-th component $t_i$ equals $h(s_i, v_i)$, where $s_i$ and $v_i$ are the corresponding components of $s$ and $v$. Likewise, we write $x = f(u, v, t)$.
To prove achievability, we propose a block Markov coding scheme with the receiver performing backward decoding. Although only the receiver is required to decode the message, in our scheme, the helper does too (but not with backward decoding, which would violate causality).
The transmission comprises $B$ length-$n$ sub-blocks, for a total of $Bn$ channel uses. The transmitted message $m$ is represented by $B-1$ sub-messages $m_1, \ldots, m_{B-1}$, with each of the sub-messages taking values in the set $\mathcal M \triangleq \{1, 2, \ldots, 2^{nR}\}$. The overall transmission rate is thus $R(B-1)/B$, which can be made arbitrarily close to $R$ by choosing $B$ very large. The $B-1$ sub-messages are transmitted in the first $B-1$ sub-blocks, with $m_b$ transmitted in sub-block $b$ (for $b \in [1:B-1]$). Hereafter, we use $s(b)$ to denote the state $n$-tuple affecting the channel in sub-block $b$ and use $s_i(b)$ to denote its $i$-th component (with $i \in [1:n]$). Similar notation holds for $x(b)$, $y(b)$, etc.
We begin with an overview of the scheme, where we focus on the transmission in sub-blocks 2 through $B-1$; the first and last sub-blocks must account for some edge effects that we shall discuss later. Let $b$ be in this range. The coding we use in sub-block $b$ is superposition coding, with the cloud center determined by $m_{b-1}$ and the satellite by $m_b$.
Unlike the receiver, the helper, which must be causal, cannot employ backward decoding: it decodes each sub-message at the end of the sub-block in which it is transmitted. Consequently, when sub-block $b$ begins, it already has a reliable guess $\hat m_{b-1}$ of $m_{b-1}$ (based on the previous channel inputs $x(b-1)$ it cribbed). The encoder, of course, knows $m_{b-1}$, so the two can agree on the cloud center $v(b)(m_{b-1})$ indexed by $m_{b-1}$. (We ignore for now the fact that $\hat m_{b-1}$ may, with small probability, differ from $m_{b-1}$.) The satellite is computed by the encoder as $u(b)(m_b | m_{b-1})$; it is unknown to the helper. The helper produces the sub-block-$b$ assistance $t(b)$ based on the state sequence and the cloud center
$$t(b) = h\big(s(b),\, v(b)(m_{b-1})\big).$$
(Since $h(\cdot, \cdot)$ acts componentwise, this help is causal, with the $i$-th component of $t(b)$ being a function of the corresponding component $s_i(b)$ of the state sequence and of $v(b)(m_{b-1})$; it does not require knowledge of future states.)
For its part, the encoder produces the $n$-tuple
$$x(b) = f\big(u(b)(m_b | m_{b-1}),\, v(b)(m_{b-1}),\, t(b)\big),$$
with causality preserved because $u(b)(m_b | m_{b-1})$ and $v(b)(m_{b-1})$ can be computed from $m_{b-1}$ and $m_b$ ahead of time, and because $t(b)$ is presented to the encoder causally and $f(\cdot)$ operates componentwise.
As to the first and last sub-blocks: in the first, we set $m_0$ to a constant (e.g., $m_0 = 1$), so we have only one cloud center. In sub-block $B$, we send no fresh information, so each cloud center has only one satellite.
We now proceed to a more formal exposition. For this, we will need some notation. Given a joint distribution P X Y Z , we denote by T X Y the set of all jointly typical sequences ( x , y ) , where the length n is understood from the context, and we adopt the δ -convention of [8]. Similarly, given a sequence z, T X Y Z ( z ) stands for the set of all pairs ( x , y ) that are jointly typical with the given sequence z.
To describe the first and last sub-blocks, we define m 0 = 1 and m B = 1 , respectively. The proof of the direct part is based on random coding and joint typicality decoding.

4.1.1. Code Construction

We construct $B$ codebooks $\{\mathcal C_b\}$, $b \in [1:B]$, each of length $n$. Each codebook $\mathcal C_b$ is generated randomly and independently of the other codebooks as follows:
  • For every $b \in [1:B]$, generate $2^{nR}$ length-$n$ cloud centers $\{v(b)(j)\}$, $j \in \mathcal M$, independently, each with IID $P_V$ components.
  • For every $b \in [1:B]$ and $j \in \mathcal M$, generate $2^{nR}$ length-$n$ satellites $\{u(b)(m | j)\}$, $m \in \mathcal M$, conditionally independently given $v(b)(j)$, each according to
$$\prod_{i=1}^{n} P_{U|V}\big(\cdot \,\big|\, v_i(b)(j)\big).$$
The codebook $\mathcal C_b$ is the collection
$$\Big\{\big(v(b)(j),\, u(b)(m | j)\big)\Big\}, \qquad (j, m) \in \mathcal M \times \mathcal M.$$
Reveal the codebooks to the encoder, decoder, and helper.
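A sketch of this random construction in Python (our own illustration; the PMFs P_V and P_U_given_V and the tiny parameters are placeholders chosen only to make the code run): for each sub-block it draws the cloud centers IID from P_V and, for every cloud center, draws each satellite's i-th component from P_{U|V}(. | v_i).

import random

def draw(pmf):
    """Sample from a PMF given as {symbol: probability}."""
    r, acc = random.random(), 0.0
    for sym, p in pmf.items():
        acc += p
        if r < acc:
            return sym
    return sym  # guard against floating-point rounding

def generate_codebooks(B, n, R, P_V, P_U_given_V):
    """Superposition codebooks C_1, ..., C_B.

    Returns a list of B dicts with
      'v': {j: cloud center v(b)(j)}          -- components drawn IID ~ P_V
      'u': {(j, m): satellite u(b)(m | j)}    -- i-th component ~ P_{U|V}(. | v_i(b)(j))
    """
    M = range(2 ** int(n * R))
    codebooks = []
    for _ in range(B):
        clouds = {j: [draw(P_V) for _ in range(n)] for j in M}
        sats = {(j, m): [draw(P_U_given_V[v_i]) for v_i in clouds[j]]
                for j in M for m in M}
        codebooks.append({'v': clouds, 'u': sats})
    return codebooks

# Toy parameters (far too small to be meaningful; they only exercise the code).
random.seed(0)
cb = generate_codebooks(B=3, n=8, R=0.25,
                        P_V={0: 0.5, 1: 0.5},
                        P_U_given_V={0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}})
print(cb[0]['v'][0], cb[0]['u'][(0, 1)])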

4.1.2. Operation of the Code

We first describe the operation of the helper and encoder in the first sub-block.
Helper. In the first sub-block, $b = 1$, the helper produces
$$t(1) = \big(t_1(1), t_2(1), \ldots, t_n(1)\big),$$
where
$$t_i(1) = h\big(s_i(1),\, v_i(1)(m_0)\big), \qquad 1 \le i \le n.$$
Note that $t(1)$ is causal in $s(1)$.
Encoder. Set $u(1) = u(1)(m_1 | m_0)$ and $v(1) = v(1)(m_0)$. The input to the channel is
$$x(1) = \big(x_1(1), x_2(1), \ldots, x_n(1)\big),$$
where
$$x_i(1) = f\Big(u_i(1)(m_1 | m_0),\, v_i(1)(m_0),\, t_i(1)\big(s_i(1), v_i(1)(m_0)\big)\Big) = f\big(u_i(1), v_i(1), t_i(1)\big), \qquad 1 \le i \le n.$$
Note that $x(1)$ is causal in $t(1)$.
Helper at the end of the sub-block. Thanks to its cribbing, at the end of sub-block 1 the helper is cognizant of $x(1)$. In addition, it knows $v(1)$ (since it is determined by $m_0$, which was set a priori) and $t(1)$ (since it was produced by the helper itself). The helper now decodes the message $m_1$ by looking for an index $j \in \mathcal M$ such that
$$\big(u(1)(j | m_0),\, x(1)\big) \in T_{UXVT}\big(v(1), t(1)\big).$$
If such an index $j$ exists and is unique, the helper sets $\hat m_1 = j$. Otherwise, an error is declared. By standard results, the probability of error is vanishingly small provided that
$$R < I(U; X | VT). \qquad (76)$$
Denote by $\hat m_1$ the message decoded by the helper at the end of sub-block 1. We proceed to describe the operation of the helper and encoder in sub-block $b$, for $2 \le b \le B-1$.
Helper, $2 \le b \le B-1$. Denote by $\hat m_{b-1}$ the message decoded by the helper at the end of sub-block $b-1$. In sub-block $b$, the helper produces
$$t(b) = \big(t_1(b), t_2(b), \ldots, t_n(b)\big), \qquad (77)$$
where
$$t_i(b) = h\big(s_i(b),\, v_i(b)(\hat m_{b-1})\big), \qquad 1 \le i \le n. \qquad (78)$$
Encoder, $2 \le b \le B-1$. Set $u(b) = u(b)(m_b | m_{b-1})$ and $v(b) = v(b)(m_{b-1})$. The input to the channel is
$$x(b) = \big(x_1(b), x_2(b), \ldots, x_n(b)\big), \qquad (79)$$
where
$$x_i(b) = f\Big(u_i(b)(m_b | m_{b-1}),\, v_i(b)(m_{b-1}),\, t_i(b)\big(s_i(b), v_i(b)(\hat m_{b-1})\big)\Big) = f\big(u_i(b), v_i(b), t_i(b)\big), \qquad 1 \le i \le n. \qquad (80)$$
Note that $t(b)$ and $x(b)$ are causal in $s(b)$ and $t(b)$, respectively.
Helper at the end of the sub-block, $2 \le b \le B-1$. At the end of sub-block $b$, the helper has $x(b)$ at hand. In addition, it has $v(b)(\hat m_{b-1})$ (since $\hat m_{b-1}$ was decoded at the end of the previous sub-block) and $t(b)$ (since it was produced by the helper itself). The helper now decodes the message $m_b$. Assuming that $\hat m_{b-1}$ was decoded correctly, this can be done with a low probability of error if (76) is satisfied.
We proceed to the last sub-block, where no fresh information is sent. Here $m_B = 1$, and the operations of the helper and encoder proceed exactly as in (77)–(80), with $b = B$. Note that in sub-block $B$, the helper need not decode $m_B$ since it is set a priori and known to all.

4.1.3. Decoding

At the destination, we employ backward decoding. Starting at sub-block $B$ with $m_B = 1$, the decoder looks for an index $j \in \mathcal M$ such that
$$\big(u(B)(1 | j),\, v(B)(j),\, y(B)\big) \in T_{UVY}. \qquad (81)$$
If such an index exists and is unique, the decoder sets $\hat{\hat m}_{B-1} = j$. Otherwise, an error is declared. By standard results, the decoding is correct with probability approaching 1 provided
$$R < I(UV; Y). \qquad (82)$$
In the next (backward) decoding sub-blocks, the decoding proceeds as in (81), with the exception that the estimate $\hat{\hat m}_b$ replaces the default value $m_B = 1$ in (81). Thus, in sub-block $B-1$, the decoder has at hand the estimate $\hat{\hat m}_{B-1}$ and the channel output $y(B-1)$. It looks for an index $j$ such that
$$\big(u(B-1)(\hat{\hat m}_{B-1} | j),\, v(B-1)(j),\, y(B-1)\big) \in T_{UVY}.$$
Similarly, for $2 \le b \le B-1$, the decoder looks for an index $j$ such that
$$\big(u(b)(\hat{\hat m}_b | j),\, v(b)(j),\, y(b)\big) \in T_{UVY}.$$
If such an index $j$ exists and is unique, the decoder sets $\hat{\hat m}_{b-1} = j$. Otherwise, an error is declared. Assuming that $m_b$ was decoded correctly in the previous decoding stage, i.e., $\hat{\hat m}_b = m_b$, the decoding of $m_{b-1}$ in sub-block $b$ is correct with probability close to 1 provided that (82) holds. Note that $m_1$ is decoded in sub-block $b = 2$; that is, $y(1)$ is not used at the destination. However, the transmission in sub-block 1 is not superfluous, as it is used by the helper to decode $m_1$ at the end of the first sub-block. Since (76) and (82) are the two terms in (10), this concludes the proof of the direct part.
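The backward-decoding order can be summarized in a few lines of illustrative Python (our own sketch; the hypothetical jointly_typical predicate stands in for membership in the typical set, and the codebook layout follows the generation sketch above). The point is only the order of operations: sub-block B is decoded first with the known satellite index m_B = 1, and each earlier sub-block b reuses the estimate obtained from sub-block b+1 as its satellite index.

def backward_decode(codebooks, y_blocks, jointly_typical):
    """Backward decoding of the sub-messages m_1, ..., m_{B-1}.

    codebooks[b]['v'][j] and codebooks[b]['u'][(j, m)] are the sub-block-(b+1)
    codewords (0-indexed lists); y_blocks[b] is the corresponding output n-tuple.
    jointly_typical(u, v, y) -> bool stands in for the typicality test.
    Returns the estimates [m_1, ..., m_{B-1}], or None on a decoding failure.
    """
    B = len(y_blocks)
    messages = list(codebooks[0]['v'].keys())
    est = {B: 1}                      # m_B is fixed to 1 and known to all
    for b in range(B, 1, -1):         # sub-blocks B, B-1, ..., 2
        cb, y = codebooks[b - 1], y_blocks[b - 1]
        hits = [j for j in messages
                if jointly_typical(cb['u'][(j, est[b])], cb['v'][j], y)]
        if len(hits) != 1:            # no candidate, or an ambiguity: declare an error
            return None
        est[b - 1] = hits[0]          # estimate of m_{b-1}
    return [est[b] for b in range(1, B)]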

4.2. Converse Part

Fix $|\mathcal T|$, and consider $(n, 2^{nR}, |\mathcal T|, \tilde\epsilon_n)$-codes with $\tilde\epsilon_n \to 0$. For each $n$, feed a random message $M$ that is drawn equiprobably from the set $\{1, 2, \ldots, 2^{nR}\}$ to the encoder. By the channel model,
$$M - (X^n S^n) - Y^n. \qquad (85)$$
Fano’s inequality and the fact that $\tilde\epsilon_n \to 0$ imply the existence of a sequence $\epsilon_n \to 0$ for which
$$\begin{aligned}
n(R - \epsilon_n) &\le I(M; Y^n)\\
&\stackrel{(a)}{\le} I(M; X^n S^n)\\
&= I(M; X^n | S^n)\\
&= \sum_{i=1}^{n} I(M; X_i | S^n X^{i-1})\\
&\stackrel{(b)}{=} \sum_{i=1}^{n} I(M; X_i | S^n X^{i-1} T^i)\\
&\le \sum_{i=1}^{n} I(M S_i^n; X_i | S^{i-1} X^{i-1} T^i)\\
&\stackrel{(c)}{=} \sum_{i=1}^{n} I(M; X_i | S^{i-1} X^{i-1} T^i)\\
&\le \sum_{i=1}^{n} I(M Y^{i-1}; X_i | S^{i-1} X^{i-1} T^i),
\end{aligned}$$
where $(a)$ follows from (85); $(b)$ holds because $T_i$ is a function of $(X^{i-1}, S^i)$ (see (8)); and $(c)$ holds because $X_i$ is a function of $(M, T^i)$ and hence of $(M, S^{i-1}, X^{i-1}, T^i)$ (so $I(S_i^n; X_i | M S^{i-1} X^{i-1} T^i)$ must be zero).
We proceed to derive the second bound. Starting again with Fano’s inequality,
$$n(R - \epsilon_n) \le I(M; Y^n) = \sum_{i=1}^{n} I(M; Y_i | Y^{i-1}) \le \sum_{i=1}^{n} I(M Y^{i-1}; Y_i).$$
Defining
$$U_i = M Y^{i-1}$$
$$V_i = S^{i-1} X^{i-1},$$
we can rewrite (86) and (87) as
$$R - \epsilon_n \le \frac{1}{n} \sum_{i=1}^{n} I(U_i; X_i | V_i T_i) \qquad (90)$$
$$R - \epsilon_n \le \frac{1}{n} \sum_{i=1}^{n} I(U_i; Y_i). \qquad (91)$$
Moreover, with $U_i$ and $V_i$ defined as above, $U_i V_i$ and $S_i$ are independent,
$$U_i V_i \;\perp\; S_i,$$
and
$$T_i = h_i(S_i, V_i) \qquad (93)$$
$$X_i = f_i(U_i, V_i, T_i), \qquad (94)$$
where $h_i$ and $f_i$ are (blocklength-dependent) deterministic functions. Indeed, $X_i$ can be determined from $(U_i, V_i, T_i)$ because $U_i$ determines the message $M$ and $V_i$ determines $T^{i-1}$, so $(U_i, V_i, T_i)$ determines $(M, T^i)$, from which $X_i$ can be computed using (5).
We next do away with the sums by conditioning on a time-sharing random variable. Let $Q$ be a random variable uniformly distributed over $\{1, 2, \ldots, n\}$, independently of the channel and the states. Using $Q$, we can express the bounds (90), (91) as
$$\begin{aligned}
R - \epsilon_n &\le I(U_Q; X_Q | V_Q T_Q Q)\\
&= I(U_Q Q; X_Q | V_Q T_Q Q)\\
&= I(\tilde U; X | VT)\\
&= I(\tilde U V; X | VT)\\
&= I(U; X | VT) && (95)
\end{aligned}$$
$$\begin{aligned}
R - \epsilon_n &\le I(U_Q; Y_Q | Q)\\
&\le I(U_Q Q; Y_Q)\\
&= I(\tilde U; Y)\\
&\le I(\tilde U V; Y)\\
&= I(U; Y), && (96)
\end{aligned}$$
where we define
$$X = X_Q, \quad Y = Y_Q, \quad T = T_Q, \quad S = S_Q$$
and the auxiliaries
$$V = (V_Q\, Q)$$
$$\tilde U = (U_Q\, Q)$$
$$U = (\tilde U, V) = (U_Q\, V_Q\, Q).$$
Note that the conditional law of $Y$ given $(XTS)$ is that of the channel, namely $W_{Y|XS}$, and that $S$ is distributed like the channel state. Moreover,
$$V - U - (XTS) - Y. \qquad (101)$$
Since $U$ and $V$ contain the time-sharing random variable $Q$, (93) and (94) imply that
$$T = h(S, V)$$
$$X = \tilde f(\tilde U, V, T) = f(U, T)$$
for some deterministic functions $h$ and $f$. Therefore, the joint distribution under which the RHS of (95) and the RHS of (96) are computed is of the form
$$P_{S \tilde U V T X Y} = P_S\, P_{\tilde U V}\, P_{T|SV}\, P_{X|\tilde U V T}\, W_{Y|XS},$$
where $P_{T|SV}$ and $P_{X|\tilde U V T}$ are zero–one laws, or
$$P_{S U V T X Y} = P_S\, P_U\, P_{V|U}\, P_{T|SV}\, P_{X|UT}\, W_{Y|XS}, \qquad (105)$$
where $P_{T|SV}$, $P_{X|UT}$, and $P_{V|U}$ are zero–one laws.
The form (105) and the inequalities (95), (96) establish the converse.

4.3. Cardinality Bounds

We next proceed to bound the alphabet sizes of $\mathcal U, \mathcal V$ in two steps. In the first, we do so while relaxing the zero–one-law requirements. In the second, we enlarge the alphabets to fulfill said requirements. Let
$$L = |\mathcal X|\,|\mathcal T|\,|\mathcal S| + 1.$$
Fix a conditional distribution $p(x, t, s | u)$, and define the $L$ functions of $p(u | v)$:
$$p(x, t, s | v) = \sum_{u} p(x, t, s | u)\, p(u | v) \qquad (L - 2 \text{ functions})$$
$$I(U; X | T, V = v)$$
$$I(U; Y | V = v)$$
(with the $L - 2$ functions corresponding to all but one of the tuples $(x, t, s)$). By the support lemma [5,8], there exists a random variable $V$ with alphabet $|\mathcal V| \le L$ such that $P_{XTS}$, $I(U; X | TV)$, and $I(U; Y)$ are preserved. Denote by $P_U$ the PMF of the resulting random variable $U$, i.e.,
$$P_U(u) = \sum_{v} p(u | v)\, P_V(v).$$
We next bound the alphabet size of $\mathcal U$. For each $v \in \mathcal V$, we define the $L$ functions
$$p(x, t, s | v, u) \qquad (L - 2 \text{ functions}) \qquad (109)$$
$$I(U; X | T, V) \qquad (110)$$
$$I(U; Y | V). \qquad (111)$$
Applying the support lemma again, for every $v$ there exists a random variable $U$ with alphabet $|\mathcal U| \le L$ such that (109)–(111) are preserved. If we multiply the alphabet of $U$ by $|\mathcal V|$, we can, with proper labeling of the elements of $\mathcal U$, retain a Markov structure like (101). Now the alphabet sizes are fixed and independent of $n$. Thus, substituting $V, U$ in (95), (96) and taking the limit $n \to \infty$, we have the upper bounds
$$R \le I(U; X | VT)$$
$$R \le I(U; Y),$$
where
$$P_{S U V T X Y} = P_S\, P_{UV}\, P_{T|SV}\, P_{X|UVT}\, W_{Y|XS},$$
$$|\mathcal V| \le L, \qquad |\mathcal U| \le L^2,$$
and the following Markov chain holds:
$$V - U - (XTS) - Y.$$
Note, however, that $P_{T|SV}$ and $P_{X|UVT}$ are no longer zero–one laws. We remedy this using the Functional Representation Lemma (FRL) [5] (at the cost of increasing the alphabet sizes): a standard convexity argument will not do because—although $I(U; X | VT)$ is a convex function of $P_{T|SV}$ and also a convex function of $P_{X|UVT}$, and likewise $I(U; Y)$—the minimum of two convex functions need not be convex.
The Functional Representation Lemma implies that—without altering the conditional law of $T$ given $(SV)$ or of $X$ given $(UVT)$—the random variables $T$ and $X$ can be represented as
$$T = \tilde g_1(SV, Z_1)$$
$$X = \tilde g_2(UVT, Z_2),$$
where $\tilde g_1, \tilde g_2$ are deterministic functions; $Z_1$ and $Z_2$ are independent random variables that are independent of $(SV, UVT)$; and their alphabets satisfy
$$|\mathcal Z_1| \le |\mathcal S|\,|\mathcal V|\,\big(|\mathcal T| - 1\big) + 1$$
$$|\mathcal Z_2| \le |\mathcal U|\,|\mathcal V|\,|\mathcal T|\,\big(|\mathcal X| - 1\big) + 1.$$
At the expense of increased alphabet sizes, we now append $Z_1$ to $V$ and $Z_2$ to $U$ to form the new auxiliary random variables
$$\hat V = (V\, Z_1)$$
$$\hat U = (U\, Z_2),$$
with alphabet sizes
$$|\hat{\mathcal V}| \le |\mathcal S|\,|\mathcal V|^2\,\big(|\mathcal T| - 1\big) + |\mathcal V|$$
$$|\hat{\mathcal U}| \le |\mathcal U|^2\,|\mathcal V|\,|\mathcal T|\,\big(|\mathcal X| - 1\big) + |\mathcal U|.$$
We set
$$P_{X|\hat U \hat V T}(x \,|\, u, z_2, v, z_1, t) = 1\big\{x = \tilde g_2(u, z_2, v, t)\big\}$$
(irrespective of $z_1$) and
$$P_{T|\hat V S}(t \,|\, v, z_1, s) = 1\big\{t = \tilde g_1(s, v, z_1)\big\},$$
where $1\{\text{statement}\}$ equals 1 if the statement is true and equals 0 otherwise.
As we next argue, these auxiliary random variables and the above zero–one laws do not decrease the relevant mutual information expressions.
Beginning with $I(\hat U; X | \hat V T)$, we note that $H(X | \hat V T) = H(X | VT)$ because we have preserved the joint law of $(VT)$ and because $Z_1$ does not influence the mapping $\tilde g_2$ to $X$. Since $H(X | U Z_2 V T) \le H(X | UVT)$, this establishes that
$$I(\hat U; X | \hat V T) \ge I(U; X | VT).$$
Likewise, our new auxiliary random variables and zero–one laws do not alter $H(Y)$, but $H(Y | \hat U) \le H(Y | U)$, so
$$I(\hat U; Y) \ge I(U; Y).$$
This completes the proof of Theorem 1.

Author Contributions

Writing—original draft preparation, A.L. and Y.S.; writing—review and editing, A.L. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Swiss National Science Foundation (SNSF) under Grant 200021-215090.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IID: Independent and Identically Distributed
FRL: Functional Representation Lemma
PMF: Probability Mass Function
RHS: Right-Hand Side

References

  1. Shannon, C.E. Channels with Side Information at the Transmitter. IBM J. Res. Dev. 1958, 2, 289–293. [Google Scholar] [CrossRef]
  2. Gel’fand, S.I.; Pinsker, M.S. Coding for channel with random parameters. Probl. Control. Inform. Theory 1980, 9, 19–31. [Google Scholar]
  3. Keshet, G.; Steinberg, Y.; Merhav, N. Channel Coding in the Presence of Side Information. Found. Trends Commun. Inf. Theory 2008, 4, 1567–2190. [Google Scholar] [CrossRef]
  4. Lapidoth, A.; Wang, L. State-Dependent DMC with a Causal Helper. IEEE Trans. Inf. Theory 2024, 70, 3162–3174. [Google Scholar] [CrossRef]
  5. El Gamal, A.; Kim, Y. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  6. Lapidoth, A.; Wang, L.; Yan, Y. State-Dependent Channels with a Message-Cognizant Helper. arXiv 2023, arXiv:2311.08220. [Google Scholar] [CrossRef]
  7. Lapidoth, A.; Wang, L.; Yan, Y. Message-Cognizant Assistance and Feedback for the Gaussian Channel. arXiv 2023, arXiv:2310.15768. [Google Scholar] [CrossRef]
  8. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed.; Cambridge University Press: London, UK, 2011. [Google Scholar]
Figure 1. Communication over a state-dependent channel with a rate-limited causal cribbing helper.