4.1. Direct Part
Pick a distribution as in (11) in which the conditional laws of $T$ given $(S,V)$ and of $X$ given $(U,T)$ are 0–1 laws, so that $T=f(S,V)$ and $X=h(U,T)$ for some deterministic functions $f$ and $h$. Extend these functions to act on $n$-tuples componentwise, so that if $\mathbf{s}$ and $\mathbf{v}$ are $n$-tuples in $\mathcal{S}^{n}$ and $\mathcal{V}^{n}$, then $\mathbf{t}=f(\mathbf{s},\mathbf{v})$ indicates that $\mathbf{t}$ is an $n$-tuple in $\mathcal{T}^{n}$ whose $i$-th component $t_{i}$ equals $f(s_{i},v_{i})$, where $s_{i}$ and $v_{i}$ are the corresponding components of $\mathbf{s}$ and $\mathbf{v}$. Likewise, we write $\mathbf{x}=h(\mathbf{u},\mathbf{t})$.
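To make the componentwise convention concrete, here is a minimal Python sketch; the particular maps f and h and the binary alphabets are illustrative assumptions, since the text only requires that some deterministic maps exist.

# Componentwise extension of single-letter deterministic maps to n-tuples.
def f(s, v):
    # placeholder deterministic map: help symbol from (state, cloud-center symbol)
    return (s + v) % 2

def h(u, t):
    # placeholder deterministic map: channel-input symbol from (satellite, help symbol)
    return (u * t) % 2

def componentwise(func, a, b):
    # the i-th output symbol depends only on the i-th symbols of a and b
    return tuple(func(ai, bi) for ai, bi in zip(a, b))

s = (0, 1, 1, 0)
v = (1, 1, 0, 0)
t = componentwise(f, s, v)   # t_i = f(s_i, v_i)
u = (1, 0, 1, 1)
x = componentwise(h, u, t)   # x_i = h(u_i, t_i)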
To prove achievability, we propose a block Markov coding scheme with the receiver performing backward decoding. Although only the receiver is required to decode the message, in our scheme, the helper does too (but not with backward decoding, which would violate causality).
The transmission comprises $B$ sub-blocks, each of length $n$, for a total of $Bn$ channel uses. The transmitted message $m$ is represented by sub-messages $m_{1},\dots,m_{B-1}$, with each of the sub-messages taking values in the set $\{1,\dots,2^{nR}\}$. The overall transmission rate is thus $\frac{B-1}{B}R$, which can be made arbitrarily close to $R$ by choosing $B$ very large. The sub-messages are transmitted in the first $B-1$ sub-blocks, with $m_{b}$ transmitted in sub-block $b$ (for $b=1,\dots,B-1$). Hereafter, we use $\mathbf{s}_{b}$ to denote the state $n$-tuple affecting the channel in sub-block $b$ and use $s_{b,i}$ to denote its $i$-th component (with $i=1,\dots,n$). Similar notation holds for $\mathbf{t}_{b}$, $\mathbf{x}_{b}$, etc.
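For completeness, the standard block-Markov rate accounting behind this statement (assuming, as reconstructed above, that each sub-message carries $nR$ bits over the $Bn$ channel uses) is
\[
\frac{(B-1)\,nR}{Bn} \;=\; \Bigl(1-\frac{1}{B}\Bigr)R \;\longrightarrow\; R \qquad (B\to\infty).
\]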
We begin with an overview of the scheme, where we focus on the transmission in sub-blocks 2 through $B-1$: the first and last sub-blocks must account for some edge effects that we shall discuss later. Let $b$ be in this range. The coding we use in sub-block $b$ is superposition coding, with the cloud center determined by $m_{b-1}$ and the satellite by $m_{b}$.
Unlike the receiver, the helper, which must be causal, cannot employ backward decoding: it decodes each sub-message at the end of the sub-block in which it is transmitted. Consequently, when sub-block $b$ begins, it already has a reliable guess $\hat{m}_{b-1}$ of $m_{b-1}$ (based on the previous channel inputs it cribbed). The encoder, of course, knows $m_{b-1}$, so the two can agree on the cloud center indexed by $m_{b-1}$. (We ignore for now the fact that $\hat{m}_{b-1}$ may, with small probability, differ from $m_{b-1}$.) The satellite, indexed by $m_{b}$ within this cloud, is computed by the encoder; it is unknown to the helper. The helper produces the sub-block-$b$ assistance $\mathbf{t}_{b}=f(\mathbf{s}_{b},\mathbf{v}_{b})$ based on the state sequence and the cloud center $\mathbf{v}_{b}$. (Since $f$ acts componentwise, this help is causal, with the $i$-th component of $\mathbf{t}_{b}$ being a function of the corresponding component $s_{b,i}$ of the state sequence and of $v_{b,i}$; it does not require knowledge of future states.) For its part, the encoder produces the $n$-tuple $\mathbf{x}_{b}=h(\mathbf{u}_{b},\mathbf{t}_{b})$, with causality preserved because $\mathbf{v}_{b}$ and $\mathbf{u}_{b}$ can be computed from $m_{b-1}$ and $m_{b}$ ahead of time, and because $\mathbf{t}_{b}$ is presented to the encoder causally and $h$ operates componentwise.
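As an illustration of the causality just described, the following schematic Python sketch (with hypothetical arguments standing in for the codewords and for the maps f and h above) produces the help and the channel input of a sub-block one symbol at a time.

# Schematic symbol-by-symbol operation within one sub-block.
# u and v are fixed before the sub-block starts (they depend only on the
# current and previous sub-messages); the state arrives causally.
def run_subblock(f, h, u, v, state_stream):
    t, x = [], []
    for i, s_i in enumerate(state_stream):  # states revealed one at a time
        t_i = f(s_i, v[i])                  # helper uses only the current state symbol
        x_i = h(u[i], t_i)                  # encoder uses only the current help symbol
        t.append(t_i)
        x.append(x_i)
    return tuple(t), tuple(x)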
As to the first and last sub-blocks: in the first, the cloud-center index $m_{0}$ is set to a constant (e.g., $m_{0}=1$), so we have only one cloud center. In sub-block $B$, we send no fresh information, so each cloud center has only one satellite.
We now proceed to a more formal exposition. For this, we will need some notation. Given a joint distribution $P$, we denote by $\mathcal{T}(P)$ the set of all jointly typical sequences, where the length $n$ is understood from the context, and we adopt the $\delta$-convention of [8]. Similarly, given a sequence $\mathbf{z}$, $\mathcal{T}(P\,|\,\mathbf{z})$ stands for the set of all pairs of sequences that are jointly typical with the given sequence $\mathbf{z}$.
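For intuition only, a generic empirical-type check of joint typicality might look as follows in Python; this is a schematic stand-in, not the exact $\delta$-convention of [8].

from collections import Counter

def jointly_typical(x, y, p_xy, eps):
    # Schematic check: every pair outside the support of p_xy is absent, and the
    # empirical frequency of every pair in the support is within eps of p_xy.
    n = len(x)
    emp = Counter(zip(x, y))
    if any(pair not in p_xy for pair in emp):
        return False
    return all(abs(emp.get(pair, 0) / n - prob) <= eps for pair, prob in p_xy.items())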
To describe the first and last sub-blocks, we define the default values $m_{0}$ and $m_{B}$, respectively, both set a priori (say, to 1). The proof of the direct part is based on random coding and joint typicality decoding.
4.1.1. Code Construction
We construct $B$ codebooks $\mathcal{C}_{1},\dots,\mathcal{C}_{B}$, each of length $n$. Each codebook is generated randomly and independently of the other codebooks as follows:
For every possible value $j$ of the previous sub-message, generate a length-$n$ cloud center $\mathbf{v}_{b}(j)$, independently across $j$, with IID components drawn according to the law of $V$.
For every $j$ and every possible value $k$ of the current sub-message, generate a length-$n$ satellite $\mathbf{u}_{b}(k\,|\,j)$, conditionally independently given $\mathbf{v}_{b}(j)$, with components drawn independently according to the conditional law of $U$ given the corresponding component of $\mathbf{v}_{b}(j)$.
The codebook $\mathcal{C}_{b}$ is the collection of all these cloud centers and satellites.
Reveal the codebooks to the encoder, decoder, and helper.
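A minimal random-coding sketch of this construction is given below; the names p_v and p_u_given_v are hypothetical placeholders for the marginal and conditional laws induced by the distribution chosen in (11).

import random

def generate_codebook(n, n_msgs, v_alphabet, u_alphabet, p_v, p_u_given_v):
    # Cloud centers: one length-n sequence per previous sub-message, IID ~ p_v.
    clouds = {
        j: tuple(random.choices(v_alphabet,
                                weights=[p_v[a] for a in v_alphabet], k=n))
        for j in range(n_msgs)
    }
    # Satellites: for each cloud center, one length-n sequence per current
    # sub-message, drawn conditionally IID given the cloud-center symbols.
    sats = {}
    for j, v_seq in clouds.items():
        for k in range(n_msgs):
            sats[(k, j)] = tuple(
                random.choices(u_alphabet,
                               weights=[p_u_given_v[v_i][a] for a in u_alphabet])[0]
                for v_i in v_seq)
    return clouds, sats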
4.1.2. Operation of the Code
We first describe the operation of the helper and encoder in the first sub-block.
Helper. In the first sub-block, $b=1$, the helper produces the assistance $\mathbf{t}_{1}=f\bigl(\mathbf{s}_{1},\mathbf{v}_{1}(m_{0})\bigr)$, i.e., $t_{1,i}=f\bigl(s_{1,i},v_{1,i}(m_{0})\bigr)$ for $i=1,\dots,n$. Note that $\mathbf{t}_{1}$ is causal in $\mathbf{s}_{1}$.
Encoder. Set $\mathbf{v}_{1}=\mathbf{v}_{1}(m_{0})$ and $\mathbf{u}_{1}=\mathbf{u}_{1}(m_{1}\,|\,m_{0})$. The input to the channel is $\mathbf{x}_{1}=h(\mathbf{u}_{1},\mathbf{t}_{1})$, i.e., $x_{1,i}=h(u_{1,i},t_{1,i})$ for $i=1,\dots,n$. Note that $\mathbf{x}_{1}$ is causal in $\mathbf{t}_{1}$.
Helper at the end of the sub-block. Thanks to its cribbing, at the end of sub-block 1 the helper is cognizant of $\mathbf{x}_{1}$. In addition, it knows $\mathbf{v}_{1}$ (since it is determined by $m_{0}$, which was set a priori) and $\mathbf{t}_{1}$ (since it was produced by itself). The helper now decodes the message $m_{1}$ by looking for an index $j$ such that the satellite $\mathbf{u}_{1}(j\,|\,m_{0})$ is jointly typical with the sequences at its disposal. If such an index $j$ exists and is unique, the helper sets its estimate of $m_{1}$ equal to $j$. Otherwise, an error is declared. By standard results, the probability of error is vanishingly small provided that the rate condition (76) holds.
Denote by $\hat{m}_{1}$ the message decoded by the helper at the end of sub-block 1. We proceed to describe the operation of the helper and encoder in sub-block $b$, when $2\le b\le B-1$.
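Schematically, the helper's end-of-sub-block decision is a unique-candidate search; in the sketch below, passes_test is a hypothetical predicate encapsulating the joint-typicality check against the sequences available to the helper.

def decode_unique(candidates, passes_test):
    # candidates maps an index j to the corresponding satellite sequence.
    # Return j if exactly one candidate passes the test; otherwise declare
    # an error by returning None.
    hits = [j for j, cand in candidates.items() if passes_test(cand)]
    return hits[0] if len(hits) == 1 else None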
Helper, $2\le b\le B-1$. Denote by $\hat{m}_{b-1}$ the message decoded by the helper at the end of sub-block $b-1$. In sub-block $b$, the helper produces the assistance $\mathbf{t}_{b}=f\bigl(\mathbf{s}_{b},\mathbf{v}_{b}(\hat{m}_{b-1})\bigr)$, i.e., $t_{b,i}=f\bigl(s_{b,i},v_{b,i}(\hat{m}_{b-1})\bigr)$.
Encoder, $2\le b\le B-1$. Set $\mathbf{v}_{b}=\mathbf{v}_{b}(m_{b-1})$ and $\mathbf{u}_{b}=\mathbf{u}_{b}(m_{b}\,|\,m_{b-1})$. The input to the channel is $\mathbf{x}_{b}=h(\mathbf{u}_{b},\mathbf{t}_{b})$, i.e., $x_{b,i}=h(u_{b,i},t_{b,i})$. Note that $\mathbf{t}_{b}$ and $\mathbf{x}_{b}$ are causal in $\mathbf{s}_{b}$ and $\mathbf{t}_{b}$, respectively.
Helper at the end of the sub-block, $2\le b\le B-1$. At the end of sub-block $b$ the helper has $\mathbf{x}_{b}$ at hand. In addition, it has $\mathbf{v}_{b}(\hat{m}_{b-1})$ (since $\hat{m}_{b-1}$ was decoded at the end of the previous sub-block) and $\mathbf{t}_{b}$ (since it was produced by itself). The helper now decodes the message $m_{b}$. Assuming that $m_{b-1}$ was decoded correctly, this can be done with a low probability of error if (37) is satisfied.
We proceed to the last sub-block, where no fresh information is sent. Here $m_{B}$ is set to its a priori default value, and the operations of the helper and encoder proceed exactly as in (77)–(80), with $b=B$. Note that in sub-block $B$, the helper need not decode $m_{B}$, since it is set a priori and known to all.
4.1.3. Decoding
At the destination, we employ backward decoding. Starting at sub-block $B$, in which the satellite index is the a priori default $m_{B}$, the decoder looks for an index $j$ such that the cloud center indexed by $j$, together with the corresponding satellite, passes the joint-typicality test (81) with the sub-block-$B$ channel output. If such an index exists and is unique, the decoder sets its estimate of $m_{B-1}$ equal to $j$. Otherwise, an error is declared. By standard results, the decoding is correct with probability approaching 1 provided that (82) holds.
In the subsequent (backward) decoding of sub-blocks $B-1,\dots,2$, the decoding proceeds as in (81), with the exception that the estimate of $m_{b}$ obtained in the previous decoding stage replaces the default value $m_{B}$ in (81). Thus, in sub-block $b$, the decoder has at hand the estimate of $m_{b}$ and the channel output $\mathbf{y}_{b}$, and it looks for an index $j$ for which the corresponding joint-typicality test is satisfied. If such an index $j$ exists and is unique, the decoder sets its estimate of $m_{b-1}$ equal to $j$. Otherwise, an error is declared. Assuming that $m_{b}$ was decoded correctly in the previous decoding stage, the decoding of $m_{b-1}$ in sub-block $b$ is correct with probability close to 1 provided that (82) holds. Note that $m_{1}$ is decoded in sub-block 2; that is, the sub-block-1 channel output is not used at the destination. However, the transmission in sub-block 1 is not superfluous, as it is used by the helper to decode $m_{1}$ at the end of the first sub-block. Since (76) and (82) are the two terms in (10), this concludes the proof of the direct part.
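The backward-decoding schedule can be summarized by the sketch below, in which decode_subblock is a hypothetical placeholder for the joint-typicality search of (81).

def backward_decode(B, outputs, decode_subblock, default_last_msg=1):
    # outputs[b] is the channel-output n-tuple of sub-block b (b = 1, ..., B).
    # In sub-block b the decoder already knows m_b and recovers m_{b-1};
    # the loop runs b = B, B-1, ..., 2, so outputs[1] is never used.
    estimates = {B: default_last_msg}
    for b in range(B, 1, -1):
        estimates[b - 1] = decode_subblock(outputs[b], known_msg=estimates[b])
    return [estimates[b] for b in range(1, B)]   # estimates of m_1, ..., m_{B-1}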
4.2. Converse Part
Fix a rate $R$, and consider a sequence of blocklength-$n$ codes of rate $R$ whose probability of error vanishes as $n$ tends to infinity. For each $n$, feed the encoder a random message $M$ drawn equiprobably from the message set. By the channel model, (85) holds.
Fano's inequality and the fact that the probability of error vanishes imply the existence of a sequence $\epsilon_{n}\to 0$ for which the chain of (in)equalities (86) holds, where the first step follows from (85); the next holds because, by (8), the help is a function of the conditioning variables; and the last holds because the channel input is a function of the message and the help, and hence of the conditioning variables, so the corresponding conditional entropy must be zero.
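For orientation, the generic Fano step underlying (86) has the form below; the actual chain in (86) contains further, model-specific manipulations.
\[
nR \;=\; H(M) \;=\; I(M;Y^{n}) + H(M\,|\,Y^{n}) \;\le\; I(M;Y^{n}) + n\epsilon_{n}.
\]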
We proceed to derive the second bound. Starting again with Fano's inequality, we obtain (87).
Defining the auxiliary random variables $U_{i}$ and $V_{i}$ appropriately, we can rewrite (86) and (87) as (90) and (91). Moreover, with $U_{i}$ and $V_{i}$ defined as above, the auxiliaries and the state are independent, and the functional relations (93) and (94) hold, where the mappings appearing there are (blocklength-dependent) deterministic functions. Indeed, the auxiliaries determine the message $M$ and the past channel inputs, from which the help can be computed using (5).
We next do away with the sums by conditioning on a time-sharing random variable. Let $Q$ be a random variable uniformly distributed over $\{1,\dots,n\}$, independently of the channel and the state. Using $Q$, we can express the bounds (90) and (91) as (95) and (96), where we define the single-letter variables $S$, $T$, $X$, $Y$ accordingly and the auxiliaries $U$ and $V$ so as to include $Q$. Note that the conditional law of $Y$ given the inputs to the channel is that of the channel, and that $S$ is distributed like the channel state. Moreover, since $U$ and $V$ contain the time-sharing random variable $Q$, (93) and (94) imply that $X=h(U,T)$ and $T=f(S,V)$ for some deterministic functions $h$ and $f$. Therefore, the joint distribution under which the RHS of (95) and the RHS of (96) are computed is of the form (105), in which the conditional laws determining $T$ and $X$ are zero-one laws.
The form (105) and the inequalities (95) and (96) establish the converse.
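For orientation, the time-sharing step used above is an instance of the standard identity below, shown with a generic auxiliary; the paper's bounds (95) and (96) involve the model-specific variables.
\[
\frac{1}{n}\sum_{i=1}^{n} I(U_{i};Y_{i})
\;=\; I(U_{Q};Y_{Q}\,|\,Q)
\;\le\; I(U_{Q},Q;\,Y_{Q})
\;=\; I(U;Y),
\qquad U \triangleq (U_{Q},Q),\quad Y \triangleq Y_{Q}.
\]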
4.3. Cardinality Bounds
We next proceed to bound the alphabet sizes of the auxiliaries $U$ and $V$ in two steps. In the first, we do so while relaxing the zero-one-law requirements. In the second, we enlarge the alphabets to fulfill said requirements. Let $L$ denote the number of quantities to be preserved. Fix a conditional distribution of the remaining variables given $U$, and define the corresponding $L$ functions of it (with the first of these functions corresponding to all but one of the tuples of the remaining variables). By the support lemma [5,8], there exists a random variable with an alphabet of at most $L$ letters such that the prescribed distributions and the two mutual information expressions are preserved. Denote the resulting random variable again by $U$; its alphabet size is thus at most $L$.
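For reference, the support lemma [5,8] is used here in its standard form, stated below with generic symbols $Z$, $U$, and $g_{\ell}$ of our choosing (they stand for the variables and the $L$ functionals listed above): for real-valued continuous functions $g_{1},\dots,g_{L}$ on the set of distributions on a finite set $\mathcal{Z}$, and for any auxiliary $U$ jointly distributed with $Z$, there exists a random variable $U'$ taking at most $L$ values such that
\[
\mathbb{E}\Bigl[g_{\ell}\bigl(P_{Z|U}(\cdot\,|\,U)\bigr)\Bigr]
\;=\;
\mathbb{E}\Bigl[g_{\ell}\bigl(P_{Z|U'}(\cdot\,|\,U')\bigr)\Bigr],
\qquad \ell=1,\dots,L.
\]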
We next bound the alphabet size of $V$. For each value $u$ of $U$, we define $L$ functions analogously. Applying again the support lemma, for every such $u$ there exists a random variable with an alphabet of at most $L$ letters such that (109)–(111) are preserved. Multiplying the alphabets out over the values of $u$, we can, with proper labeling of the elements, retain a Markov structure like (101). Now the alphabet sizes are fixed and independent of $n$. Thus, substituting the new auxiliaries in (95) and (96) and taking the limit $n\to\infty$, we obtain the upper bound of the theorem, in which the auxiliaries have the stated alphabet sizes and the corresponding Markov chain holds.
Note, however, that the conditional laws defining $T$ and $X$ are no longer zero-one laws. We remedy this using the Functional Representation Lemma (FRL) [5], at the cost of increasing the alphabet sizes: a standard convexity argument will not do because, although each of the two mutual information expressions is a convex function of each of these conditional laws, the minimum of two convex functions need not be convex.
The Functional Representation Lemma implies that, without altering the conditional law of $T$ given its conditioning variables or that of $X$ given its conditioning variables, the random variables $T$ and $X$ can be represented as deterministic functions of those conditioning variables and of two auxiliary random variables; these two random variables are independent of each other and of the remaining variables, and their alphabet sizes are bounded as the lemma prescribes. At the expense of increased alphabet sizes, we now append one of these random variables to $U$ and the other to $V$ to form the new auxiliary random variables, whose alphabet sizes are bounded accordingly. We then define the new conditional laws of $T$ and of $X$ as zero-one laws via indicator functions, where an indicator equals 1 if the statement it tests is true and equals 0 otherwise.
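For reference, in its commonly stated form (quoted here with a generic pair $(A,B)$ of our choosing, not the paper's notation), the FRL asserts that for any jointly distributed pair $(A,B)$ over finite alphabets there exist a random variable $W$ independent of $A$ and a deterministic function $g$ such that
\[
B \;=\; g(A,W), \qquad |\mathcal{W}| \;\le\; |\mathcal{A}|\,\bigl(|\mathcal{B}|-1\bigr)+1 .
\]
It is this cardinality bound that accounts for the alphabet-size increase mentioned above.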
As we next argue, these auxiliary random variables and the above zero–one laws do not decrease the relevant mutual information expressions.
Beginning with the first mutual information expression, we note that it is unchanged, because we have preserved the relevant joint law and because the appended random variable does not influence the mapping (54) to $X$. This establishes the first of the two bounds. Likewise, our new auxiliary random variables and zero-one laws do not alter the corresponding terms of the second expression, and the remaining term can only change in the favorable direction, so the second bound holds as well. This completes the proof of Theorem 1.