1. Introduction
If a helper can observe the additive noise corrupting a channel and can describe it to the decoder, then the latter can subtract it and thus render the channel noiseless. However, for this to succeed, the description must be nearly lossless and hence possibly of formidable rate. It is thus of interest to study scenarios where the description rate is limited, and to understand how the rate of the help affects performance.
When performance is measured in terms of the Shannon capacity, the problem was solved for a number of channel models [1,2,3], where the former two address assistance to the decoder and the latter to the encoder. When performance is measured in terms of the erasures-only capacity or the list-size capacity, the problem was solved in [4,5]. Error exponents with assistance were studied in [6]. Here we study how rate-limited help affects the identification capacity [7].
We focus on the memoryless modulo-additive noise channel (MMANC), whose time-$k$ output $Y_k$ corresponding to the time-$k$ input $x_k$ is:
$$ Y_k = x_k \oplus Z_k, \qquad (1) $$
where $Z_k$ is the time-$k$ noise sample; the channel input $x_k$, the channel output $Y_k$, and the noise $Z_k$ all take values in the set $\mathcal{X}$—also denoted $\mathcal{Y}$, or $\mathcal{Z}$, or $\{0, 1, \dots, |\mathcal{X}| - 1\}$—comprising the $|\mathcal{X}|$ elements $0, 1, \dots, |\mathcal{X}| - 1$; and ⊕ and ⊖ denote mod-$|\mathcal{X}|$ addition and subtraction, respectively. The noise sequence $\{Z_k\}$ is IID $\sim P_Z$, where $P_Z$ is some PMF on $\mathcal{X}$.
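As a concrete illustration of the channel model, the following is a minimal simulation sketch; the alphabet size, noise PMF, and function names are hypothetical examples rather than anything specified in the paper.

```python
import numpy as np

def mmanc(x, noise_pmf, rng=None):
    """Pass an input sequence through a memoryless modulo-additive noise channel."""
    rng = np.random.default_rng(0) if rng is None else rng
    q = len(noise_pmf)                              # alphabet size |X|
    z = rng.choice(q, size=len(x), p=noise_pmf)     # IID noise samples Z_k ~ P_Z
    y = (np.asarray(x) + z) % q                     # Y_k = x_k (+) Z_k, mod-q addition
    return y, z

# Example: quaternary alphabet; a decoder that learns z exactly recovers x as (y - z) mod q.
y, z = mmanc([0, 3, 2, 1], noise_pmf=[0.7, 0.3, 0.0, 0.0])
x_hat = (y - z) % 4
```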
Irrespective of whether the help is provided to the encoder, to the decoder, or to both, the Shannon capacity of this channel coincides with its erasures-only capacity, and both are given by [3] (Section V) and [4] (Theorems 2 and 6):
$$ C = \log |\mathcal{X}| - \bigl[ H(Z) - R_{\mathrm h} \bigr]^{+}, \qquad (2) $$
where $R_{\mathrm h}$ is the rate of the help, $[\xi]^{+}$ denotes $\max\{\xi, 0\}$, and $H(Z)$ is the Shannon entropy of $P_Z$.
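A short numerical sketch of (2) follows; the alphabet, noise PMF, and help rates are illustrative values and not taken from the paper.

```python
import numpy as np

def entropy_bits(pmf):
    """Shannon entropy (in bits) of a probability vector."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def helper_capacity(noise_pmf, help_rate):
    """log|X| - [H(Z) - R_h]^+ in bits per channel use, per the form of (2)."""
    return np.log2(len(noise_pmf)) - max(entropy_bits(noise_pmf) - help_rate, 0.0)

# Quaternary alphabet with H(Z) ~ 0.88 bits: the capacity grows with the help rate
# until the help rate reaches H(Z), after which the channel is effectively noiseless.
for r_h in (0.0, 0.5, 1.0):
    print(r_h, helper_capacity([0.7, 0.3, 0.0, 0.0], r_h))
```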
Here we study two versions of the identification capacity of this channel: Ahlswede and Dueck’s original identification capacity [7], and the identification capacity subject to no missed identifications [8]. Our main result is that—irrespective of whether the help is provided to the encoder, to the decoder, or to both—the two identification capacities coincide and both equal the right-hand side (RHS) of (2).
2. Problem Formulation
The identification-over-a-channel problem is parameterized by the blocklength n, which tends to infinity in the definition of the identification capacity. The n-length noise sequence $Z^n$ is presented to a helper, which produces its $nR_{\mathrm h}$-bit description $T$:
$$ T = f(Z^n), $$
where
$$ f \colon \mathcal{X}^n \to \{1, \dots, 2^{nR_{\mathrm h}}\}. $$
We refer to the set $\mathcal{I}$ as the set of identification messages and to its cardinality $N = |\mathcal{I}|$ as the number of identification messages. The identification rate is defined (for $N$ sufficiently large) as:
$$ R = \frac{1}{n} \log \log N. $$
A generic element of $\mathcal{I}$—namely, a generic identification message—is denoted i.
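For orientation, and with logarithms taken to base 2, the definition above means that the number of identification messages grows double-exponentially in the blocklength; this is a standard consequence of the definition rather than a statement quoted from the paper:
$$ N = 2^{2^{nR}}, $$
in contrast with ordinary transmission, where the number of messages grows only exponentially, as $2^{nR}$.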
If no help is provided to the encoder, then the latter is specified by a family $\{P^{(i)}_{X^n}\}$ of PMFs on $\mathcal{X}^n$ that are indexed by the identification messages, with the understanding that, to convey the identification message (IM) i, the encoder transmits a random sequence in $\mathcal{X}^n$ that it draws according to the PMF $P^{(i)}_{X^n}$. If help $T$ is provided to the encoder, then the encoder’s operation is specified by a family of PMFs $\{P^{(i,t)}_{X^n}\}$ that is now indexed by pairs of identification messages and noise descriptions, with the understanding that, to convey IM i given the description $T = t$, the encoder produces a random n-length sequence of channel inputs that is distributed according to $P^{(i,t)}_{X^n}$. In either case, the channel output sequence $Y^n$ is
$$ Y^n = X^n \oplus Z^n $$
componentwise.
If help is provided to the encoder, and if IM i is to be conveyed, then the joint distribution of the noise, its description, the channel inputs, and the channel outputs has the form:
$$ P(z^n, t, x^n, y^n) = P_{Z^n}(z^n)\, P_{T \mid Z^n}(t \mid z^n)\, P^{(i,t)}_{X^n}(x^n)\, \mathbb{1}\{y^n = x^n \oplus z^n\}, $$
where
$$ P_{Z^n}(z^n) = \prod_{k=1}^{n} P_Z(z_k), $$
and where
$$ P_{T \mid Z^n}(t \mid z^n) = \mathbb{1}\{t = f(z^n)\}, $$
because we are assuming that the noise description is a deterministic function of the noise sequence. (The results also hold if we allow randomized descriptions: our coding schemes employ deterministic descriptions and the converse allows for randomization.) Here $\mathbb{1}\{\text{statement}\}$ equals 1 if the statement holds and equals 0 otherwise. In the absence of help, the joint distribution has the form:
$$ P(z^n, t, x^n, y^n) = P_{Z^n}(z^n)\, \mathbb{1}\{t = f(z^n)\}\, P^{(i)}_{X^n}(x^n)\, \mathbb{1}\{y^n = x^n \oplus z^n\}. $$
Based on the data available to it—$Y^n$ in the absence of help to the decoder and $(Y^n, T)$ in its presence—the receiver performs binary tests indexed by the identification messages, where the i-th test is whether or not the IM is i. It accepts the hypothesis that the IM is i if the data it observes is in the corresponding acceptance region, which we denote $\mathcal{D}_{i,T}$ in the presence of decoder assistance and $\mathcal{D}_i$ in its absence.
When the help $T$ is provided to the receiver, the probability of missed detection associated with IM i is thus:
$$ p_{\mathrm{MD}}(i) = \sum_{t} \Pr[T = t]\, \Pr\bigl[ Y^n \notin \mathcal{D}_{i,t} \,\big|\, \mathrm{IM} = i,\, T = t \bigr], $$
and the worst-case false alarm associated with it is:
$$ p_{\mathrm{FA}}(i) = \sum_{t} \Pr[T = t]\, \max_{j \neq i} \Pr\bigl[ Y^n \in \mathcal{D}_{i,t} \,\big|\, \mathrm{IM} = j,\, T = t \bigr]. $$
Note that, given $t$, the acceptance regions $\{\mathcal{D}_{i,t}\}$ of the different tests need not be disjoint. We define:
$$ p_{\mathrm{MD}} = \max_{i} p_{\mathrm{MD}}(i) $$
and:
$$ p_{\mathrm{FA}} = \max_{i} p_{\mathrm{FA}}(i). $$
In the absence of help to the receiver, the probability of missed detection associated with IM i is:
$$ p_{\mathrm{MD}}(i) = \Pr\bigl[ Y^n \notin \mathcal{D}_{i} \,\big|\, \mathrm{IM} = i \bigr], $$
and the worst-case probability of false alarm associated with it is:
$$ p_{\mathrm{FA}}(i) = \max_{j \neq i} \Pr\bigl[ Y^n \in \mathcal{D}_{i} \,\big|\, \mathrm{IM} = j \bigr]. $$
In this case, we define $p_{\mathrm{MD}}$ and $p_{\mathrm{FA}}$ as before, namely, as the maxima of the above quantities over the IMs. In both cases we say that a scheme is of zero missed detections if $p_{\mathrm{MD}}$ is zero.
A rate R is an achievable identification rate if, for every $\varepsilon > 0$ and every $\lambda > 0$, there exists some positive integer $n_0$ such that, for all blocklengths n exceeding $n_0$, there exists a scheme with at least $2^{2^{nR}}$ identification messages (19) for which the missed-detection and false-alarm probabilities satisfy $p_{\mathrm{MD}} \le \varepsilon$ and $p_{\mathrm{FA}} \le \lambda$ (20). The supremum of achievable rates is the identification capacity with a helper. Replacing requirement (20) with the requirement that $p_{\mathrm{MD}}$ be zero and $p_{\mathrm{FA}} \le \lambda$ leads to the definition of the zero missed-identification capacity.
Remark 1. Writing out $p_{\mathrm{FA}}$ of (14) as a maximum over the IMs i of an average, over the description t, of a maximum over the IMs $j \neq i$ highlights that (prior to maximizing over i) we first maximize over j and then average the result over t. In this sense, the help—even if provided to both encoder and decoder—cannot be viewed as “common randomness” in the sense of [9,10,11], where the averaging over the common randomness is performed before taking the maximum. Our criterion is more demanding of the direct part (code construction) and less so of the converse. Both criteria are interesting. Ours allows for the notion of “outage”, namely, descriptions that indicate that identification might fail and that therefore call for retransmission. The other criterion highlights the interplay between the noise description and the generation of common randomness (particularly when the help is provided to both transmitter and receiver).
The following theorem is the main result of this paper.
Theorem 1. On the modulo-additive noise channel—irrespective of whether the help is provided to the transmitter, to the receiver, or to both—the identification capacity with a helper and the zero missed-identification capacity with a helper are equal and coincide with the Shannon capacity, where the latter is given in (2).

We prove this result by establishing in Section 3 that both identification capacities are lower-bounded by the Shannon capacity, using a slight strengthening of recent results in [4] in combination with the code construction proposed in [8]. The converse is proved in Section 4, where we use a variation on a theme by Watanabe [12] to analyze the case where the assistance is provided to both transmitter and receiver.
3. Direct Part: Zero Missed Detection
In this section we prove that the zero missed-identification capacity with a helper is at least the Shannon capacity of (2), by proposing identification schemes of no missed detections and of rates approaching this capacity. To this end, we extend to the helper setting the connection—due to Ahlswede, Cai, and Zhang [8]—between the zero-missed-detection identification capacity and the erasures-only capacity. We then call on recent results [4] to infer that, on the modulo-additive noise channel with a helper, the erasures-only capacity is equal to the Shannon capacity. We treat encoder-only assistance and decoder-only assistance separately. Either case also proves achievability when the assistance is provided to both encoder and decoder.
Recall that an erasures-only decoder produces a list comprising the messages under which the observation is of positive likelihood and then acts as follows: if the list contains only one message, it produces that message; otherwise, it declares an erasure. Since the list always contains the transmitted message, this decoder never errs. The erasures-only capacity is defined like the Shannon capacity, but with the additional requirement that the decoder be the erasures-only decoder. This notion extends in a natural way to settings with a helper [4].
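The following is a minimal sketch of the list-then-erase behaviour described above, for a generic discrete channel without help (with decoder assistance, the likelihood would additionally be conditioned on the description); the function and variable names are illustrative and not from the paper.

```python
def erasures_only_decode(y, codebook, likelihood):
    """List-then-erase decoding: never errs, but may declare an erasure.

    codebook   : dict mapping message m -> codeword x^n
    likelihood : function (y, x) -> P(Y^n = y | X^n = x)
    The list holds every message with positive likelihood; since the transmitted
    message is always on the list, a unique list entry must be correct.
    """
    candidates = [m for m, x in codebook.items() if likelihood(y, x) > 0]
    return candidates[0] if len(candidates) == 1 else "erasure"
```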
3.1. Encoder Assistance
A rate-R, blocklength-n, encoder-assisted, erasures-only transmission code comprises a message set $\mathcal{M}$ with $2^{nR}$ messages and a collection of mappings from $\mathcal{M}$ to $\mathcal{X}^n$, one for each value of the help $t$, with the understanding that, to transmit Message m after being presented with the help $t$, the encoder produces the n-tuple of channel inputs $x^n(m, t)$. Since the decoder observes only the channel outputs (and not the help), it forms the list $\mathcal{L}(y^n)$ of all messages under which the observed output sequence $y^n$ has positive likelihood.
The collection of output sequences that cause the erasures-only decoder to produce an erasure is the set of sequences $y^n$ whose list $\mathcal{L}(y^n)$ contains more than one message. The probability of erasure associated with the transmission of Message m with encoder help t is the probability that $Y^n$ falls in this set. On the modulo-additive noise channel with rate-$R_{\mathrm h}$ encoder assistance, the erasures-only capacity and the Shannon capacity coincide and [4]:
$$ C_{\text{e-o}} = \log |\mathcal{X}| - \bigl[ H(Z) - R_{\mathrm h} \bigr]^{+}. \qquad (27) $$
We shall need the following slightly stronger version of the achievability part of this result, where we swap the maximization over the messages with the expectation over the help:
Proposition 1. Consider the modulo-additive noise channel with rate-$R_{\mathrm h}$ encoder assistance. For any transmission rate R smaller than the RHS of (27), there exists a sequence of rate-R transmission codes for which the expectation, over the help, of the maximal (over the messages) probability of erasure tends to zero as n tends to infinity (28). A similar result holds for decoder assistance.

Proof. The proof is presented in Appendix A. It is based on the construction in [4], but with a slightly finer analysis. □
The coding scheme we propose is essentially that of [8]; we just need to account for the help. For each blocklength n, we start out with a transmission code of roughly $2^{nR}$ codewords for which (28) holds, and use Lemma 1 ahead to construct approximately $2^{2^{nR}}$ lightly-intersecting subsets of its message set. We then associate an IM with each of the subsets, with the understanding that, to transmit an IM, we pick uniformly at random one of the messages in the subset associated with it and transmit this message with the helper’s assistance.
Lemma 1 ([7] Proposition 14). Let $\mathcal{A}$ be a finite set, and let $\lambda > 0$ be given. If $\lambda$ is sufficiently small (so that the condition of [7] (Proposition 14) holds), then there exist subsets $\mathcal{A}_1, \dots, \mathcal{A}_N$ of $\mathcal{A}$, all of the same prescribed size and double-exponentially numerous in the sense quantified in [7], such that for all distinct i and j the pairwise intersections satisfy $|\mathcal{A}_i \cap \mathcal{A}_j| \le \lambda\, |\mathcal{A}_i|$.

With the aid of this lemma, we can now prove the achievability of the claimed identification rates.
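The following sketch illustrates the kind of object Lemma 1 provides: many subsets of a message set whose pairwise overlaps are a small fraction of their size. It uses independent random selection and merely checks the overlap empirically; it is not the construction or the proof of [7], and all names and parameter values are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
M = 4096            # size of the transmission message set
subset_size = 512   # size of each subset
num_subsets = 64    # number of identification messages in this toy example
lam = 0.25          # target bound on the fractional pairwise overlap

subsets = [set(rng.choice(M, size=subset_size, replace=False))
           for _ in range(num_subsets)]

worst = max(len(a & b) / subset_size
            for a, b in itertools.combinations(subsets, 2))
print(f"largest fractional overlap: {worst:.3f} (target < {lam})")
# With these parameters the expected overlap fraction is subset_size / M = 1/8,
# so random selection comfortably meets the target with high probability.
```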
Proof. Given an erasures-only encoder-assisted transmission code with message set $\mathcal{M}$ for which (28) holds, we apply Lemma 1 to the transmission message set $\mathcal{M}$ with a sufficiently small parameter $\lambda$ to infer, for large enough n, the existence of subsets $\mathcal{A}_1, \dots, \mathcal{A}_N$ of $\mathcal{M}$ that are of equal size, that pairwise intersect in at most a fraction $\lambda$ of their elements, and whose number N is large enough ((34)–(36)). Note that (36) implies that the identification rate of the resulting scheme is at least R (37). To send IM i after obtaining the assistance t, the encoder picks a random element M from $\mathcal{A}_i$ equiprobably and transmits the codeword that the transmission code assigns to M and t, so that, given the help t, the channel output is distributed as the uniform mixture, over the messages in $\mathcal{A}_i$, of the output distributions of the transmission code. To guarantee no missed detections, we set the acceptance region of the i-th IM to be the set of output sequences whose erasures-only decoding list intersects $\mathcal{A}_i$.
It now remains to analyze the scheme’s maximal false-alarm probability.
where in (41) we expressed the output distribution induced by an IM using (7); in (42) we expressed the relevant event as the disjoint union of two parts; in (43) we used the trivial bound on the first of these; in (44) we used the bound (50), which holds because, by the definition of the erasure set, any output sequence that contributes to the LHS of (50) must also cause an erasure; in (45) we used (35); in (46) we replaced each term in the sum with the global maximum and used (34); in (47) we used the trivial bound; and in (48) we could simplify the expression because the dependence on i and j is no longer present.
The above construction demonstrates that every transmission scheme that drives the expected (over the help) maximal probability of erasure to zero induces a zero missed-identification scheme that drives the false-alarm probability to zero. Since the former exists for all transmission rates up to the RHS of (27), we conclude, by (37), that every identification rate below the RHS of (27) is achievable with no missed detections. This, in turn, implies that the zero missed-identification capacity is at least the erasures-only capacity, and hence concludes the achievability proof for encoder assistance because, on the modulo-additive noise channel, the erasures-only capacity coincides with the Shannon capacity. □
3.2. Decoder Assistance
When, rather than to the encoder, the assistance is to the decoder, the transmission codewords are n-tuples in $\mathcal{X}^n$, and we denote the transmission codebook $\mathcal{C}$. For the induced identification scheme we use the same message subsets as before, with IM i being transmitted by choosing uniformly at random a message M from the subset $\mathcal{A}_i$ and transmitting the codeword $x^n(M)$. To avoid any missed detections, we set the acceptance region corresponding to IM i and decoder assistance t to be the set of output sequences whose erasures-only decoding list, computed with the help t, intersects $\mathcal{A}_i$.
The analysis of the false-alarm probability is nearly identical to that with encoder assistance and is omitted.
4. Converse Part: Help Provided to Both Transmitter and Receiver
In this section we establish the converse for all the cases of interest by proving that the inequality (52), namely that the identification capacity with a helper cannot exceed the RHS of (2), holds even when the help is provided to both encoder and decoder. The RHS of (52) is the helper Shannon capacity, irrespective of whether the help is provided to the encoder, to the decoder, or to both [3] (Section V).
There are two main steps to the proof. The first addresses the probabilities of the two types of testing errors conditional on a given description $T = t$. It relates the two to the conditional entropy of the noise given the description, namely, $H(Z^n \mid T = t)$. Very roughly, this corresponds to proving the converse part of the ID-capacity theorem for the channel whose noise is distributed according to the conditional distribution of $Z^n$ given $T = t$. The difficulty in this step is that, given $T = t$, the noise is not memoryless, and the channel may not even be stable. Classical type-based techniques for proving the converse part of the ID-capacity theorem—such as those employed in [7] (Theorem 12), [13] (Section III), or [14] (Section III)—are therefore not applicable. Instead, we extend to the helper setting Watanabe’s technique [12], which is inspired by the partial channel resolvability method introduced by Steinberg [15].
The second step in the proof addresses the unconditional error probabilities. This step is needed because, in the definition of achievability (see (13) and (14)), the error probabilities are averaged over the noise description t. We will show that, when the identification rate exceeds the Shannon capacity, there exists an IM for which the sum of the two types of errors is large whenever the description t is in a subset of the descriptions whose probability is bounded away from zero. This will imply that, for this IM, the sum of the averaged probabilities of error is bounded away from zero, thus contradicting the achievability.
4.1. Additional Notation
Given a PMF $P_X$ and a conditional PMF $P_{Y|X}$, we write $P_X \times P_{Y|X}$ for the joint PMF that assigns the pair $(x, y)$ the probability $P_X(x)\, P_{Y|X}(y \mid x)$. We use $I(X; Y)$ to denote the mutual information between X and Y under the joint distribution at hand. The product PMF of the marginals $P_X$ and $P_Y$ is denoted $P_X \times P_Y$; it assigns $(x, y)$ the probability $P_X(x)\, P_Y(y)$.
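To make the notation concrete, the following small sketch (with illustrative names only) builds a joint PMF from a marginal and a conditional, and computes the mutual information against the product of the marginals.

```python
import numpy as np

P_X = np.array([0.5, 0.5])                      # PMF of X
P_Y_given_X = np.array([[0.9, 0.1],             # row x: PMF of Y given X = x
                        [0.2, 0.8]])

P_XY = P_X[:, None] * P_Y_given_X               # joint: P_XY[x, y] = P_X(x) P_{Y|X}(y|x)
P_Y = P_XY.sum(axis=0)                          # Y-marginal
product = P_X[:, None] * P_Y[None, :]           # product of the marginals

# I(X;Y) = D(P_XY || P_X x P_Y), summing only over pairs with positive mass.
mask = P_XY > 0
I_XY = float((P_XY[mask] * np.log2(P_XY[mask] / product[mask])).sum())
print(f"I(X;Y) = {I_XY:.4f} bits")
```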
For the hypothesis testing problem of guessing whether some observation X was drawn $\sim P$ (the “null hypothesis”) or $\sim Q$ (the “alternative hypothesis”), we use $\phi$ to denote a generic randomized test that, after observing $X = x$, guesses the null hypothesis with probability $\phi(x)$ and the alternative with probability $1 - \phi(x)$. (Here $\phi(x) \in [0, 1]$ for every x.) The type-I error probability associated with $\phi$ is the probability, under P, that the test guesses the alternative, and the type-II error probability is the probability, under Q, that it guesses the null. For a given $\varepsilon \in [0, 1)$ we define $\beta_\varepsilon(P, Q)$ to be the least type-II error probability that can be achieved under the constraint that the type-I error probability does not exceed $\varepsilon$.
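For finite alphabets, $\beta_\varepsilon(P, Q)$ can be computed by the Neyman–Pearson construction: accept the null on the outcomes with the largest likelihood ratio, randomizing at the boundary. The following sketch illustrates this; the function name and the dictionary-based interface are illustrative only.

```python
def beta_eps(P, Q, eps):
    """Least type-II error over tests whose type-I error is at most eps.

    P, Q : dicts mapping outcomes to probabilities (null / alternative).
    A test accepts the null on outcome x with probability phi(x); the optimum
    sets phi = 1 on the outcomes with the largest ratio P(x)/Q(x), randomizing
    at the boundary so that the accumulated acceptance P-mass is 1 - eps.
    """
    ratio = lambda x: P[x] / Q[x] if Q.get(x, 0.0) > 0 else float("inf")
    need, beta = 1.0 - eps, 0.0
    for x in sorted(P, key=ratio, reverse=True):
        if need <= 0:
            break
        accept = min(1.0, need / P[x]) if P[x] > 0 else 1.0
        beta += Q.get(x, 0.0) * accept
        need -= P[x] * accept
    return beta

# Example with a ternary alphabet and eps = 0.05.
P = {"a": 0.5, "b": 0.4, "c": 0.1}
Q = {"a": 0.1, "b": 0.2, "c": 0.7}
print(beta_eps(P, Q, 0.05))
```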
4.2. Conditional Missed-Detection and False-Alarm Probabilities
The following lemma follows directly from Watanabe’s work [
12].
Lemma 2 ([12] Theorem 1 and Corollary 2). Let $P_{Y^n \mid X^n, T = t}$ denote the n-letter conditional distribution of the channel output sequence given that the noise description is t and the input is $x^n$. For any $\varepsilon, \lambda > 0$ with $\varepsilon + \lambda < 1$ and any fixed t, the lemma's condition on the number of IMs implies the bound (57) and hence (58), where the additional term appearing in (58), for any fixed $\lambda$, tends to 0 as n tends to ∞.

Substituting the channel's conditional law for the distributions appearing in the following theorem will allow us to link the RHS of (57) with the conditional mutual information between $X^n$ and $Y^n$ given $T = t$. The theorem’s proof was inspired by the proof of [16] (Theorem 8). See also [17] (Lemma 1).
Theorem 2. Given any $\varepsilon \in (0, 1)$ and any joint PMF $P_{XY} = P_X \times P_{Y|X}$, the infimum over PMFs Q of $-\log \beta_\varepsilon\bigl(P_{XY},\, P_X \times Q\bigr)$ is upper-bounded in terms of the mutual information $I(X;Y)$ and the binary entropy $h_{\mathrm b}(\varepsilon)$, where $h_{\mathrm b}(\cdot)$ is the binary entropy function.

Proof. Applying the data-processing inequality for relative entropy to the binary hypothesis testing setting (see, e.g., [18] (Thm. 30.12.5)), we conclude that, for any randomized test $\phi$, the relative entropy between the two hypotheses is lower-bounded by the binary divergence between the acceptance probabilities under the two hypotheses (61), where $d(\cdot \,\|\, \cdot)$ denotes the binary divergence function. Since there exists a randomized test whose type-I error probability is at most $\varepsilon$ and whose type-II error probability equals $\beta_\varepsilon$ (see, e.g., [18] (Lemma 30.5.4 and Proposition 30.8.1)), we can apply (61) to this test to conclude (63). (The above existence also holds when $\beta_\varepsilon$ is zero, but for this case we can verify (63) directly.) The LHS of (63) can be lower bounded by lower-bounding the binary divergence function as in (64). It follows from (63) and (64) that (65) holds, so the infimum over Q of the LHS is upper bounded by the infimum over Q on the RHS. The latter (for fixed $P_{XY}$) is achieved when Q is the Y-marginal of $P_{XY}$, a marginal that we denote $P_Y$ (66). This is a special case of a more general result on Rényi divergence [19] (Theorem II.2). Here we give a simple proof for K-L divergence: writing $D(P_{XY} \,\|\, P_X \times Q) = I(X;Y) + D(P_Y \,\|\, Q)$ shows that the infimum equals $I(X;Y)$, with equality if and only if Q equals $P_Y$. From (63), (64), and (66) we obtain the theorem.
□
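For the reader's convenience, the steps just described combine into the following standard chain; we state it as a sketch under the conventions used above, and the paper's own displays may differ in form and numbering. For any test whose type-I error is at most $\varepsilon$,
$$ D(P \,\|\, Q) \;\ge\; d\bigl(1 - \varepsilon \,\big\|\, \beta_\varepsilon(P, Q)\bigr) \;\ge\; (1 - \varepsilon)\, \log \frac{1}{\beta_\varepsilon(P, Q)} \;-\; h_{\mathrm b}(\varepsilon), $$
so that, with $P = P_{XY}$ and $Q = P_X \times P_Y$ (for which the divergence equals $I(X;Y)$),
$$ -\log \beta_\varepsilon\bigl(P_{XY},\, P_X \times P_Y\bigr) \;\le\; \frac{I(X;Y) + h_{\mathrm b}(\varepsilon)}{1 - \varepsilon}. $$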
Applying Lemma 2 and Theorem 2 to our channel when its law is conditioned on $T = t$ yields the following corollary.

Corollary 1. On the MMANC, for any $\varepsilon, \lambda > 0$ with $\varepsilon + \lambda < 1$ and any fixed t, the condition (75) on the number of IMs, stated in terms of $n \log|\mathcal{X}| - H(Z^n \mid T = t)$, implies that the conditional (on $T = t$) missed-detection and false-alarm probabilities cannot simultaneously be at most $\varepsilon$ and $\lambda$. Here the vanishing correction term is as in Lemma 2.

Proof. Substituting the conditional (on $T = t$) laws of the input, the output, and the noise for the corresponding distributions in Theorem 2, we obtain (76). Given $T = t$ and the modulo-additive structure of the channel, the mutual information term in (76) can be upper-bounded in terms of $n \log|\mathcal{X}|$ and the conditional entropy of the noise, as in (80). Applying (76) and (80) to (58) in Lemma 2 establishes Corollary 1. □
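The entropy bound invoked in the proof is presumably of the following form; we state it as a sketch that uses only the modulo-additive structure and the conditional independence of the input and the noise given $T = t$:
$$ I(X^n; Y^n \mid T = t) \;=\; H(Y^n \mid T = t) - H(Y^n \mid X^n, T = t) \;=\; H(Y^n \mid T = t) - H(Z^n \mid T = t) \;\le\; n \log|\mathcal{X}| - H(Z^n \mid T = t). $$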
4.3. Averaging over T
Corollary 1 deals with identification for a given fixed t, but our definition of achievability in (13) and (14) entails averaging over t, which we must thus study. We begin by lower-bounding the conditional entropy of the noise sequence $Z^n$ given the assistance T; the resulting bound is (83).
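The bound (83) referenced here is presumably the following standard chain, stated as a sketch with $R_{\mathrm h}$ denoting the help rate of (2): since T is described with $nR_{\mathrm h}$ bits,
$$ H(Z^n \mid T) \;=\; H(Z^n) - I(Z^n; T) \;\ge\; n H(Z) - H(T) \;\ge\; n \bigl( H(Z) - R_{\mathrm h} \bigr). $$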
We next define, for every blocklength n, the subset of descriptions given in (84). These are poor noise descriptions in the sense that, after they are revealed, the remaining uncertainty about the noise is still large. The key point is that their probability is bounded away from zero. In fact, as we next argue, the lower bound (85) holds, where in the second case the probability is 1 because in that case the condition appearing in the definition of the set in (84) is satisfied by every description. As to the first case, we begin with (83) to obtain the chain of inequalities culminating in (88), from which the first case of the bound in (85) follows. Here (87) follows from expressing the expectation over T as a sum over the poor descriptions and their complement, and (88) follows from the definition of the set in (84) and a trivial bound.
Inequality (85) establishes that the probability of a poor description is lower bounded by a positive constant that does not depend on n. Using Corollary 1 for such t’s will be the key to the converse.
Henceforth, we fix some sequence of identification codes of rate R exceeding the RHS of (2), i.e., satisfying $R > \log|\mathcal{X}| - [H(Z) - R_{\mathrm h}]^{+}$, and show that the sum of the missed-detection and false-alarm probabilities cannot tend to 0 as n tends to ∞. For such a rate R, there exist positive constants such that (89) holds; we fix such a choice of constants for the remainder of the proof.
Since the inequality in (89) is strict, and since the correction term of Corollary 1 tends to zero with n, it follows that the inequality continues to hold also when we add this term to the RHS, provided that n is sufficiently large; i.e., there exists some $n_0$ such that (90) holds for all n exceeding $n_0$. It then follows from (90) and the definition of the set of poor descriptions in (84) that, whenever t is a poor description, the quantity appearing in the condition of Corollary 1 exceeds the RHS of (75), as stated in (91). Corollary 1 thus implies that, for every poor description t, the bound (92) holds.
However, we need a stronger statement because, in the above, the IM i for which the errors are large depends on t, whereas in our definition of achievability we are averaging over T for a fixed IM. The stronger result we will establish is that the condition on the LHS of (92) implies that, for all sufficiently large n, there exists some IM (that does not depend on t) which performs poorly for every t in the set of poor descriptions, i.e., for which (93) holds. That is, we will show that (94) holds for sufficiently large n.
To this end, define, for each poor description t, the set of IMs specified in (95), and consider the identification code that results when we restrict our code to the IMs in this set (while keeping the same acceptance regions). Applying Corollary 1 to this restricted code using (91), we obtain (96). Consequently, (97) holds, where the second inequality holds by (96) and the fact that the set defined in (95) is contained in a larger set of known cardinality.
Since (89) holds, there exists some $n_1$ such that (98) holds for all n exceeding $n_1$. We can use this to upper-bound the RHS of (97) to obtain that (99) holds for all such n. The complement (in the set of IMs) of the union on the LHS of (99) is thus not empty, which proves the existence of some IM for which (93) holds.
With such an IM in hand, the converse follows from the fact that the probability that T is in the set of poor descriptions is bounded away from zero (85), because the chain (100)–(103) holds for every sufficiently large n, where (100) follows from the definitions in (13) and (14); in (101) we replaced the maximum over the IMs with the particular IM we found; and (103) follows from (93). Thus, any code of rate R exceeding the RHS of (2) with large enough n must have a sum of error probabilities that is lower-bounded in terms of the probability of a poor description, and the latter is bounded away from zero. This concludes the proof of the converse part.