A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network

Cao, Daming; Zhou, Lin; Tan, Vincent Y. F.

doi:10.3390/e21121171

Open AccessArticle

A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network^†

by

Daming Cao

¹,

Lin Zhou

^2,*

and

Vincent Y. F. Tan

³

¹

School of Information Science and Engineering, Southeast University, Nanjing 210000, China

²

Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA

³

Department of Electrical and Computer Engineering and Department of Mathematics, National University of Singapore, Singapore 119077, Singapore

^*

Author to whom correspondence should be addressed.

^†

This paper was partially presented at ISIT 2019, Paris, France, 7–12 July 2019.

Entropy 2019, 21(12), 1171; https://doi.org/10.3390/e21121171

Submission received: 29 October 2019 / Revised: 18 November 2019 / Accepted: 28 November 2019 / Published: 29 November 2019

(This article belongs to the Special Issue Multiuser Information Theory II)

Download

Browse Figures

Versions Notes

Abstract

:

By proving a strong converse theorem, we strengthen the weak converse result by Salehkalaibar, Wigger and Wang (2017) concerning hypothesis testing against independence over a two-hop network with communication constraints. Our proof follows by combining two recently-proposed techniques for proving strong converse theorems, namely the strong converse technique via reverse hypercontractivity by Liu, van Handel, and Verdú (2017) and the strong converse technique by Tyagi and Watanabe (2018), in which the authors used a change-of-measure technique and replaced hard Markov constraints with soft information costs. The techniques used in our paper can also be applied to prove strong converse theorems for other multiterminal hypothesis testing against independence problems.

Keywords:

strong converse; hypothesis testing with communication constraints; testing against independence; two-hop network; relay

1. Introduction

Motivated by situations where the source sequence is not available directly and can only be obtained through limited communication with the data collector, Ahlswede and Csiszár [1] proposed the problem of hypothesis testing with a communication constraint. In the setting of [1], there is one encoder and one decoder. The encoder has access to one source sequence

X^{n}

and transmits a compressed version of it to the decoder at a limited rate. Given the compressed version and the available source sequence

Y^{n}

(side information), the decoder knows that the pair of sequences

(X^{n}, Y^{n})

is generated i.i.d. from one of the two distributions and needs to determine which distribution the pair of sequences is generated from. The goal in this problem is to study the tradeoff between the compression rate and the exponent of the type-II error probability under the constraint that the type-I error probability is either vanishing or non-vanishing. For the special case of testing against independence, Ahlswede and Csiszár provided an exact characterization of the rate-exponent tradeoff. They also derived the so-called strong converse theorem for the problem. This states that the rate-exponent tradeoff cannot be improved even when one is allowed a non-vanishing type-I error probability. However, the characterization the rate-exponent tradeoff for the general case (even in the absence of a strong converse) remains open till date.

Subsequently, the work of Ahlswede and Csiszár was generalized to the distributed setting by Han in [2] who considered hypothesis testing over a Slepian-Wolf network. In this setting, there are two encoders, each of which observes one source sequence and transmits a compressed version of the source to the decoder. The decoder then performs a hypothesis test given these two compression indices. The goal in this problem is to study the tradeoff between the coding rates and the exponent of type-II error probability, under the constraint that the type-I error probability is either vanishing or non-vanishing. Han derived an inner bound to the rate-exponent region. For the special case of zero-rate communication, Shalaby and Papamarcou [3] applied the blowing-up lemma [4] judiciously to derive the exact rate-exponent region and a strong converse theorem. Further generalizations of the work of Ahlswede and Csiszár can be categorized into two classes: non-interactive models where encoders do not communicate with one another [5,6,7,8] and the interactive models where encoders do communicate [9,10].

We revisit one such interactive model as shown in Figure 1. This problem was considered by Salehkalaibar, Wigger and Wang in [11] and we term the problem as hypothesis testing over a two-hop network. The two-hop model considered here has potential applications in the Internet of Things (IoT) and sensor networks. In these scenarios, direct communication from the transmitter to the receiver might not be possible due to power constraints that result from limited resources such as finite battery power. However, it is conceivable in such a scenario to assume that there are relays—in our setting, there is a single relay—that aid in the communication or other statistical inference tasks (such as hypothesis testing) between the transmitter and receiver.

The main task in this problem is to construct two hypothesis tests between two joint distributions

P_{X Y Z}

and

Q_{X Y Z}

. One of these two distributions governs the law of

(X^{n}, Y^{n}, Z^{n})

where each copy

(X_{i}, Y_{i}, Z_{i})

is generated independently either from

P_{X Y Z}

and

Q_{X Y Z}

. As shown in Figure 1, the first terminal has knowledge of a source sequence

X^{n}

and sends an index

M_{1}

to the second terminal, which we call the relay; the relay, given side information

Y^{n}

and compressed index

M_{1}

, makes a guess of the hypothesis

{\hat{H}}_{Y}

and sends another index

M_{2}

to the third terminal; the third terminal makes another guess of the hypothesis

{\hat{H}}_{Z}

based on

M_{2}

and its own side information

Z^{n}

. The authors in [11] derived an inner bound for the rate-exponent region and showed that the bound is tight for several special cases, including the case of testing against independence in which

Q_{X Y Z} = P_{X} P_{Y} P_{Z}

. However, even in this simpler case of testing against independence, which is our main concern in this paper, the authors in [11] only established a weak converse.

In this paper, we strengthen the result by Salehkalaibar, Wigger and Wang in [11] by deriving a strong converse for the case of testing against independence. Our proof follows by combining two recently proposed strong converse techniques by Liu et al. in [12] and by Tyagi and Watanabe in [13]. In [12], the authors proposed a framework to prove strong converse theorems based on functional inequalities and reverse hypercontractivity of Markov semigroups. In particular, they applied their framework to derive strong converse theorems for a collection of problems including the hypothesis testing with communication constraints problem in [1]. In [13], the authors proposed another framework for strong converse proofs, where they used a change-of-measure technique and replaced hard Markov constraints with soft information costs. They also leveraged variational formulas for various information-theoretic quantities; these formulas were introduced by Oohama in [14,15,16].

Notation

Random variables and their realizations are in upper (e.g., X) and lower case (e.g., x) respectively. All sets are denoted in calligraphic font (e.g.,

X

). We use

X^{c}

to denote the complement of

X

. Let

X^{n} : = (X_{1}, \dots, X_{n})

be a random vector of length n and

x^{n}

its realization. Given any

x^{n}

, we use

{\hat{P}}_{x^{n}}

to denote its type (empirical distribution). All logarithms are base e. We use

R_{+}

and

N

to denote the set of non-negative real numbers and natural numbers respectively. Given any positive integer

a \in N

, we use

[a]

to denote

{1, \dots, a}

. We use

1 {\cdot}

to denote the indicator function and use standard asymptotic notation such as

O (\cdot)

. The set of all probability distributions on a finite set

X

is denoted as

P (X)

. Given any two random variables

(X, Y)

and any realization of x, we use

P_{Y | x} (\cdot)

to denote the conditional distribution

P_{Y | X} (\cdot | x)

. Given a distribution

P \in P (X)

and a function

f : X \to R

, we use

P (f)

to denote

E_{P} [f (X)]

. For information-theoretic quantities, we follow [17]. In particular, when the joint distribution of

(X, Y)

is

P_{X Y} \in P (X \times Y)

, we use

I_{P_{X Y}} (X; Y)

and

I_{P} (X; Y)

interchangeably. Throughout the paper, for ease of notation, we drop the subscript for distributions when there is no confusion. For example, when the joint distribution of

(X, Y, Z)

is

P_{X Y Z}

, we use

I_{P} (X; Y | Z)

and

I_{P_{X Y Z}} (X; Y | Z)

interchangeably. For any

(p, q) \in {[0, 1]}^{2}

, let

D_{b} (p ∥ q)

denote the binary divergence function, i.e.,

D_{b} (p ∥ q) = p log (p / q) + (1 - p) log ((1 - p) / (1 - q))

.

2. Problem Formulation and Existing Results

2.1. Problem Formulation

Fix a joint distribution

P_{X Y Z} \in P (X \times Y \times Z)

satisfying the Markov chain

X - Y - Z

, i.e.,

\begin{matrix} P_{X Y Z} (x, y, z) = P_{X} (x) P_{Y | X} (y | x) P_{Z | Y} (z | y) . \end{matrix}

(1)

Let

P_{X}

,

P_{Y}

and

P_{Z}

be induced marginal distributions of

P_{X Y Z}

. As shown in Figure 1, we consider a two-hop hypothesis testing problem with three terminals. The first terminal, which we term the transmitter, observes a source sequence

X^{n}

and sends a compression index

M_{1}

to the second terminal, which we term the relay. Given

M_{1}

and side information

Y^{n}

, the relay sends another compression index

M_{2}

to the third terminal, which we term the receiver. The main task in this problem is to construct hypothesis tests at both the relay and the receiver to distinguish between

\begin{matrix} H_{0} : (X^{n}, Y^{n}, Z^{n}) \sim P_{X Y Z}^{n} = P_{X} P_{Y | X}^{n} P_{Z | Y}^{n}, \end{matrix}

(2)

\begin{matrix} H_{1} : (X^{n}, Y^{n}, Z^{n}) \sim P_{X}^{n} P_{Y}^{n} P_{Z}^{n} . \end{matrix}

(3)

For subsequent analyses, we formally define a code for hypothesis testing over a two-hop network as follows.

Definition 1.

An

(n, N_{1}, N_{2})

-code for hypothesis testing over a two-hop network consists of

Two encoders:

$\begin{matrix} f_{1} & : X^{n} \to M_{1} : = {1, \dots, N_{1}}, \end{matrix}$

(4)

$\begin{matrix} f_{2} & : M_{1} \times Y^{n} \to M_{2} : = {1, \dots, N_{2}}, and \end{matrix}$

(5)
Two decoders

$\begin{matrix} g_{1} & : M_{1} \times Y^{n} \to {H_{0}, H_{1}}, \end{matrix}$

(6)

$\begin{matrix} g_{2} & : M_{2} \times Z^{n} \to {H_{0}, H_{1}} . \end{matrix}$

(7)

Given an

(n, N_{1}, N_{2})

-code with encoding and decoding functions

(f_{1}, f_{2}, g_{1}, g_{2})

, we define acceptance regions for the null hypothesis

H_{0}

at the relay and the receiver as

\begin{matrix} A_{Y, n} & : = {(m_{1}, y^{n}) : g_{1} (m_{1}, y^{n}) = H_{0}}, \end{matrix}

(8)

\begin{matrix} A_{Z, n} & : = {(m_{2}, z^{n}) : g_{2} (m_{2}, z^{n}) = H_{0}} \end{matrix}

(9)

respectively. We also define conditional distributions

\begin{matrix} P_{M_{1} | X^{n}} (m_{1} | x^{n}) & : = 1 {f_{1} (x_{1}^{n}) = m_{1}}, \end{matrix}

(10)

\begin{matrix} P_{M_{2} | Y^{n} M_{1}} (m_{2} | y^{n}, m_{1}) & : = 1 {f_{2} (m_{1}, y^{n}) = m_{2}} . \end{matrix}

(11)

Thus, for a

(n, N_{1}, N_{2})

-code characterized by

(f_{1}, f_{2}, g_{1}, g_{2})

, the joint distribution of random variables

(X^{n}, Y^{n}, Z^{n}, M_{1}, M_{2})

under the null hypothesis

H_{0}

is given by

\begin{matrix} P_{X^{n} Y^{n} Z^{n} M_{1} M_{2}} (x^{n}, y^{n}, z^{n}, m_{1}, m_{2}) & = P_{X Y Z}^{n} (x^{n}, y^{n}, z^{n}) P_{M_{1} | X^{n}} (m_{1} | x^{n}) P_{M_{2} | Y^{n} M_{1}} (m_{2} | y^{n}, m_{1}), \end{matrix}

(12)

and under the alternative hypothesis

H_{1}

is given by

\begin{matrix} {\bar{P}}_{X^{n} Y^{n} Z^{n} M_{1} M_{2}} (x^{n}, y^{n}, z^{n}, m_{1}, m_{2}) = P_{X}^{n} (x^{n}) P_{Y}^{n} (y^{n}) P_{Z}^{n} (z^{n}) P_{M_{1} | X^{n}} (m_{1} | x^{n}) P_{M_{2} | Y^{n} M_{1}} (m_{2} | y^{n}, m_{1}) . \end{matrix}

(13)

Now, let

P_{Y^{n} M_{1}}

and

P_{Z^{n} M_{2}}

be marginal distributions induced by

P_{X^{n} Y^{n} Z^{n} M_{1} M_{2}}

and let

{\bar{P}}_{Y^{n} M_{1}}

and

{\bar{P}}_{Z^{n} M_{2}}

be marginal distributions induced by

{\bar{P}}_{X^{n} Y^{n} Z^{n} M_{1} M_{2}}

. Then, we can define the type-I and type-II error probabilities at the relay as

\begin{matrix} β_{1} & : = P_{M_{1} Y^{n}} (A_{Y, n}^{c}), \end{matrix}

(14)

\begin{matrix} β_{2} & : = {\bar{P}}_{M_{1} Y^{n}} (A_{Y, n}) \end{matrix}

(15)

respectively and at the receiver as

\begin{matrix} η_{1} & : = P_{M_{2} Z^{n}} (A_{Z, n}^{c}), \end{matrix}

(16)

\begin{matrix} η_{2} & : = {\bar{P}}_{M_{2} Z^{n}} (A_{Z, n}) \end{matrix}

(17)

respectively. Clearly,

β_{1}, β_{2}, η_{1}

, and

η_{2}

are functions of n but we suppress these dependencies for brevity.

Given above definitions, the achievable rate-exponent region for the hypothesis testing problem in a two-hop network is defined as follows.

Definition 2.

Given any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

, a tuple

(R_{1}, R_{2}, E_{1}, E_{2})

is said to be

(ε_{1}, ε_{2})

-achievable if there exists a sequence of

(n, N_{1}, N_{2})

-codes such that

\begin{matrix} \underset{n \to \infty}{lim sup} \frac{1}{n} log N_{i} & \leq R_{i}, \forall i \in {1, 2}, \end{matrix}

(18)

\begin{matrix} \underset{n \to \infty}{lim sup} β_{1} & \leq ε_{1}, \end{matrix}

(19)

\begin{matrix} \underset{n \to \infty}{lim sup} η_{1} & \leq ε_{2}, \end{matrix}

(20)

\begin{matrix} \underset{n \to \infty}{lim inf} - \frac{1}{n} log β_{2} & \geq E_{1}, \end{matrix}

(21)

\begin{matrix} \underset{n \to \infty}{lim inf} - \frac{1}{n} log η_{2} & \geq E_{2} . \end{matrix}

(22)

The closure of the set of all

(ε_{1}, ε_{2})

-achievable rate-exponent tuples is called the

(ε_{1}, ε_{2})

-rate-exponent region and is denoted as

R (ε_{1}, ε_{2})

. Furthermore, define the rate-exponent region as

\begin{matrix} R & : = R (0, 0) . \end{matrix}

(23)

2.2. Existing Results

In the following, we recall the exact characterization of

R

given by Salehkalaibar, Wigger and Wang ([11] (Corollary 1)). For this purpose, define the following set of joint distributions

\begin{matrix} Q : = {Q_{X Y Z U V} \in P (X \times Y \times Z \times U \times V) : Q_{X Y Z} = P_{X Y Z}, U - X - Y, V - Y - Z} . \end{matrix}

(24)

Given

Q_{X Y Z U V} \in Q

, define the following set

\begin{matrix} R (Q_{X Y Z U V}) : = {(R_{1}, R_{2}, E_{1}, E_{2}) : R_{1} & \geq I_{Q} (U; X), R_{2} \geq I_{Q} (V; Y), \\ E_{1} & \leq I_{Q} (U; Y), E_{2} \leq I_{Q} (U; Y) + I_{Q} (V; Z)} \end{matrix}

(25)

Finally, let

\begin{matrix} R^{*} & : = ⋃_{Q_{X Y Z U V} \in Q} R (Q_{X Y Z U V}) . \end{matrix}

(26)

Theorem 1.

The rate-exponent region

R

for the hypothesis testing over a two-hop network problem satisfies

\begin{matrix} R = R^{*} . \end{matrix}

(27)

In the following, inspired by Oohama’s variational characterization of rate regions for multiuser information theory [14,15,16], we provide an alternative characterization of

R^{*}

. For this purpose, given any

(b, c, d) \in R_{+}^{3}

and any

Q_{X Y Z U V} \in Q

, let

\begin{matrix} R_{b, c, d} (Q_{X Y Z U V}) & : = - I_{Q} (U; Y) + b I_{Q} (U; X) - c (I_{Q} (U; Y) + I_{Q} (V; Z)) + d I_{Q} (V; Y) . \end{matrix}

(28)

be a linear combination of the mutual information terms in (25). Furthermore, define

\begin{matrix} R_{b, c, d} & : = min_{Q_{X Y Z U V} \in Q} R_{b, c, d} (Q_{X Y Z U V}) . \end{matrix}

(29)

An alternative characterization of

R^{*}

is given by

\begin{matrix} R^{*} = ⋂_{(b, c, d) \in R_{+}^{3}} \{(R_{1}, R_{2}, E_{1}, E_{2}) : - E_{1} + b R_{1} - c E_{2} + d R_{2} \geq R_{b, c, d}\} . \end{matrix}

(30)

3. Strong Converse Theorem

3.1. The Case $ε_{1} + ε_{2} < 1$

Theorem 2.

Given any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

such that

ε_{1} + ε_{2} < 1

and any

(b, c, d) \in R_{+}^{3}

, for any

(n, N_{1}, N_{2})

-code such that

β_{1} \leq ε_{1}

,

η_{1} \leq ε_{2}

, we have

\begin{matrix} log β_{2} + b log N_{1} + c log η_{2} + d log N_{2} \geq n R_{b, c, d} + Θ (n^{3 / 4} log n) . \end{matrix}

(31)

The proof of Theorem 2 is given in Section 4. Several remarks are in order.

First, using the alternative expression of the rate-exponent region in (30), we conclude that for any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

such that

ε_{1} + ε_{2} < 1

, we have

R (ε_{1}, ε_{2}) = R^{*}

. This result significantly strengthens the weak converse result in ([11] (Corollary 1)) in which it was shown that

R (0, 0) = R^{*}

.

Second, it appears difficult to establish the strong converse result in Theorem 2 using existing classical techniques including image-size characterizations (a consequence of the blowing-up lemma) [4,6] and the perturbation approach [18]. In Section 4, we combine two recently proposed strong converse techniques by Liu, van Handel, and Verdú [12] and by Tyagi and Watanabe [13]. In particular, we use the strong converse technique based on reverse hypercontractivity in [12] to bound the exponent of the type-II error probability at the receiver and the strong converse technique in [13], which leverages an appropriate change-of-measure technique and replaces hard Markov constraints with soft information costs, to analyze the exponent of type-II error probability at the relay. Finally, inspired by the single-letterization steps in ([19] (Lemma C.2)) and [13], we single-letterize the derived multi-letter bounds from the previous steps to obtain the desired result in Theorem 2.

Third, we briefly comment on the apparent necessity of combining the two techniques in [12,13] instead of applying just one of them to obtain Theorem 2. The first step to apply the technique in [13] is to construct a “truncated source distribution” which is supported on a smaller set (often defined in terms of the decoding region) and is not too far away from the true source distribution in terms of the relative entropy. For our problem, the source satisfies the Markov chain

X^{n} - Y^{n} - Z^{n}

. If we naïvely apply the techniques in [13], the Markovian property would not hold for the truncated source

({\tilde{X}}^{n}, {\tilde{Y}}^{n}, {\tilde{Z}}^{n})

. On the other hand, it appears rather challenging to extend the techniques in [12] to the hypothesis testing over a multi-hop network problem since the techniques therein rely heavily on constructing semi-groups and it is difficult to devise appropriate forms of such semi-groups to be used and analyzed in this multi-hop setting. Therefore, we carefully combine the two techniques in [12,13] to ameliorate the aforementioned problems. In particular, we first use the technique in [13] to construct a truncated source

({\tilde{X}}^{n}, {\tilde{Y}}^{n})

and then let the conditional distribution of

{\tilde{Z}}^{n}

given

({\tilde{X}}^{n}, {\tilde{Y}}^{n})

be given by the true conditional source distribution

P_{Z | Y}^{n}

to maintain the Markovian property of the source (see (56)). Subsequently, in the analysis of error exponents, we use the technique in [12] to analyze the exponent of type-II error probability at the receiver to circumvent the need to construct new semi-groups.

Finally, we remark that the techniques (or a subset of the techniques) used to prove Theorem 2 can also be used to establish a strong converse result for other multiterminal hypothesis testing against independence problems, e.g., hypothesis testing over the Gray-Wyner network [7], the interactive hypothesis testing problem [9] and the cascaded hypothesis testing problem [10].

3.2. The Case $ε_{1} + ε_{2} > 1$

In this subsection, we consider the case where the sum of type-I error probabilities at the relay and the receiver is upper bounded by a quantity strictly greater than one. For ease of presentation of our results, let

\begin{matrix} Q_{2} & : = {Q_{X Y Z U_{1} U_{2} V} \in Q (X \times Y \times Z \times U_{1} \times U_{2} \times V) : \\ Q_{X Y Z} = P_{X Y Z}, U_{1} - X - Y, U_{2} - X - Y, V - Y - Z} . \end{matrix}

(32)

Given any

Q_{X Y Z U_{1} U_{2} V} \in Q_{2}

, define the following set of rate-exponent tuples

\begin{matrix} \tilde{R} (Q_{X Y Z U_{1} U_{2} V}) : = {(R_{1}, R_{2}, E_{1}, E_{2}) : & R_{1} \geq max {I_{Q} (U_{1}; X), I_{Q} (U_{2}; X)}, R_{2} \geq I_{Q} (V; X), \\ E_{1} \leq I_{Q} (U_{1}; Y), E_{2} \leq I_{Q} (U_{2}; Y) + I_{Q} (V; Z)} . \end{matrix}

(33)

Furthermore, define

\begin{matrix} \tilde{R} & : = ⋃_{Q_{X Y Z U_{1} U_{2} V}} \tilde{R} (Q_{X Y Z U_{1} U_{2} V}) . \end{matrix}

(34)

Given any

Q_{X Y Z U_{1} U_{2} V} \in Q_{2}

and

(b_{1}, b_{2}, c, d) \in R_{+}^{4}

, define the following linear combination of the mutual information terms

\begin{matrix} {\tilde{R}}_{b_{1}, b_{2}, c, d} (Q_{X Y Z U_{1} U_{2} V}) \\ : = - I_{Q} (U_{1}; Y) + b_{1} I_{Q} (U_{1}; X) + b_{2} I_{Q} (U_{2}; X) - c (I_{Q} (U_{2}; Y) + I_{Q} (V; Z)) + d I_{Q} (V; Y), \end{matrix}

(35)

and let

\begin{matrix} {\tilde{R}}_{b_{1}, b_{2}, c, d} & : = min_{Q_{X Y Z U_{1} U_{2} V}} {\tilde{R}}_{b_{1}, b_{2}, c, d} (Q_{X Y Z U_{1} U_{2} V}) . \end{matrix}

(36)

Then, based on [14,15,16], an alternative characterization of

\tilde{R}

is given by

\begin{matrix} \tilde{R} & = ⋃_{(b_{1}, b_{2}, c, d) \in R_{+}^{4}} \{(R_{1}, R_{2}, E_{1}, E_{2}) : - E_{1} + b_{1} R_{1} + b_{2} R_{1} - c E_{2} + d R_{2} \geq {\tilde{R}}_{b_{1}, b_{2}, c, d}\} . \end{matrix}

(37)

Analogously to Theorem 2, we obtain the following result.

Theorem 3.

Given any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

and any

(b_{1}, b_{2}, c, d) \in R_{+}^{4}

, for any

(n, N_{1}, N_{2})

-code such that

β_{1} \leq ε_{1}

,

η_{1} \leq ε_{2}

, we have

\begin{matrix} log β_{2} + b_{1} log N_{1} + b_{2} log N_{1} + c log η_{2} + d log N_{2} \geq n {\tilde{R}}_{b_{1}, b_{2}, c, d} + Θ (n^{3 / 4} log n) . \end{matrix}

(38)

The proof of Theorem 3 is similar to that of Theorem 2 and thus omitted for simplicity.

To prove Theorem 3, we need to analyze two special cases (cf. Figure 2) of our system model separately:

(i): Firstly, we consider the first hop, which involves the transmitter and the relay only. The first hop itself is a hypothesis testing problem with a communication constraint [1]. Using the techniques either in [13] or [12], we can obtain bounds on a linear combination of the rate of the first encoder and the type-II error exponent of the relay, (i.e., $log β_{2} + b_{1} log N_{1}$ for any $b_{1} \in R_{+}$ ) for any type-I error probability $β_{1} \in (0, 1)$ at the relay.
(ii): Secondly, we study the second special case in which the relay does not make a decision. Using similar steps to the proof of Theorem 2, we can obtain a lower bound on a linear combination of the rate at the transmitter, the rate at the relay and the type-II exponent at the receiver (i.e., $b_{2} log N_{1} + c log η_{2} + d log N_{2}$ for any $(b_{2}, c, d) \in R_{+}^{3}$ ) for any type-I error probability $η_{1} \in (0, 1)$ at the receiver.
(iii): Finally, combining the results obtained in the first two steps, we obtain a lower bound on the linear combination of rates and type-II exponents (as shown in Theorem 3). The proof is completed by using standard single-letterization steps and the variational formula in Equation (37).

Using Theorem 3, we obtain the following proposition.

Proposition 1.

For any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

such that

ε_{1} + ε_{2} > 1

, we have

\begin{matrix} R (ε_{1}, ε_{2}) = \tilde{R} . \end{matrix}

(39)

The converse proof of Proposition 1 follows from Theorem 3 and the alternative characterization of

\tilde{R}

in (37). The achievability proof is inspired by ([6] (Theorem 5)) and is provided in Appendix A. The main idea is that we can time-share between two close-to optimal coding schemes, each of which corresponds to one special case of the current problem as mentioned after Theorem 3.

Recall that in the first remark of Theorem 2, we provide an exact characterization of the rate-exponent region for any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

such that

ε_{1} + ε_{2} < 1

. The converse proof follows from Theorem 2 and the achievability part was given in ([20] (Corollary 1)). Combining the first remark of Theorem 2 and Proposition 1, we provide an exact characterization of

R (ε_{1}, ε_{2})

for any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

such that

ε_{1} + ε_{2} \neq 1

. We remark the case in which

ε_{1} + ε_{2} = 1

was also excluded in the analysis of the successive refinement of hypothesis testing with communication constraints problem studied by Tian and Chen [6]. In fact, our converse result in Theorem 3 holds for any

(ε_{1}, ε_{2}) \in {(0, 1)}^{2}

including the case

ε_{1} + ε_{2} = 1

. However, the achievability result presented in Appendix A holds only when

ε_{1} + ε_{2} > 1

and thus we are unable to characterize

R (ε_{1}, ε_{2})

when

ε_{1} + ε_{2} = 1

. Because of the need to propose an achievability scheme which uses completely different techniques to handle the case in which

ε_{1} + ε_{2} = 1

, which does not dovetail with the main message and contribution of this paper, we omit this case in this paper.

4. Proof of Theorem 2

4.1. Preliminaries

Before presenting the proof of Theorem 2, in this subsection, we briefly review the two strong converse techniques that we judiciously combine in this work, namely the change-of-measure technique by Tyagi and Watanabe [13] and the hypercontractivity technique by Liu et al. [12].

The critical step in the strong converse technique by Tyagi and Watanabe [13] is to construct a truncated source distribution, which is supported over a small set related to the decoding regions. Furthermore, the constructed truncated distribution should satisfy the following conditions:

(i): The truncated distribution is close to the original source distribution in terms of the KL divergence;
(ii): Under the truncated distribution, the (type-I) error probability is small.

Subsequent steps proceed similarly as the weak converse analysis of the problem and lead to bounds on the rates and (type-II) exponents. We then single-letterize the obtained bounds (using classical techniques in information theory without the memoryless property, e.g., [21]). Finally, we relate the single-letterized results to the the variational characterization [14,16] of the fundamental limit of the problem, which uses the idea of replacing hard Markov constraints with soft information costs.

The advantage of the Tyagi-Watanabe technique lies in its simplicity and similarity to weak converse analyses. In contrast, the disadvantage of the technique is that the structure of the source distribution (e.g., Markovian) is potentially lost in the constructed truncated distribution. As we have illustrated briefly after Theorem 2, this disadvantage prevents us from solely using the Tyagi-Watanabe technique to prove the strong converse theorem for our setting.

On the other hand, the key technique in the strong converse technique by Liu et al. [12] is the use of ideas from reverse hypercontractivity. In particular, one needs to use the variational formula of the KL divergence ([22] (Chapter 12)) and carefully construct Markov semigroups. The operation of applying a Markov semigroup is similar to a soft version of blowing up of decoding sets [4] for the discrete memoryless case. The advantage of the strong converse technique by Liu et al. lies in its wide applicability (beyond discrete settings) and its versatile performance (beyond showing strong converses it can be used to show that the second order terms scale as

O (\sqrt{n})

). However, the construction of appropriate Markov semigroups is problem-specific, which limits its applicability to other information-theoretic problems in the sense that one has to construct specific semigroups for each problem. Fortunately, in our setting this construction and combination with Tyagi-Watanabe’s technique, is feasible.

4.2. Summary of Proof Steps

In the rest of this section, we present the proof of strong converse theorem for the hypothesis testing over the two-hop network. The proof follows by combining the techniques in [12,13] and is separated into three main steps. First, we construct a truncated source distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}}

and show that this truncated distribution is not too different from

P_{X Y Z}^{n}

in terms of the relative entropy. Subsequently, we analyze the exponents of type-II error probabilities at the relay and the receiver under the constraint that their type-I error probabilities are non-vanishing. Finally, we single-letterize the constraints on rate and error exponents to obtain desired result in Theorem 2.

To begin with, let us fix an

(n, N_{1}, N_{2})

-code with functions

(f_{1}, f_{2}, g_{1}, g_{2})

such that the type-I error probabilities are bounded above by

ε_{1} \in (0, 1)

and

ε_{2} \in (0, 1)

respectively, i.e.,

β_{1} \leq ε_{1}

and

η_{1} \leq ε_{2}

. We note from (19) and (20) that

β_{1} \leq ε_{1} + o (1)

and

β_{2} \leq ε_{2} + o (1)

. Since the

o (1)

terms are immaterial in the subsequent analyses, they are omitted for brevity.

4.3. Step 1: Construction of a Truncated Distribution

Paralleling the definitions of acceptance regions in (8) and (9), we define the following acceptance regions at the relay and the receiver as

\begin{matrix} D_{Y, n} = {(x^{n}, y^{n}) : g_{1} (f_{1} (x^{n}), y^{n}) = H_{0}}, \end{matrix}

(40)

\begin{matrix} D_{Z, n} = {(x^{n}, y^{n}, z^{n}) : g_{2} (f_{2} (f_{1} (x^{n}), y^{n}), z^{n}) = H_{0}}, \end{matrix}

(41)

respectively. Note that the only difference between

A_{Y, n}

and

D_{Y, n}

lies in whether we consider the compression index

m_{1}

or the original source sequence

x^{n}

. Recalling the definitions of the type-I error probabilities for the relay denoted by

β_{1}

in (14) and for the receiver denoted by

η_{1}

in (16), and using (40) and (41), we conclude that

\begin{matrix} P_{X Y}^{n} (D_{Y, n}) & = 1 - β_{1}, \end{matrix}

(42)

\begin{matrix} P_{X Y Z}^{n} (D_{Z, n}) & = 1 - η_{1} . \end{matrix}

(43)

For further analysis, given any

m_{2} \in M_{2}

, define a conditional acceptance region at the receiver (conditioned on

m_{2}

) as

\begin{matrix} G (m_{2}) & : = {z^{n} : g_{2} (m_{2}, z^{n}) = H_{0}} . \end{matrix}

(44)

For ease of notation, given any

(x^{n}, y^{n}) \in X^{n} \times Y^{n}

, we use

G (x^{n}, y^{n})

and

G (f_{2} (f_{1} (x^{n}), y^{n}))

(here

f_{2} (f_{1} (x^{n}), y^{n})

plays the role of

m_{2}

in (44)) interchangeably and define the following set

\begin{matrix} B_{n} : = \{(x^{n}, y^{n}) : P_{Z | Y}^{n} (G (x^{n}, y^{n}) | y^{n}) \geq \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}}\} . \end{matrix}

(45)

Combining (41), (43) and (44), we obtain

\begin{matrix} 1 - ε_{2} & \leq P_{X Y Z}^{n} (D_{Z, n}) \end{matrix}

(46)

\begin{matrix} = \sum_{(x^{n}, y^{n}) \in B_{n}} P_{X Y}^{n} (x^{n}, y^{n}) P_{Z | Y}^{n} (G (x^{n}, y^{n}) | y^{n}) + \sum_{(x^{n}, y^{n}) \notin B_{n}} P_{X Y}^{n} (x^{n}, y^{n}) P_{Z | Y}^{n} (G (x^{n}, y^{n}) | y^{n}) \end{matrix}

(47)

\begin{matrix} \leq P_{X Y}^{n} (B_{n}) + (1 - P_{X Y}^{n} (B_{n})) \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}} . \end{matrix}

(48)

Thus, we have

\begin{matrix} P_{X Y}^{n} (B_{n}) \geq \frac{3 - 3 ε_{2} + ε_{1}}{4} . \end{matrix}

(49)

For subsequent analyses, let

\begin{matrix} μ & : = {(min_{y : P_{Y} (y) > 0} P_{Y} (y))}^{- 1}, \end{matrix}

(50)

\begin{matrix} θ_{n} & : = \sqrt{\frac{3 μ}{n} log \frac{8 | Y |}{1 - ε_{1} - ε_{2}}}, \end{matrix}

(51)

and define the typical set

T_{n} (P_{Y})

as

\begin{matrix} T_{n} (P_{Y}) = {y^{n} : | {\hat{P}}_{y^{n}} (y) - P_{Y} (y) | \leq θ_{n} P_{Y} (y) \forall y \in Y} . \end{matrix}

(52)

Using the Chernoff bound, we conclude that when n is sufficiently large,

\begin{matrix} P_{Y}^{n} (T_{n} (P_{Y})) \geq 1 - \frac{1 - ε_{1} - ε_{2}}{4} . \end{matrix}

(53)

Now, define the following set

\begin{matrix} C_{n} & : = B_{n} \cap D_{Y, n} \cap (X^{n} \times T_{n} (P_{Y})) . \end{matrix}

(54)

Then, combining (42), (49) and (53), we conclude that when n is sufficiently large,

\begin{matrix} P_{X Y}^{n} (C_{n}) \geq 1 - P_{X Y}^{n} (B_{n}^{c}) - P_{X Y}^{n} (D_{Y, n}^{c}) - P_{Y}^{n} (T_{n}^{c} (P_{Y})) \geq \frac{1 - ε_{1} - ε_{2}}{2} . \end{matrix}

(55)

Let the truncated distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}}

be defined as

\begin{matrix} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}} (x^{n}, y^{n}, z^{n}) & : = \frac{P_{X Y}^{n} (x^{n}, y^{n}) 1 {(x^{n}, y^{n}) \in C_{n}}}{P_{X Y}^{n} (C_{n})} P_{Z | Y}^{n} (z^{n} | y^{n}) . \end{matrix}

(56)

Note that under our constructed truncated distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}}

, the Markov chain

{\tilde{X}}^{n} - {\tilde{Y}}^{n} - {\tilde{Z}}^{n}

holds.In other words, the Markovian property of the original source distribution

P_{X Y Z}^{n}

is retained for the truncated distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}}

, which appears to be necessary to obtain a tight result if one wishes to use weak converse techniques. This is critical for our subsequent analyses.

Using the result in (55), we have that the marginal distribution

P_{{\tilde{X}}^{n}}

satisfies that for any

x^{n} \in X^{N}

,

\begin{matrix} P_{{\tilde{X}}^{n}} (x^{n}) & = \sum_{y^{n}, z^{n}} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}} (x^{n}, y^{n}, z^{n}) \end{matrix}

(57)

\begin{matrix} \leq \frac{P_{X}^{n} (x^{n})}{P_{X Y}^{n} (C_{n})} \leq \frac{2 P_{X}^{n} (x^{n})}{1 - ε_{1} - ε_{2}} . \end{matrix}

(58)

Analogously to (58), we obtain that

\begin{matrix} P_{{\tilde{Y}}^{n}} (y^{n}) & \leq \frac{2 P_{Y}^{n} (y^{n})}{1 - ε_{1} - ε_{2}}, \forall y^{n} \in Y^{n}, \end{matrix}

(59)

\begin{matrix} P_{{\tilde{Z}}^{n}} (z^{n}) & \leq \frac{2 P_{Z}^{n} (z^{n})}{1 - ε_{1} - ε_{2}}, \forall z^{n} \in Z^{n} . \end{matrix}

(60)

Finally, note that

\begin{matrix} D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}} ∥ P_{X Y Z}^{n}) & = D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X Y}^{n}) \end{matrix}

(61)

\begin{matrix} = log \frac{1}{P_{X Y}^{n} (C_{n})} \end{matrix}

(62)

\begin{matrix} \leq log \frac{2}{1 - ε_{1} - ε_{2}} . \end{matrix}

(63)

4.4. Step 2: Analyses of the Error Exponents of Type-II Error Probabilities

4.4.1. Type-II Error Probability $β_{2}$ at the Relay

Let

{\tilde{M}}_{1}

and

{\tilde{M}}_{2}

be the outputs of encoders

f_{1}

and

f_{2}

respectively when the tuple of source sequences

({\tilde{X}}^{n}, {\tilde{Y}}^{n}, {\tilde{Z}}^{n})

is distributed according to

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}}

defined in (56). Thus, recalling the definitions in (10), (11) and (56), we find that the joint distribution of

({\tilde{X}}^{n}, {\tilde{Y}}^{n}, {\tilde{Z}}^{n}, {\tilde{M}}_{1}, {\tilde{M}}_{2})

is given by

\begin{matrix} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}} (x^{n}, y^{n}, z^{n}, m_{1}, m_{2}) = P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}} (x^{n}, y^{n}, z^{n}) P_{M_{1} | X^{n}} (m_{1} | x^{n}) P_{M_{2} | Y^{n} M_{1}} (m_{2} | y^{n}, m_{1}) . \end{matrix}

(64)

Let

P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}}

be induced by

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}}

. Combining (8) and (56), we conclude that

\begin{matrix} P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}} (A_{Y, n}) & = \sum_{\begin{matrix} x^{n}, y^{n}, z^{n}, m_{1}, m_{2} : \\ g_{1} (m_{1}, y^{n}) = H_{0} \end{matrix}} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}} (x^{n}, y^{n}, z^{n}, m_{1}, m_{2}) \end{matrix}

(65)

\begin{matrix} = \sum_{x^{n}, y^{n} : g_{1} (f_{1} (x^{n}), y^{n}) = H_{0}} \frac{P_{X Y}^{n} (x^{n}, y^{n}) 1 {(x^{n}, y^{n}) \in C_{n}}}{P_{X Y}^{n} (C_{n})} \end{matrix}

(66)

\begin{matrix} = \sum_{x^{n}, y^{n}} \frac{P_{X Y}^{n} (x^{n}, y^{n}) 1 {(x^{n}, y^{n}) \in C_{n}}}{P_{X Y}^{n} (C_{n})} \end{matrix}

(67)

\begin{matrix} = 1 . \end{matrix}

(68)

where (67) follows from the definition of

D_{Y, n}

in (40) and the fact that

D_{Y, n} \subseteq C_{n}

.

Thus, using the data processing inequality for the relative entropy and the definition of

β_{2}

in (15), we obtain that

\begin{matrix} D (P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}} ∥ P_{M_{1}} P_{Y}^{n}) & \geq D_{b} (P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}} (A_{Y, n}) ∥ P_{M_{1}} P_{Y}^{n} (A_{Y, n})) \end{matrix}

(69)

\begin{matrix} = - log (P_{M_{1}} P_{Y}^{n} (A_{Y, n})) \end{matrix}

(70)

\begin{matrix} = - log β_{2} . \end{matrix}

(71)

Furthermore, recalling that

M_{1}

denotes the output of encoder

f_{1}

when

(X^{n}, Y^{n}, Z^{n}) \sim P_{X Y Z}^{n}

and

{\tilde{M}}_{1}

denotes the output of encoder

f_{1}

when

(X^{n}, Y^{n}, Z^{n}) \sim P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}}

, and using the result in (58), we conclude that

\begin{matrix} P_{{\tilde{M}}_{1}} (m_{1}) & = \sum_{x^{n}, y^{n}, z^{n} : f_{1} (x^{n}) = m_{1}} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n}} (x^{n}, y^{n}, z^{n}) \end{matrix}

(72)

\begin{matrix} = \sum_{x^{n} : f_{1} (x^{n}) = m_{1}} P_{{\tilde{X}}^{n}} (x^{n}) \end{matrix}

(73)

\begin{matrix} \leq \sum_{x^{n} : f_{1} (x^{n}) = m_{1}} \frac{2 P_{X}^{n} (x^{n})}{1 - ε_{1} - ε_{2}} \end{matrix}

(74)

\begin{matrix} \leq \frac{2 P_{M_{1}} (m_{1})}{1 - ε_{1} - ε_{2}}, \end{matrix}

(75)

for any

m_{1} \in M_{1}

. Thus, combining (59), (71) and (75), we have

\begin{matrix} - log β_{2} & \leq D (P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}} ∥ P_{M_{1}} P_{Y}^{n}) \end{matrix}

(76)

\begin{matrix} = D (P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}} ∥ P_{{\tilde{M}}_{1}} P_{{\tilde{Y}}^{n}}) + E_{P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}}} [log \frac{P_{{\tilde{M}}_{1}} ({\tilde{M}}_{1}) P_{{\tilde{Y}}^{n}} ({\tilde{Y}}^{n})}{P_{M_{1}} ({\tilde{M}}_{1}) P_{Y}^{n} ({\tilde{Y}}^{n})}] \end{matrix}

(77)

\begin{matrix} \leq D (P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}} ∥ P_{{\tilde{M}}_{1}} P_{{\tilde{Y}}^{n}}) + E_{P_{{\tilde{M}}_{1} {\tilde{Y}}^{n}}} [log \frac{\frac{2 P_{M_{1}} ({\tilde{M}}_{1})}{1 - ε_{1} - ε_{2}} \frac{2 P_{Y}^{n} ({\tilde{Y}}^{n})}{1 - ε_{1} - ε_{2}}}{P_{M_{1}} ({\tilde{M}}_{1}) P_{Y}^{n} ({\tilde{Y}}^{n})}] \end{matrix}

(78)

\begin{matrix} = I (\tilde{M}; {\tilde{Y}}^{n}) + 2 log \frac{2}{1 - ε_{1} - ε_{2}} . \end{matrix}

(79)

4.4.2. Type-II Error Probability $η_{2}$ at the Receiver

In this subsection, we analyze the error exponent of the type-II error probability at the receiver. For this purpose, we make use of the method introduced in [12] based on reverse hypercontractivity. We define the following additional notation:

Give $P_{Y Z} \in P (Y \times Z)$ , define

$\begin{matrix} α : = max_{y, z} \frac{P_{Z | Y} (z | y)}{P_{Z} (z)} \in (1, \infty) . \end{matrix}$

(80)

In the subsequent analysis, we only consider the case when $α > 1$ . When $α = 1$ , choosing $t = \frac{1}{\sqrt{n}}$ instead of the choice in (101), we can obtain a similar upper bound for $- log η_{2}$ as in (102), where the only difference is that $Ψ (n, ε_{1}, ε_{2})$ should be replaced by another term scaling in order $Θ (\sqrt{n})$ .
Given any $(ε_{1}, ε_{2}) \in {(0, 1)}^{2}$ such that $ε_{1} + ε_{2} < 1$ , let

$\begin{matrix} Ψ (n, ε_{1}, ε_{2}) : = 2 \sqrt{n (α - 1) log \frac{1 + 3 ε_{2} - ε_{1}}{1 - ε_{1} - ε_{2}}} . \end{matrix}$

(81)
Give any $m_{2} \in M_{2}$ and $z^{n} \in Z^{n}$ , let

$\begin{matrix} h (m_{2}, z^{n}) : = 1 {z^{n} \in G (m_{2})} . \end{matrix}$

(82)
Two operators in ([12] (Equations (25), (26), (29)))

$\begin{matrix} Λ_{α, t} & = {(exp (- t) + α (1 - exp (- t)) P_{Z})}^{\otimes n}, \end{matrix}$

(83)

$\begin{matrix} T_{y^{n}, t} & = \prod_{i = 1}^{n} (exp (- t) + (1 - exp (- t)) P_{Z | y_{i}}) . \end{matrix}$

(84)

Note that in (84), we use the convenient notation

P_{Z | y} (z) = P_{Z | Y} (z | y)

. The two operators in (83) and (84) will be used to lower bound

D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{Z}^{n} {\bar{P}}_{M_{2}})

via a variational formula of the relative entropy (cf. ([12] (Section 4))).

Let

P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}}

,

P_{{\tilde{Z}}^{n} | {\tilde{M}}_{2}}

,

P_{{\tilde{Z}}^{n} | {\tilde{Y}}^{n}}

be induced by the joint distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}}

in (64) and let

{\bar{P}}_{M_{2}}

be induced by the joint distribution

{\bar{P}}_{X^{n} Y^{n} Z^{n} M_{1} M_{2}}

in (13). Invoking the variational formula for the relative entropy ([22] (Equation (2.4.67))) and recalling the notation

P (f) = E_{P} [f]

, we have

\begin{matrix} D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{Z}^{n} {\bar{P}}_{M_{2}}) & \geq P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} (log Λ_{α, t} h ({\tilde{M}}_{2}, {\tilde{Z}}^{n})) - log ((P_{Z}^{n} {\bar{P}}_{M_{2}}) (Λ_{α, t} h (M_{2}, Z^{n}))) . \end{matrix}

(85)

Given any

m_{2} \in M_{2}

, similar to ([12] (Equations (18)–(21))), we obtain

\begin{matrix} P_{Z}^{n} (Λ_{α, t} h (m_{2}, Z^{n})) \end{matrix}

\begin{matrix} = P_{Z}^{n} ({(exp (- t) + α (1 - exp (- t)) P_{Z})}^{\otimes n} h (m_{2}, Z^{n})) \end{matrix}

(86)

\begin{matrix} = {(exp (- t) + α (1 - exp (- t)))}^{n} P_{Z}^{n} (h (m_{2}, Z^{n})) \end{matrix}

(87)

\begin{matrix} \leq exp ((α - 1) n t) P_{Z}^{n} (h (m_{2}, Z^{n})) . \end{matrix}

(88)

Thus, averaging over

m_{2}

with distribution

{\bar{P}}_{M_{2}}

on both sides of (88), we have

\begin{matrix} (P_{Z}^{n} {\bar{P}}_{M_{2}}) (Λ_{α, t} h (M_{2}, Z^{n})) \end{matrix}

\begin{matrix} \leq exp ((α - 1) n t) ({\bar{P}}_{M_{2}} P_{Z}^{n}) (h (M_{2}, Z^{n})) \end{matrix}

(89)

\begin{matrix} = exp ((α - 1) n t) η_{2}, \end{matrix}

(90)

where (90) follows from the definition of

η_{2}

in (17).

Furthermore, given any

{\tilde{m}}_{2} \in M_{2}

, we obtain

\begin{matrix} P_{{\tilde{Z}}^{n} | {\tilde{m}}_{2}} (log Λ_{α, t} h ({\tilde{m}}_{2}, {\tilde{Z}}^{n})) \end{matrix}

(91)

\begin{matrix} = (\sum_{{\tilde{y}}^{n}} P_{{\tilde{Z}}^{n} | {\tilde{y}}^{n}} P_{{\tilde{Y}}^{n} | {\tilde{M}}_{2}} ({\tilde{y}}^{n} | {\tilde{m}}_{2})) (log Λ_{α, t} h ({\tilde{m}}_{2}, {\tilde{Z}}^{n})) \end{matrix}

(92)

\begin{matrix} = \sum_{{\tilde{y}}^{n}} P_{{\tilde{Y}}^{n} | {\tilde{M}}_{2}} ({\tilde{y}}^{n} | {\tilde{m}}_{2}) P_{{\tilde{Z}}^{n} | {\tilde{y}}^{n}} (log Λ_{α, t} h ({\tilde{m}}_{2}, {\tilde{Z}}^{n})) \end{matrix}

(93)

\begin{matrix} \geq \sum_{{\tilde{y}}^{n}} P_{{\tilde{Y}}^{n} | {\tilde{M}}_{2}} ({\tilde{y}}^{n} | {\tilde{m}}_{2}) P_{{\tilde{Z}}^{n} | {\tilde{y}}^{n}} (log T_{y^{n}, t} h ({\tilde{m}}_{2}, {\tilde{Z}}^{n})) \end{matrix}

(94)

\begin{matrix} \geq \sum_{{\tilde{y}}^{n}} P_{{\tilde{Y}}^{n} | {\tilde{M}}_{2}} ({\tilde{y}}^{n} | {\tilde{m}}_{2}) (1 + \frac{1}{t}) log P_{{\tilde{Z}}^{n} | {\tilde{y}}^{n}} (h ({\tilde{m}}_{2}, {\tilde{Z}}^{n})) \end{matrix}

(95)

\begin{matrix} = (1 + \frac{1}{t}) (\sum_{{\tilde{y}}^{n}} P_{{\tilde{Y}}^{n} | {\tilde{M}}_{2}} ({\tilde{y}}^{n} | {\tilde{m}}_{2}) log P_{{\tilde{Z}}^{n} | {\tilde{y}}^{n}} (G ({\tilde{m}}_{2}))) . \end{matrix}

(96)

where (94) follows from ([12] (Lemma 4)) and (95) follows similarly to ([12] (Equations (14)–(17))).

Thus, averaging on both sides of (96) over

{\tilde{m}}_{2}

with distribution

P_{{\tilde{M}}_{2}}

and using the definition of the joint distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}}

in (64), we obtain that

\begin{matrix} P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} (log Λ_{α, t} h ({\tilde{M}}_{2}, {\tilde{Z}}^{n})) \end{matrix}

\begin{matrix} \geq (1 + \frac{1}{t}) (\sum_{{\tilde{y}}^{n}, {\tilde{m}}_{2}} P_{{\tilde{Y}}^{n} {\tilde{M}}_{2}} ({\tilde{y}}^{n}, {\tilde{m}}_{2}) log P_{{\tilde{Z}}^{n} | {\tilde{y}}^{n}} (G ({\tilde{m}}_{2}))) \\ = (1 + \frac{1}{t}) \sum_{{\tilde{x}}^{n}, {\tilde{y}}^{n}, {\tilde{m}}_{1}, {\tilde{m}}_{2}} (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ({\tilde{x}}^{n}, {\tilde{y}}^{n}) 1 {{\tilde{m}}_{1} = f_{1} ({\tilde{x}}^{n}), {\tilde{m}}_{2} = f_{2} ({\tilde{m}}_{1}, {\tilde{y}}^{n})} \end{matrix}

(97)

\begin{matrix} \times log (\sum_{{\tilde{z}}^{n} : g_{2} ({\tilde{z}}^{n}, {\tilde{m}}_{2}) = H_{0}} P_{Z | Y}^{n} ({\tilde{z}}^{n} | {\tilde{y}}^{n}))) \end{matrix}

(98)

\begin{matrix} = (1 + \frac{1}{t}) (\sum_{\begin{matrix} {\tilde{x}}^{n}, {\tilde{y}}^{n} \end{matrix}} \frac{P_{X Y}^{n} ({\tilde{x}}^{n}, {\tilde{y}}^{n}) 1 {({\tilde{x}}^{n}, {\tilde{y}}^{n}) \in C_{n}}}{P_{X Y}^{n} (C_{n})} log P_{Z | Y}^{n} (G ({\tilde{x}}^{n}, {\tilde{y}}^{n}) | {\tilde{y}}^{n})) \end{matrix}

(99)

\begin{matrix} \geq (1 + \frac{1}{t}) log \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}}, \end{matrix}

(100)

where (100) follows from the definitions of

B_{n}

in (45) and

C_{n}

in (54).

Therefore, combining (85), (90) and (100) and choosing

\begin{matrix} t = \sqrt{\frac{1}{n (α - 1)} log \frac{1 + 3 ε_{2} - ε_{1}}{1 - ε_{1} - ε_{2}}}, \end{matrix}

(101)

via simple algebra, we obtain that

\begin{matrix} - log η_{2} & \leq D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{Z^{n}} {\bar{P}}_{M_{2}}) + Ψ (n, ε_{1}, ε_{2}) - log \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}} . \end{matrix}

(102)

In the following, we further upper bound

D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{Z^{n}} {\bar{P}}_{M_{2}})

. For this purpose, define the following distribution

\begin{matrix} {\bar{P}}_{{\tilde{M}}_{2}} (m_{2}) & : = \sum_{y^{n}, m_{1}} P_{{\tilde{M}}_{1}} (m_{1}) P_{{\tilde{Y}}^{n}} (y^{n}) 1 {m_{2} = f_{2} (m_{1}, y^{n})} . \end{matrix}

(103)

Combining the results in (59) and (75), and recalling that

{\bar{P}}_{M_{2}}

is induced by joint distribution

{\bar{P}}_{X^{n} Y^{n} Z^{n} M_{1} M_{2}}

in (13), for any

m_{2} \in M_{2}

, we have

\begin{matrix} {\bar{P}}_{{\tilde{M}}_{2}} (m_{2}) & \leq {(\frac{2}{1 - ε_{1} - ε_{2}})}^{2} (\sum_{y^{n}, m_{1}} P_{M_{1}} (m_{1}) P_{Y}^{n} (y^{n}) 1 {m_{2} = f_{2} (f_{1} (x^{n}), y^{n})}) \end{matrix}

(104)

\begin{matrix} = \frac{4 {\bar{P}}_{M_{2}} (m_{2})}{{(1 - ε_{1} - ε_{2})}^{2}} . \end{matrix}

(105)

Thus, combining (60) and (105), we have

\begin{matrix} D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{Z}^{n} {\bar{P}}_{M_{2}}) \end{matrix}

\begin{matrix} = D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{{\tilde{Z}}^{n}} {\bar{P}}_{{\tilde{M}}_{2}}) + E_{P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}}} [log \frac{P_{{\tilde{Z}}^{n}} ({\tilde{Z}}^{n}) {\bar{P}}_{{\tilde{M}}_{2}} ({\tilde{M}}_{2})}{P_{Z}^{n} ({\tilde{Z}}^{n}) {\bar{P}}_{M_{2}} ({\tilde{M}}_{2})}] \end{matrix}

(106)

\begin{matrix} \leq D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{{\tilde{Z}}^{n}} {\bar{P}}_{{\tilde{M}}_{2}}) + E_{P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}}} [log \frac{\frac{2 P_{Z}^{n} ({\tilde{Z}}^{n})}{1 - ε_{1} - ε_{2}} \frac{4 {\bar{P}}_{M_{2}} ({\tilde{M}}_{2})}{{(1 - ε_{1} - ε_{2})}^{2}}}{P_{Z}^{n} ({\tilde{Z}}^{n}) {\bar{P}}_{M_{2}} ({\tilde{M}}_{2})}] \end{matrix}

(107)

\begin{matrix} = D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{{\tilde{Z}}^{n}} {\bar{P}}_{{\tilde{M}}_{2}}) + 3 log \frac{2}{1 - ε_{1} - ε_{2}} . \end{matrix}

(108)

Therefore, combining (102) and (108), we have

\begin{matrix} - log η_{2} & \leq D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{{\tilde{Z}}^{n}} {\bar{P}}_{{\tilde{M}}_{2}}) + Ψ (n, ε_{1}, ε_{2}) - log \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}} - 3 log \frac{1 - ε_{1} - ε_{2}}{2} . \end{matrix}

(109)

4.5. Step 3: Analyses of Communication Constraints and Single-Letterization Steps

For any

(n, N_{1}, N_{2})

-code, since

{\tilde{M}}_{i} \in M_{i}

for

i \in {1, 2}

, we have that

\begin{matrix} log N_{1} & \geq H ({\tilde{M}}_{1}) \geq I ({\tilde{M}}_{1}; {\tilde{X}}^{n}, {\tilde{Y}}^{n}), \end{matrix}

(110)

\begin{matrix} log N_{2} & \geq H ({\tilde{M}}_{2}) \geq I ({\tilde{M}}_{2}; {\tilde{Y}}^{n}) . \end{matrix}

(111)

Furthermore, from the problem setting (see (64)), we have

\begin{matrix} I ({\tilde{M}}_{1}; {\tilde{Y}}^{n} | {\tilde{X}}^{n}) = 0, \end{matrix}

(112)

For subsequent analyses, given any

(b, c, d, γ) \in R_{+}^{4}

, define

\begin{matrix} R_{b, c, d, γ}^{(n)} & : = - I ({\tilde{M}}_{1}; {\tilde{Y}}^{n}) + b I ({\tilde{M}}_{1}; {\tilde{X}}^{n}, {\tilde{Y}}^{n}) - c D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{{\tilde{Z}}^{n}} {\bar{P}}_{{\tilde{M}}_{2}}) + d I ({\tilde{M}}_{2}; {\tilde{Y}}^{n}) + γ I ({\tilde{M}}_{1}; {\tilde{Y}}^{n} | {\tilde{X}}^{n}) \\ + (b + d + γ) D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X^{n} Y^{n}}) . \end{matrix}

(113)

Combining the results in (63), (79), (109) to (112), for any

γ \in R_{+}

, we obtain

\begin{matrix} log β_{2} + b log N_{1} + c log η_{2} + d log N_{2} + c Ψ (n, ε_{1}, ε_{2}) \\ \geq R_{b, c, d, γ}^{(n)} + log \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}} + (b + d + γ + 5) log \frac{1 - ε_{1} - ε_{2}}{2} . \end{matrix}

(114)

The proof of Theorem 2 is complete by the two following lemmas which provide a single-letterized lower bound for

R_{b, c, d, γ}^{(n)}

and relate the derived lower bound to

R_{b, c, d}

. For this purpose, recalling the definition of

θ_{n}

in (51), we define the following set of joint distributions

\begin{matrix} Q_{1} & : = {Q_{X Y Z U V} \in P (X \times Y \times Z \times U \times V) : \\ Q_{Z | Y} = P_{Z | Y}, X - Y - Z, V - Y - Z, \\ | Q_{Y} (y) - P_{Y} (y) | \leq θ_{n} P_{Y} (y), \forall y \in Y} . \end{matrix}

(115)

Given

Q_{X Y Z U V} \in Q_{1}

, define

\begin{matrix} Δ_{b, d, γ} (Q_{X Y Z U V}) & : = (b + γ) D (Q_{X Y} ∥ P_{X Y}) + d D (Q_{Y} ∥ P_{Y}) + γ I_{Q} (U; Y | X) . \end{matrix}

(116)

Recall the definition of

R_{b, c, d} (Q_{X Y Z U V})

in (28). Define

\begin{matrix} R_{b, c, d, γ} & : = min_{Q_{X Y Z U V} \in Q_{1}} (R_{b, c, d} (Q_{X Y Z U V}) + Δ_{b, d, γ} (Q_{X Y Z U V})) . \end{matrix}

(117)

The following lemma presents a single-letterized lower bound for

R_{b, c, d, γ}^{(n)}

.

Lemma 1.

For any

(b, c, d, γ) \in R_{+}^{4}

,

\begin{matrix} R_{b, c, d, γ}^{(n)} \geq n R_{b, c, d, γ} . \end{matrix}

(118)

The proof of Lemma 1 is inspired by ([13] (Prop. 2)) and provided in Appendix B.

Combining the results in (114) and Lemma 1, we obtain the desired result and this completes the proof of Theorem 2.

Lemma 2.

Choosing

γ = \sqrt{n}

, we have

\begin{matrix} n R_{b, c, d, γ} + log \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}} + (b + d + γ + 5) log \frac{1 - ε_{1} - ε_{2}}{2} \geq n R_{b, c, d} + Θ (n^{3 / 4} log n) . \end{matrix}

(119)

The proof of Lemma 2 is inspired by ([19] (Lemma C.2)) and provided in Appendix C.

5. Discussion and Future Work

We strengthened the result in ([11] (Corollary 1)) by deriving a strong converse theorem for hypothesis testing against independence over a two-hop network with communication constraints (see Figure 1). In our proof, we combined two recently proposed strong converse techniques [12,13]. The apparent necessity of doing so comes from the Markovian requirement in the source distribution (recall (1)) and is reflected in the construction of a truncated distribution in (56) to ensure the Markovian structure of the source sequences is preserved. Subsequently, due to this constraint, the application the strong converse technique by Tyagi and Watanabe in [13] was only amenable in analyzing the type-II error exponent at the relay. On the other hand, to analyze the type-II error exponent at the receiver, we need to carefully adapt the strong converse technique based on reverse hypercontractivity by Liu, van Handel and Verdú in [12]. Furthermore, to complete the proof, we carefully combine the single-letterization techniques in [12,13].

Another important take-home message is the techniques (or a subset of the techniques) used in this paper can be applied to strengthen the results of other multiterminal hypothesis testing against independence problems. If the source distribution has no Markov structure, it is usually the case that one can directly apply the technique by Tyagi and Watanabe [13] to obtain strong converse theorems. Such examples include [7,8,9]. On the other hand, if the source sequences admit Markovian structure, then it appears necessary to combine techniques in [12,13] to obtain strong converse theorems, just as it was done in this paper.

Finally, we discuss some avenues for future research. In this paper, we only derived the strong converse but not a second-order converse result as was done in ([12] (Section 4.4)) for the problem of hypothesis testing against independence with a communication constraint [1]. Thus, in the future, one may refine the proof in the current paper by deriving second-order converse or exact second-order asymptotics. Furthermore, one may also consider deriving strong converse theorems or simplifying existing strong converse proofs for hypothesis testing problems with both communication and privacy constraints such as that in [23] by using the techniques in the current paper. It is also interesting to explore whether current techniques can be applied to obtain strong converse theorems for hypothesis testing with zero-rate compression problems [3].

Author Contributions

Formal analysis, D.C. and L.Z.; Supervision, V.Y.F.T.; Writing—original draft, D.C. and L.Z.; Writing—review & editing, V.Y.F.T.

Funding

D.C. is supported by the China Scholarship Council with No. 201706090064 and the National Natural Science Foundation of China under Grant 61571122. L.Z. was supported by NUS RSB grants (C-261-000-207-532 and C-261-000-005-001).

Acknowledgments

The authors acknowledge Sadaf Salehkalaibar (University of Tehran) for drawing our attention to ([11] (Corollary 1)) and providing helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Achievability Proof of Proposition 1

Fix any joint distribution

Q_{X Y Z U_{1} U_{2} V} \in Q_{2}

. Let

(f_{1}^{'}, g_{1}^{'})

be an encoder-decoder pair with rate

R_{1} = I_{Q} (U_{1}; X)

for the hypothesis testing with communication constraint problem [1] (i.e., no receiver in Figure 1) such that the type-II error probability decays exponentially fast at speed no smaller than

E_{1} = I_{Q} (U_{1}; Y)

and the type-I error probability is vanishing, i.e.,

log N_{1}^{'} \leq n R_{1}

,

β_{2}^{'} \leq exp (- n E_{1})

and

β_{1}^{'} \leq ε_{1}^{'}

for any

ε_{1}^{'} > 0

. Furthermore, let

(f_{1}^{″}, f_{2}^{″}, g_{1}^{″}, g_{2}^{″})

be a tuple of encoders and decoders with rates

(R_{1}, R_{2}) = (I_{Q} (U_{2}; X), I_{Q} (V; Y))

for the problem in Figure 1 such that the type-II error probability at the receiver decays exponentially fast at speed no smaller

E_{2} = I_{Q} (V; Z)

and type-I error probability at the receiver is vanishing, i.e.,

log N_{1}^{″} \leq n R_{1}

,

log N_{2}^{″} \leq n R_{2}

,

η_{2}^{″} \leq exp (- n E_{2})

and

η_{1}^{″} \leq ε_{2}^{'}

for any

ε_{2}^{'} > 0

. Such tuples of encoders and decoders exist as proved in [1,11]. Furthermore, let

A_{1}^{'} \subseteq X^{n} \times Y^{n}

be the acceptance region associated with

(f_{1}^{'}, g_{1}^{'})

at the relay and let

A_{2}^{'} \subseteq X^{n} \times Y^{n} \times Z^{n}

be the acceptance region associated with

(f_{1}^{″}, f_{2}^{″}, g_{1}^{″}, g_{2}^{″})

at the receiver.

Now, let us partition the source space

X^{n}

into two disjoint sets

X_{1}^{n}

and

X_{2}^{n}

such that

X_{1}^{n} \cup X_{2}^{n} = X^{n}

,

P_{X}^{n} (X_{1}^{n}) > 1 - ε_{1}

and

P_{X}^{n} (X_{2}^{n}) > 1 - ε_{2}

. We construct an

(n, N_{1}, N_{2})

-code as follows. Given a source sequence

X^{n}

, if

X^{n} \in X_{1}^{n}

, then encoder

f_{1}^{'}

is used; and if otherwise, the encoder

f_{1}^{″}

is used. Furthermore, an additional bit indicating whether

X^{n} \in X_{1}^{n}

is also sent to the relay and further forwarded to the receiver by the relay. Given encoded index

M_{1}

, if

X^{n} \in X_{1}^{n}

, the relay uses decoder

g_{1}^{'}

to make the decision; otherwise, if

X^{n} \in X_{2}^{n}

, the relay declares hypothesis

H_{1}

to be true. Furthermore, in both cases, the relay transmits an index

M_{2}

using encoder

f_{2}^{″}

. Given the index

M_{2}

, if

X^{n} \in X_{1}^{n}

, the receiver declares hypothesis

H_{1}

to be true; otherwise, the receiver uses decoder

g_{2}^{″}

to make the decision.

The performance of the constructed

(n, N_{1}, N_{2})

-code is as follows. In terms of rates, we have

\begin{matrix} log N_{1} \leq n R_{1} + 1, \end{matrix}

(A1)

\begin{matrix} log N_{2} \leq n R_{2} + 1 . \end{matrix}

(A2)

The type-I error probability at the relay satisfies that

\begin{matrix} 1 - β_{1} & = P_{X Y}^{n} {A_{1}^{'} \cap (X_{1}^{n} \times Y^{n})} \end{matrix}

(A3)

\begin{matrix} \geq P_{X}^{n} {X_{1}^{n}} - P_{X Y}^{n} {{(A_{1}^{'})}^{c}} \end{matrix}

(A4)

\begin{matrix} \geq 1 - ε_{1}, \end{matrix}

(A5)

where (A5) follows when n is sufficiently large and thus

ε_{1}^{'}

can be made arbitrarily close to zero. Furthermore, the type-II error probability at the relay can be upper bounded as follows

\begin{matrix} β_{2} & = P_{X}^{n} P_{Y}^{n} {A_{1}^{'} \cap (X_{1}^{n} \times Y^{n})} \end{matrix}

(A6)

\begin{matrix} \leq P_{X}^{n} P_{Y}^{n} {A_{1}^{'}} \end{matrix}

(A7)

\begin{matrix} = β_{2}^{'} \end{matrix}

(A8)

\begin{matrix} \leq exp (- n E_{1}^{'}) . \end{matrix}

(A9)

Similarly, for n sufficiently large, the error probabilities at the receiver can be upper bounded as follows

\begin{matrix} η_{1} & = 1 - P_{X Y Z}^{n} {A_{2}^{″} \cap (X_{2}^{n} \times Y^{n} \times Z^{n})} \end{matrix}

(A10)

\begin{matrix} \leq 1 - P_{X}^{n} (X_{2}^{n}) + P_{X Y Z}^{n} ({(A_{2}^{'})}^{c}) \end{matrix}

(A11)

\begin{matrix} \leq ε_{2}, \end{matrix}

(A12)

and

\begin{matrix} η_{2} & = P_{X}^{n} P_{Y}^{n} P_{Z}^{n} {A_{2}^{″} \cap (X_{2}^{n} \times Y^{n} \times Z^{n})} \end{matrix}

(A13)

\begin{matrix} \leq P_{X}^{n} P_{Y}^{n} P_{Z}^{n} {A_{2}^{″}} \end{matrix}

(A14)

\begin{matrix} \leq exp (- n E_{2}^{″}) . \end{matrix}

(A15)

The achievability proof of Proposition 1 is now complete.

Appendix B. Proof of Lemma 1

Recall the definition of distribution

{\bar{P}}_{{\tilde{M}}_{2}}

(see (103)). Noting that

P_{{\tilde{M}}_{2}}

is the marginal distribution induced by

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}}

(see (64)), we have that for any

{\tilde{m}}_{2} \in M_{2}

\begin{matrix} P_{{\tilde{M}}_{2}} ({\tilde{m}}_{2}) & = \sum_{y^{n}, m_{1}} P_{{\tilde{Y}}^{n} {\tilde{M}}_{1}} (y^{n}, m_{1}) 1 {{\tilde{m}}_{2} = f_{2} (m_{1}, y^{n})} . \end{matrix}

(A16)

Thus, applying the data processing inequality for the relative entropy, we have that

\begin{matrix} I ({\tilde{M}}_{1}; {\tilde{Y}}^{n}) & = D (P_{{\tilde{Y}}^{n} {\tilde{M}}_{1}} ∥ P_{{\tilde{Y}}^{n}} P_{{\tilde{M}}_{1}}) \end{matrix}

(A17)

\begin{matrix} \geq D (P_{{\tilde{M}}_{2}} ∥ {\bar{P}}_{{\tilde{M}}_{2}}) . \end{matrix}

(A18)

Using (A18) and following similar steps to the proof of weak converse in ([11] (Equationation (186))), we obtain

\begin{matrix} D (P_{{\tilde{Z}}^{n} {\tilde{M}}_{2}} ∥ P_{{\tilde{Z}}^{n}} {\bar{P}}_{{\tilde{M}}_{2}}) & = I ({\tilde{M}}_{2}; {\tilde{Z}}^{n}) + D (P_{{\tilde{M}}_{2}} ∥ {\bar{P}}_{{\tilde{M}}_{2}}) \end{matrix}

(A19)

\begin{matrix} \leq I ({\tilde{M}}_{2}; {\tilde{Z}}^{n}) + I ({\tilde{M}}_{1}; {\tilde{Y}}^{n}) . \end{matrix}

(A20)

Using (A20) and the definition of

R_{b, c, d, γ}^{(n)}

in (113), we have the following lower bound for

R_{b, c, d, γ}^{(n)}

\begin{matrix} R_{b, c, d, γ}^{(n)} & \geq - I ({\tilde{M}}_{1}; {\tilde{Y}}^{n}) + b (D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X^{n} Y^{n}}) + H ({\tilde{X}}^{n}, {\tilde{Y}}^{n}) - H ({\tilde{X}}^{n}, {\tilde{Y}}^{n} | {\tilde{M}}_{1})) \\ - c (I ({\tilde{M}}_{2}; Z^{n}) + I ({\tilde{M}}_{1}; {\tilde{Y}}^{n})) + d (D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X^{n} Y^{n}}) + H ({\tilde{Y}}^{n}) - h ({\tilde{Y}}^{n} | {\tilde{M}}_{2})) \\ + γ (D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X^{n} Y^{n}}) + H ({\tilde{Y}}^{n} | {\tilde{X}}^{n}) - H ({\tilde{Y}}^{n} | {\tilde{X}}^{n}, {\tilde{M}}_{1})) . \end{matrix}

(A21)

The rest of the proof concerns single-letterizing each term in (A21). For this purpose, for each

j \in [n]

, we define two auxiliary random variables

U_{j} : = ({\tilde{M}}_{1}, {\tilde{X}}^{j - 1}, {\tilde{Y}}^{j - 1})

and

V_{j} : = ({\tilde{M}}_{2}, {\tilde{Y}}^{j - 1})

and let J be a random variable which is distributed uniformly over the set

[n]

and is independent of all other random variables.

Using standard single-letterization techniques as in [21], we obtain

\begin{matrix} I ({\tilde{M}}_{1}; {\tilde{Y}}^{n}) & = \sum_{j \in [n]} I ({\tilde{M}}_{1}; {\tilde{Y}}_{j} | {\tilde{Y}}^{j - 1}) \end{matrix}

(A22)

\begin{matrix} \leq \sum_{j \in [n]} I ({\tilde{M}}_{1}, {\tilde{Y}}^{j - 1}; {\tilde{Y}}_{j}) \end{matrix}

(A23)

\begin{matrix} \leq \sum_{j \in [n]} I ({\tilde{M}}_{1}, {\tilde{X}}^{j - 1}, {\tilde{Y}}^{j - 1}; {\tilde{Y}}_{j}) \end{matrix}

(A24)

\begin{matrix} = n I (U_{J}, J; {\tilde{Y}}_{J}), \end{matrix}

(A25)

and

\begin{matrix} H ({\tilde{X}}^{n}, {\tilde{Y}}^{n} | {\tilde{M}}_{1}) & = n H ({\tilde{X}}_{J}, {\tilde{Y}}_{J} | U_{J}, J) . \end{matrix}

(A26)

Furthermore, analogous to ([13] (Prop. 1)), we obtain that

\begin{matrix} H ({\tilde{X}}^{n}, {\tilde{Y}}^{n}) + D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X Y}^{n}) & = \sum_{x^{n}, y^{n}} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} (x^{n}, y^{n}) log \frac{1}{P_{X Y}^{n} (x^{n}, y^{n})} \end{matrix}

(A27)

\begin{matrix} = \sum_{x^{n}, y^{n}} P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} (x^{n}, y^{n}) \sum_{j \in [n]} log \frac{1}{P_{X Y} (x_{j}, y_{j})} \end{matrix}

(A28)

\begin{matrix} = \sum_{j \in [n]} P_{{\tilde{X}}_{j} {\tilde{Y}}_{j}} (x_{j}, y_{j}) log \frac{1}{P_{X Y} (x_{j}, y_{j})} \end{matrix}

(A29)

\begin{matrix} = n (H ({\tilde{X}}_{J}, {\tilde{Y}}_{J}) + D (P_{{\tilde{X}}_{J} Y_{J}} ∥ P_{X Y})) . \end{matrix}

(A30)

Subsequently, we can single-letterize

I ({\tilde{M}}_{2}; {\tilde{Z}}^{n})

as follows:

\begin{matrix} I ({\tilde{M}}_{2}; {\tilde{Z}}^{n}) & = \sum_{j \in [n]} I ({\tilde{M}}_{2}; {\tilde{Z}}_{j} | {\tilde{Z}}^{j - 1}) \end{matrix}

(A31)

\begin{matrix} \leq \sum_{j \in [n]} I ({\tilde{M}}_{2}, {\tilde{Z}}^{j - 1}, {\tilde{Y}}^{j - 1}; {\tilde{Z}}_{j}) \end{matrix}

(A32)

\begin{matrix} = \sum_{j \in [n]} I ({\tilde{M}}_{2}, {\tilde{Y}}^{j - 1}; {\tilde{Z}}_{j}) \end{matrix}

(A33)

\begin{matrix} = n I (V_{J}, J; {\tilde{Z}}_{J}), \end{matrix}

(A34)

where (A33) follows from the Markov chain

{\tilde{Z}}^{j - 1} - {\tilde{M}}_{2} {\tilde{Y}}^{j - 1} - {\tilde{Z}}_{j}

implied by the joint distribution of

({\tilde{X}}^{n}, {\tilde{Y}}^{n}, {\tilde{Z}}^{n}, {\tilde{M}}_{1}, {\tilde{M}}_{2})

in (64). Furthermore, using similar proof techniques to ([13] (Prop. 1)) and standard single-letterization techniques (e.g., in [4] or [21]), we obtain that

\begin{matrix} H ({\tilde{Y}}^{n} | {\tilde{X}}^{n}) + D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X Y}^{n}) & \geq n (H ({\tilde{Y}}_{J} | {\tilde{X}}_{J}) + D (P_{{\tilde{X}}_{J} {\tilde{Y}}_{J}} ∥ P_{X Y})), \end{matrix}

(A35)

\begin{matrix} H ({\tilde{Y}}^{n}) + D (P_{{\tilde{X}}^{n} {\tilde{Y}}^{n}} ∥ P_{X Y}^{n}) & \geq n (H ({\tilde{Y}}_{J}) + D (P_{Y_{J}} ∥ P_{Y})), \end{matrix}

(A36)

\begin{matrix} H ({\tilde{Y}}^{n} | {\tilde{M}}_{2}) & = n H ({\tilde{Y}}_{J} | V_{J}, J), \end{matrix}

(A37)

\begin{matrix} H ({\tilde{Y}}^{n} | {\tilde{M}}_{1}, {\tilde{X}}^{n}) & \leq n H ({\tilde{Y}}_{J} | X_{J}, U_{J}, J) . \end{matrix}

(A38)

Let

U : = (U_{J}, J)

,

V : = (V_{J}, J)

,

X^{'} : = {\tilde{X}}_{J}

,

Y^{'} : = {\tilde{Y}}_{J}

and

Z^{'} : = {\tilde{Z}}_{J}

. Using the joint distribution

P_{{\tilde{X}}^{n} {\tilde{Y}}^{n} {\tilde{Z}}^{n} {\tilde{M}}_{1} {\tilde{M}}_{2}}

in (64), we conclude that the joint distribution of random variables

(X^{'}, Y^{'}, Z^{'}, U, V)

, denoted by

Q_{X^{'} Y^{'} Z^{'} U V}

, belongs to the set

Q_{1}

defined in (115). The proof of Lemma 1 is complete by combining (A21) to (A38) and noting that

I_{Q} (X^{'}, Y^{'}; U) \geq I_{Q} (X^{'}; U)

.

Appendix C. Proof of Lemma 2

Given any

γ \in R_{+}

, let

Q_{X Y Z U V}^{(γ)}

achieve the minimum in (117). Recall the definition of

θ_{n}

in (51) and define a new alphabet

\tilde{V} : = V \cup {v^{*}}

. We then define a joint distribution

P_{Y \tilde{V}}^{(γ)}

by specifying the following (conditional) marginal distributions

\begin{matrix} P_{\tilde{V}}^{(γ)} (v) & : = \frac{1}{1 + θ_{n}} Q_{V}^{(γ)} (v) 1 {v \neq v^{*}} + \frac{θ_{n}}{1 + θ_{n}} 1 {v = v^{*}}, \end{matrix}

(A39)

\begin{matrix} P_{Y | \tilde{V}}^{(γ)} (y | v) & : = Q_{Y | V}^{(γ)} (y | v) 1 {v \neq v^{*}} + (\frac{1 + θ_{n}}{θ_{n}} P_{Y} (y) - \frac{1}{θ_{n}} Q_{Y}^{(γ)} (y)) 1 {v = v^{*}} . \end{matrix}

(A40)

Thus, the induced marginal distribution

P_{Y}^{(γ)}

satisfies

\begin{matrix} P_{Y}^{(γ)} (y) & = \sum_{v \in \tilde{V}} P_{\tilde{V}}^{(γ)} (v) P_{Y | \tilde{V}}^{(γ)} (y | v) \end{matrix}

(A41)

\begin{matrix} = (\sum_{v \in V} \frac{1}{1 + θ_{n}} Q_{V}^{(γ)} (v) Q_{Y | V}^{(γ)} (y | v)) + (P_{Y} (y) - \frac{1}{1 + θ_{n}} Q_{Y}^{(γ)} (y)) \end{matrix}

(A42)

\begin{matrix} = P_{Y} (y) . \end{matrix}

(A43)

Furthermore, let

P_{\tilde{V} | Y}^{(γ)}

be induced by

P_{Y \tilde{V}}^{(γ)}

and define the following distribution

\begin{matrix} P_{X Y Z U \tilde{V}}^{(γ)} = P_{X Y Z} Q_{U | X}^{(γ)} P_{\tilde{V} | Y}^{(γ)} . \end{matrix}

(A44)

Recall the definition of

R_{b, c, d} (\cdot)

in (28). The following lemma lower bounds the difference between

R_{b, c, d} (Q_{X Y Z U V}^{(γ)})

and

R_{b, c, d} (P_{X Y Z U \tilde{V}}^{(γ)})

and is critical in the proof of Lemma 2.

Lemma A1.

When

γ = \sqrt{n}

, we have

\begin{matrix} R_{b, c, d} (Q_{X Y Z U V}^{(γ)}) - R_{b, c, d} (P_{X Y Z U \tilde{V}}^{(γ)}) \geq Θ (\frac{log n}{n^{1 / 4}}) . \end{matrix}

(A45)

The proof of Lemma A1 is deferred to Appendix D.

Now, using the assumption that

Q_{X Y Z U V}^{(γ)}

is a minimizer for

R_{b, c, d, γ}

in (117), the fact that

Δ_{b, d, γ} (Q_{X Y Z U V}^{(γ)}) \geq 0

(see (116)) and the result in (A45), we conclude that when

γ = \sqrt{n}

,

\begin{matrix} R_{b, c, d, γ} & = R_{b, c, d} (Q_{X Y Z U V}^{(γ)}) + Δ_{b, d, γ} (Q_{X Y Z U V}^{(γ)}) \end{matrix}

(A46)

\begin{matrix} \geq R_{b, c, d} (P_{X Y Z U \tilde{V}}^{(γ)}) + Θ (\frac{log n}{n^{1 / 4}}) \end{matrix}

(A47)

\begin{matrix} \geq R_{b, c, d} + Θ (\frac{log n}{n^{1 / 4}}), \end{matrix}

(A48)

where (A48) follows from the definition of

R_{b, c, d}

in (29) and the fact that

P_{X Y Z U \tilde{V}}^{(γ)} \in Q

(see (24)).

The proof of Lemma 2 is complete by using (A48) and noting that when

γ = \sqrt{n}

,

\begin{matrix} log \frac{1 - ε_{1} - ε_{2}}{1 + 3 ε_{2} - ε_{1}} + (b + d + γ + 5) log \frac{1 - ε_{1} - ε_{2}}{2} = Θ (\sqrt{n}) . \end{matrix}

(A49)

Appendix D. Proof of Lemma A1

In subsequent analyses, all distributions indicated by

P^{(γ)}

are induced by

P_{X Y Z U \tilde{V}}^{(γ)}

. We have

\begin{matrix} D (Q_{X Y U}^{(γ)} ∥ P_{X Y U}^{(γ)}) = D (Q_{X Y}^{(γ)} ∥ P_{X Y}^{(γ)}) + I_{Q^{(γ)}} (U; Y | X) . \end{matrix}

(A50)

Recalling the definitions of

R_{b, c, d}

in (29) and

R_{b, c, d, γ}

in (117), we conclude that for any

γ \in R_{+}

,

\begin{matrix} R_{b, c, d, γ} \leq R_{b, c, d} \leq b log | X | + d log | Y | = : a^{'} . \end{matrix}

(A51)

Using the definition of

Δ_{b, d, γ} (Q_{X Y Z U V})

in (116) and recalling that

Q_{X Y Z U V}^{(γ)}

is a minimizer for

R_{b, c, d, γ}

, we have

\begin{matrix} γ D (Q_{X Y U}^{(γ)} ∥ P_{X Y U}^{(γ)}) & \leq Δ_{b, d, γ} (Q_{X Y Z U V}^{(γ)}) \end{matrix}

(A52)

\begin{matrix} = R_{b, c, d, γ} - R_{b, c, d} (Q_{X Y Z U V}^{(γ)}) \end{matrix}

(A53)

\begin{matrix} \leq a^{'} + (c + 1) log | Y | + c log | Z | = : a . \end{matrix}

(A54)

We can now upper bound

I_{P^{γ}} (\tilde{V}; Y)

as follows:

\begin{matrix} I_{P^{(γ)}} (\tilde{V}; Y) & = D (P_{Y | \tilde{V}}^{(γ)} ∥ P_{Y}^{(γ)} | P_{\tilde{V}}^{(γ)}) \end{matrix}

(A55)

\begin{matrix} = D (P_{Y | \tilde{V}}^{(γ)} ∥ P_{Y} | P_{\tilde{V}}^{(γ)}) \end{matrix}

(A56)

\begin{matrix} = \frac{1}{1 + θ_{n}} D (Q_{Y | V}^{(γ)} ∥ P_{Y} | Q_{V}^{(γ)}) + \frac{θ_{n}}{1 + θ_{n}} D (\frac{1 + θ_{n}}{θ_{n}} P_{Y} - \frac{1}{θ_{n}} Q_{Y}^{(γ)} ∥ P_{Y}) \end{matrix}

(A57)

\begin{matrix} = \frac{1}{1 + θ_{n}} (D (Q_{Y | V}^{(γ)} ∥ Q_{Y}^{(γ)} | Q_{V}^{(γ)}) + D (Q_{Y}^{(γ)} ∥ P_{Y})) + \frac{θ_{n}}{1 + θ_{n}} D (\frac{1 + θ_{n}}{θ_{n}} P_{Y} - \frac{1}{θ_{n}} Q_{Y}^{(γ)} ∥ P_{Y}) \end{matrix}

(A58)

\begin{matrix} \leq \frac{1}{1 + θ_{n}} I_{Q^{(γ)}} (V; Y) + \frac{1}{1 + θ_{n}} \frac{a}{γ} + \frac{θ_{n}}{1 + θ_{n}} log μ, \end{matrix}

(A59)

where (A56) follows from (A43), and (A59) follows from the result in (A54), the fact that

D (Q_{Y}^{(γ)} ∥ P_{Y}) \leq D (Q_{X Y U}^{(γ)} ∥ P_{X Y U}^{(γ)})

and the definition of

μ

in (50). Thus, when

γ = \sqrt{n}

, recalling the definition of

θ_{n}

in (51), we have

\begin{matrix} I_{Q^{(γ)}} (V; Y) & \geq I_{P^{(γ)}} (\tilde{V}; Y) - \frac{a}{γ} - θ_{n} log μ \end{matrix}

(A60)

\begin{matrix} = I_{P^{(γ)}} (\tilde{V}; Y) + Θ (\frac{1}{\sqrt{n}}) . \end{matrix}

(A61)

Similar to (A59), we obtain

\begin{matrix} I_{P^{(γ)}} (\tilde{V}; Z) \end{matrix}

\begin{matrix} = D (P_{Z | \tilde{V}}^{(γ)} ∥ P_{Z}^{(γ)} | P_{\tilde{V}}^{(γ)}) \end{matrix}

(A62)

\begin{matrix} = D (P_{Z | \tilde{V}}^{(γ)} ∥ P_{Z} | P_{\tilde{V}}^{(γ)}) \end{matrix}

(A63)

\begin{matrix} = \frac{1}{1 + θ_{n}} D (Q_{Z | V}^{(γ)} ∥ P_{Z} | Q_{V}^{(γ)}) + \frac{θ_{n}}{1 + θ_{n}} D (\frac{1 + θ_{n}}{θ_{n}} P_{Z} - \frac{1}{θ_{n}} Q_{Z}^{(γ)} ∥ P_{Z}) \end{matrix}

(A64)

\begin{matrix} = \frac{1}{1 + θ_{n}} (D (Q_{Z | V}^{(γ)} ∥ Q_{Z}^{(γ)} | Q_{V}^{(γ)}) + D (Q_{Z}^{(γ)} ∥ P_{Z})) + \frac{θ_{n}}{1 + θ_{n}} D (\frac{1 + θ_{n}}{θ_{n}} P_{Z} - \frac{1}{θ_{n}} Q_{Z}^{(γ)} ∥ P_{Z}) \end{matrix}

(A65)

\begin{matrix} \geq \frac{1}{1 + θ_{n}} I_{Q^{(γ)}} (V; Z), \end{matrix}

(A66)

where (A64) follows since

Q^{(γ)} \in Q_{1}

(see (115)) implies that

Q_{Z | Y}^{(γ)} = P_{Z | Y}

and the Markov chains

Z - Y - X

and

V - Y - Z

holds and thus using (A39) to (A40), we have

\begin{matrix} P_{Z | \tilde{V}}^{(γ)} (z | v) & = \frac{\sum_{y} P_{Z | Y} (z | y) P_{\tilde{V}} (v) P_{Y | \tilde{V}} (y | v)}{P_{\tilde{V}} (v)} \end{matrix}

(A67)

\begin{matrix} = \frac{\sum_{y} Q_{Z | Y}^{(γ)} (z | y) Q_{V}^{(γ)} (v) Q_{Y | V}^{(γ)} (y | v)}{Q_{V}^{(γ)} (v)} \end{matrix}

(A68)

\begin{matrix} = Q_{Z | V}^{(γ)} (z | v), \end{matrix}

(A69)

and

\begin{matrix} P_{Z | \tilde{V}}^{(γ)} (z | v^{*}) & = \frac{\sum_{y} P_{Z | Y} (z | y) P_{\tilde{V}} (v^{*}) P_{Y | \tilde{V}} (y | v^{*})}{P_{\tilde{V}} (v^{*})} \end{matrix}

(A70)

\begin{matrix} = \sum_{y} Q_{Z | Y}^{(γ)} (z | y) (\frac{1 + θ_{n}}{θ_{n}} P_{Y} (y) - \frac{1}{θ_{n}} Q_{Y}^{(γ)} (y)) \end{matrix}

(A71)

\begin{matrix} = \frac{1 + θ_{n}}{θ_{n}} P_{Z} (z) - \frac{1}{θ_{n}} Q_{Z}^{(γ)} (z), \end{matrix}

(A72)

Therefore, we have

\begin{matrix} I_{Q^{(γ)}} (V; Z) & \leq (1 + θ_{n}) I_{P^{(γ)}} (\tilde{V}; Z) \end{matrix}

(A73)

\begin{matrix} \leq I_{P^{(γ)}} (\tilde{V}; Z) + θ_{n} log | Z | \end{matrix}

(A74)

\begin{matrix} = I_{P^{(γ)}} (\tilde{V}; Z) + Θ (\frac{1}{\sqrt{n}}) . \end{matrix}

(A75)

Let

∥ P - Q ∥

be the

ℓ_{1}

norm between P and Q regarded as vectors. Using Pinsker’s inequality, the result in (105), and the data processing inequality for the relative entropy [17], we obtain

\begin{matrix} ∥ Q_{U X}^{(γ)} - P_{U X}^{(γ)} ∥ & \leq \sqrt{2 log 2 \cdot D (Q_{U X}^{(γ)} ∥ P_{U X}^{(γ)})} \end{matrix}

(A76)

\begin{matrix} \leq \sqrt{2 log 2 \cdot D (Q_{X Y U}^{(γ)} ∥ P_{X Y U}^{(γ)})} \end{matrix}

(A77)

\begin{matrix} \leq \sqrt{\frac{2 a log 2}{γ}} . \end{matrix}

(A78)

From the support lemma ([21] (Appendix C)), we conclude that the cardinality of U can be upper bounded by a function depending only on

| X |

,

| Y |

and

| Z |

(these alphabets are all finite). Thus, when

γ = \sqrt{n}

, invoking ([4] (Lemma 2.2.7)), we have

\begin{matrix} | H (Q_{U X}^{(γ)}) - H (P_{U X}^{(γ)}) | \leq \sqrt{\frac{2 a log 2}{γ}} log \frac{| U | | X |}{\sqrt{\frac{2 a log 2}{γ}}} = Θ (\frac{log n}{n^{1 / 4}}) . \end{matrix}

(A79)

Similar to (A79), we have

\begin{matrix} | I_{Q^{(γ)}} (U; X) - I_{P^{(γ)}} (U; X) | & \leq Θ (\frac{log n}{n^{1 / 4}}), \end{matrix}

(A80)

\begin{matrix} | I_{Q^{(γ)}} (U; Y) - I_{P^{(γ)}} (U; Y) | & \leq Θ (\frac{log n}{n^{1 / 4}}) . \end{matrix}

(A81)

Combining (A61), (A75), (A80) and (A81), when

γ = \sqrt{n}

, using the definition of

R_{b, c, d} (\cdot)

in (28), we have

\begin{matrix} R_{b, c, d} (Q_{X Y Z U V}^{(γ)}) \end{matrix}

\begin{matrix} \geq - (c + 1) I_{Q^{(γ)}} (U; Y) + b I_{Q^{(γ)}} (U; X) - c I_{Q^{(γ)}} (V; Z) + d I_{Q_{(γ)}} (V; Y) \end{matrix}

(A82)

\begin{matrix} \geq - (c + 1) I_{P^{(γ)}} (U; Y) + b I_{P^{(γ)}} (U; X) - c I_{P^{(γ)}} (\tilde{V}; Z) + d I_{P^{(γ)}} (\tilde{V}; Y) + Θ (\frac{log n}{n^{1 / 4}}) \end{matrix}

(A83)

\begin{matrix} = R_{b, c, d} (P_{X Y Z U \tilde{V}}^{(γ)}) + Θ (\frac{log n}{n^{1 / 4}}) . \end{matrix}

(A84)

The proof of Lemma A1 is now complete.

References

Ahlswede, R.; Csiszár, I. Hypothesis testing with communication constraints. IEEE Trans. Inf. Theory 1986, 32, 533–542. [Google Scholar] [CrossRef]
Han, T. Hypothesis testing with multiterminal data compression. IEEE Trans. Inf. Theory 1987, 33, 759–772. [Google Scholar] [CrossRef]
Shalaby, H.M.; Papamarcou, A. Multiterminal detection with zero-rate data compression. IEEE Trans. Inf. Theory 1992, 38, 254–267. [Google Scholar] [CrossRef]
Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Han, T.S.; Amari, S. Statistical inference under multiterminal data compression. IEEE Trans. Inf. Theory 1998, 44, 2300–2324. [Google Scholar]
Tian, C.; Chen, J. Successive refinement for hypothesis testing and lossless one-helper problem. IEEE Trans. Inf. Theory 2008, 54, 4666–4681. [Google Scholar] [CrossRef]
Wigger, M.; Timo, R. Testing against independence with multiple decision centers. In Proceedings of the IEEE SPCOM, Bengaluru, India, 12–15 June 2016; pp. 1–5. [Google Scholar]
Zhao, W.; Lai, L. Distributed detection with vector quantizer. IEEE Trans. Signal Inf. Process. Netw. 2016, 2, 105–119. [Google Scholar] [CrossRef]
Xiang, Y.; Kim, Y.H. Interactive hypothesis testing with communication constraints. In Proceedings of the IEEE 50th Annual Allerton on Communication, Control, and Computing, Monticello, IL, USA, 1–5 October 2012; pp. 1065–1072. [Google Scholar]
Zhao, W.; Lai, L. Distributed testing with cascaded encoders. IEEE Trans. Inf. Theory 2018, 64, 7339–7348. [Google Scholar] [CrossRef]
Salehkalaibar, S.; Wigger, M.; Wang, L. Hypothesis Testing Over the Two-Hop Relay Network. IEEE Trans. Inf. Theory 2019, 65, 4411–4433. [Google Scholar] [CrossRef]
Liu, J.; van Handel, R.; Verdú, S. Beyond the blowing-up lemma: Sharp converses via reverse hypercontractivity. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 943–947. [Google Scholar]
Tyagi, H.; Watanabe, S. Strong Converse using Change of Measure. IEEE Trans. Inf. Theory 2020, in press. [Google Scholar]
Oohama, Y. Exponential strong converse for source coding with side information at the decoder. Entropy 2018, 20, 352. [Google Scholar] [CrossRef]
Oohama, Y. Exponent function for one helper source coding problem at rates outside the rate region. In Proceedings of the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1575–1579. [Google Scholar]
Oohama, Y. Exponential Strong Converse for One Helper Source Coding Problem. Entropy 2019, 21, 567. [Google Scholar] [CrossRef]
Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: New York, NY, USA, 2012. [Google Scholar]
Gu, W.; Effros, M. A strong converse for a collection of network source coding problems. In Proceedings of the 2009 IEEE International Symposium on Information Theory, Seoul, Korea, 28 June–3 July 2009; pp. 2316–2320. [Google Scholar] [CrossRef]
Liu, J.; van Handel, R.; Verdú, S. Beyond the Blowing-Up Lemma: Optimal Second-Order Converses via Reverse Hypercontractivity. Available online: http://web.mit.edu/jingbo/www/preprints/msl-blup.pdf (accessed on 17 April 2019).
Salehkalaibar, S.; Wigger, M.; Wang, L. Hypothesis testing over cascade channels. In Proceedings of the 2017 IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan, 6–10 November 2017; pp. 369–373. [Google Scholar] [CrossRef]
El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
Raginsky, M.; Sason, I. Concentration of measure inequalities in information theory, communications, and coding. Found. Trends^® Commun. Inf. Theory 2013, 10, 1–246. [Google Scholar] [CrossRef]
Gilani, A.; Amor, S.B.; Salehkalaibar, S.; Tan, V.Y.F. Distributed Hypothesis Testing with Privacy Constraints. Entropy 2019, 21, 478. [Google Scholar] [CrossRef] [Green Version]

Figure 1. System model for hypothesis testing over a two-hop network

Figure 2. Illustration of the proof sketch of Theorem 3.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cao, D.; Zhou, L.; Tan, V.Y.F. A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network. Entropy 2019, 21, 1171. https://doi.org/10.3390/e21121171

AMA Style

Cao D, Zhou L, Tan VYF. A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network. Entropy. 2019; 21(12):1171. https://doi.org/10.3390/e21121171

Chicago/Turabian Style

Cao, Daming, Lin Zhou, and Vincent Y. F. Tan. 2019. "A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network" Entropy 21, no. 12: 1171. https://doi.org/10.3390/e21121171

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network^†

Abstract

1. Introduction

Notation

2. Problem Formulation and Existing Results

2.1. Problem Formulation

2.2. Existing Results

3. Strong Converse Theorem

3.1. The Case $ε_{1} + ε_{2} < 1$

3.2. The Case $ε_{1} + ε_{2} > 1$

4. Proof of Theorem 2

4.1. Preliminaries

4.2. Summary of Proof Steps

4.3. Step 1: Construction of a Truncated Distribution

4.4. Step 2: Analyses of the Error Exponents of Type-II Error Probabilities

4.4.1. Type-II Error Probability $β_{2}$ at the Relay

4.4.2. Type-II Error Probability $η_{2}$ at the Receiver

4.5. Step 3: Analyses of Communication Constraints and Single-Letterization Steps

5. Discussion and Future Work

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A. Achievability Proof of Proposition 1

Appendix B. Proof of Lemma 1

Appendix C. Proof of Lemma 2

Appendix D. Proof of Lemma A1

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network †

Abstract

1. Introduction

Notation

2. Problem Formulation and Existing Results

2.1. Problem Formulation

2.2. Existing Results

3. Strong Converse Theorem

3.1. The Case ε 1 + ε 2 < 1

3.2. The Case ε 1 + ε 2 > 1

4. Proof of Theorem 2

4.1. Preliminaries

4.2. Summary of Proof Steps

4.3. Step 1: Construction of a Truncated Distribution

4.4. Step 2: Analyses of the Error Exponents of Type-II Error Probabilities

4.4.1. Type-II Error Probability β 2 at the Relay

4.4.2. Type-II Error Probability η 2 at the Receiver

4.5. Step 3: Analyses of Communication Constraints and Single-Letterization Steps

5. Discussion and Future Work

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A. Achievability Proof of Proposition 1

Appendix B. Proof of Lemma 1

Appendix C. Proof of Lemma 2

Appendix D. Proof of Lemma A1

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

A Strong Converse Theorem for Hypothesis Testing Against Independence over a Two-Hop Network^†

3.1. The Case $ε_{1} + ε_{2} < 1$

3.2. The Case $ε_{1} + ε_{2} > 1$

4.4.1. Type-II Error Probability $β_{2}$ at the Relay

4.4.2. Type-II Error Probability $η_{2}$ at the Receiver