Article

Exponential Strong Converse for Successive Refinement with Causal Decoder Side Information †

Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
* Authors to whom correspondence should be addressed.
† Part of this paper has been accepted to ISIT 2019, Paris, France.
Entropy 2019, 21(4), 410; https://doi.org/10.3390/e21040410
Submission received: 27 February 2019 / Revised: 9 April 2019 / Accepted: 13 April 2019 / Published: 17 April 2019
(This article belongs to the Special Issue Multiuser Information Theory II)

Abstract: We consider the $k$-user successive refinement problem with causal decoder side information and derive an exponential strong converse theorem. The rate-distortion region for the problem can be derived as a straightforward extension of the two-user case by Maor and Merhav (2008). We show that for any rate-distortion tuple outside the rate-distortion region of the $k$-user successive refinement problem with causal decoder side information, the joint excess-distortion probability approaches one exponentially fast. Our proof follows by judiciously adapting the recently proposed strong converse technique by Oohama using the information spectrum method, the variational form of the rate-distortion region and Hölder's inequality. The lossy source coding problem with causal decoder side information considered by El Gamal and Weissman is a special case ($k=1$) of the current problem. Therefore, the exponential strong converse theorem for the El Gamal and Weissman problem follows as a corollary of our result.

1. Introduction

We consider the $k$-user successive refinement problem with causal decoder side information shown in Figure 1, which we refer to as the $k$-user causal successive refinement problem. The decoders aim to recover the source sequence based on the encoded symbols and causally available private side information sequences. Specifically, given the source sequence $X^n$, each encoder $f_j$, where $j\in\{1,\ldots,k\}$, compresses $X^n$ into a codeword $S_j$. At time $i\in\{1,\ldots,n\}$, for each $j\in\{1,\ldots,k\}$, the $j$-th user aims to recover the $i$-th source symbol using the codewords from encoders $(f_1,\ldots,f_j)$, the side information up to time $i$ and a decoding function $\phi_{j,i}$, i.e., $\hat{X}_{j,i}=\phi_{j,i}(S_1,\ldots,S_j,Y_{j,1},\ldots,Y_{j,i})$. Finally, at time $n$, for all $j\in\{1,\ldots,k\}$, the $j$-th user outputs the source estimate $\hat{X}_j^n$, whose distortion from $X^n$, under a distortion measure $d_j$, is required to be less than or equal to a specified distortion level $D_j$.
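To make the system model concrete, the following is a minimal sketch, in Python, of the encoder-decoder interface just described; the class and method names are our own illustrative choices, not from the paper. It only enforces the causal structure: encoder $f_j$ sees the whole block $X^n$, while $\phi_{j,i}$ sees $(S_1,\ldots,S_j)$ and the side information of user $j$ only up to time $i$.

```python
# Illustrative sketch of the k-user causal successive refinement interface.
from typing import Callable, List, Sequence, Tuple

class CausalSuccessiveRefinement:
    """Block encoders f_1,...,f_k and per-time decoders phi_{j,i}."""

    def __init__(self,
                 encoders: List[Callable[[Sequence[int]], int]],
                 decoders: List[List[Callable[[Tuple[int, ...], Tuple[int, ...]], int]]]):
        self.encoders = encoders    # encoders[j-1] = f_j : X^n -> {1,...,M_j}
        self.decoders = decoders    # decoders[j-1][i-1] = phi_{j,i}

    def run(self, x: Sequence[int], y: List[Sequence[int]]) -> List[List[int]]:
        n, k = len(x), len(self.encoders)
        s = [f(x) for f in self.encoders]      # each f_j sees the whole block
        x_hat = [[0] * n for _ in range(k)]
        for j in range(k):
            for i in range(n):
                # User j uses codewords S_1,...,S_j and its own side
                # information only up to time i (the causal constraint).
                x_hat[j][i] = self.decoders[j][i](tuple(s[: j + 1]),
                                                  tuple(y[j][: i + 1]))
        return x_hat
```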
The causal successive refinement problem was first considered by Maor and Merhav in [1], who fully characterized the rate-distortion region for the two-user version. Maor and Merhav showed that, unlike the case with non-causal side information [2,3], no special structure, e.g., degradedness, is required between the side information sequences $Y_1^n$ and $Y_2^n$. Furthermore, Maor and Merhav discussed the performance loss due to causal decoder side information compared with non-causal side information [2,3]. In general, for the $k$-user successive refinement problem, the loss of performance due to causal decoder side information can be derived using Theorem 1 of the present paper and the results in [2,3] for the $k$-user case, under certain conditions on the degradedness of the side information in [2,3].
However, Maor and Merhav only presented a weak converse in [1]. In this paper, we strengthen the result in [1] by providing an exponential strong converse theorem for the full k-user causal successive refinement problem, which states that the joint excess-distortion probability approaches one exponentially fast if the rate-distortion tuple falls outside the rate-distortion region.

1.1. Related Works

We first briefly summarize existing works on the successive refinement problem. The successive refinement problem was introduced by Equitz and Cover [4] and by Koshelev [5], who studied necessary and sufficient conditions for a source-distortion triple to be successively refinable. Rimoldi [6] fully characterized the rate-distortion region of the successive refinement problem under the joint excess-distortion probability criterion, while Kanlis and Narayan [7] derived the excess-distortion exponent in the same setting. The second-order asymptotic analysis of No and Weissman [8], which provides approximations to finite blocklength performance and implies strong converse theorems, was derived under the marginal excess-distortion probabilities criteria. This analysis was extended to the joint excess-distortion probability criterion by Zhou, Tan and Motani [9]. Other frameworks for successive refinement decoding include [10,11,12,13].
The study of source coding with causal decoder side information was initiated by Weissman and El Gamal in [14], where they derived the rate-distortion function for the lossy source coding problem with causal side information at the decoders (i.e., $k=1$; see also [15], Chapter 11.2). Subsequently, Timo and Vellambi [16] characterized the rate-distortion regions of the Gu-Effros two-hop network [17] and the Gray-Wyner problem [18] with causal decoder side information; Maor and Merhav [19] derived the rate-distortion region for the successive refinement of the Heegard-Berger problem [20] with causal side information available at the decoders; Chia and Weissman [21] considered the cascade and triangular source coding problem with causal decoder side information. In all the aforementioned works, the authors used Fano's inequality to prove a weak converse. The weak converse implies that as the blocklength tends to infinity, if the rate-distortion tuple falls outside the rate-distortion region, then the joint excess-distortion probability is bounded away from zero. However, in this paper, we prove an exponential strong converse theorem for the $k$-user causal successive refinement problem, which significantly strengthens the weak converse, as it implies that the joint excess-distortion probability tends to one exponentially fast with respect to the blocklength if the rate-distortion tuple falls outside the rate-distortion region (cf. Theorem 3). As a corollary of our result, for any $\varepsilon\in[0,1)$, the $\varepsilon$-rate-distortion region (cf. Definition 2) remains the same as the rate-distortion region (cf. Equation (27)). Please note that with a weak converse, one can only assert that the $\varepsilon$-rate-distortion region equals the rate-distortion region when $\varepsilon=0$. See [22] for yet another justification for the utility of a strong converse compared to a weak converse theorem.
As the information spectrum method will be used in this paper to derive an exponential strong converse theorem for the causal successive refinement problem, we briefly summarize the previous applications of this method to network information theory problems. In [23,24,25], Oohama used this method to derive exponential strong converses for the lossless source coding problem with one-helper [26,27] (i.e., the Wyner-Ahlswede-Körner (WAK) problem), the asymmetric broadcast channel problem [28], and the Wyner-Ziv problem [29] respectively. Furthermore, Oohama’s information spectrum method was also used to derive exponential strong converse theorems for content identification with lossy recovery [30] by Zhou, Tan, Yu and Motani [31] and for Wyner’s common information problem under the total variation distance measure [32] by Yu and Tan [33].

1.2. Main Contribution and Challenges

We consider the $k$-user causal successive refinement problem and present an exponential strong converse theorem. For given rates and blocklength, define the joint excess-distortion probability as the probability that any decoder incurs a distortion level greater than the specified distortion level (see (3)), and define the probability of correct decoding as the probability that all decoders satisfy the specified distortion levels (see (24)). Our proof proceeds as follows. First, we derive a non-asymptotic (finite blocklength) converse upper bound on the probability of correct decoding of any code for the $k$-user causal successive refinement problem using the information spectrum method. Subsequently, by using Cramér's inequality and the variational formulation of the rate-distortion region, we show that the probability of correct decoding decays exponentially fast to zero as the blocklength tends to infinity if the rate-distortion tuple falls outside the rate-distortion region of the causal successive refinement problem.
As far as we are aware, this paper is the first to establish a strong converse theorem for any lossy source coding problem with causal decoder side information. Furthermore, our methods can be used to derive exponential strong converse theorems for other lossy source coding problems with causal decoder side information discussed in Section 1.1. In particular, since the lossy source coding problems with causal decoder side information in [1,14] are special cases of the k-user causal successive refinement problem, the exponential strong converse theorems for the problems in [1,14] follow as a corollary of our result.
To establish the strong converse in this paper, we must overcome several major technical challenges. The main difficulty lies in the fact that for the causal successive refinement problem, the side information is available to the decoder causally instead of non-causally. This causal nature of the side information makes the design of the decoder much more complicated and involved, which complicates the analysis of the joint excess-distortion probability. We find that classical strong converse techniques like the image size characterization [34] and the perturbation approach [35] cannot lead to a strong converse theorem due to the above-mentioned difficulty. However, it is possible that other approaches different from ours can be used to obtain a strong converse theorem for the current problem. For example, it is interesting to explore whether two recently proposed strong converse techniques in [36,37] can be used for this purpose considering the fact that the methods in [36,37] have been successfully applied to problems including the Wyner-Ziv problem [29], the Wyner-Ahlswede-Körner (WAK) problem [26,27] and hypothesis testing problems with communication constraints [38,39,40].

2. Problem Formulation and Existing Results

2.1. Notation

Random variables and their realizations are in upper case (e.g., $X$) and lower case (e.g., $x$), respectively. Sets are denoted in calligraphic font (e.g., $\mathcal{X}$). We use $\mathcal{X}^{\mathrm{c}}$ to denote the complement of $\mathcal{X}$ and use $X^n:=(X_1,\ldots,X_n)$ to denote a random vector of length $n$. Furthermore, given any $j\in[n]$, we use $X^{n\setminus j}$ to denote $(X_1,\ldots,X_{j-1},X_{j+1},\ldots,X_n)$. We use $\mathbb{R}_+$ and $\mathbb{N}$ to denote the sets of positive real numbers and positive integers, respectively. Given two integers $a$ and $b$, we use $[a:b]$ to denote the set of all integers between $a$ and $b$, and use $[a]$ to denote $[1:a]$. The set of all probability distributions on $\mathcal{X}$ is denoted as $\mathcal{P}(\mathcal{X})$ and the set of all conditional probability distributions from $\mathcal{X}$ to $\mathcal{Y}$ is denoted as $\mathcal{P}(\mathcal{Y}|\mathcal{X})$. For information-theoretic quantities such as entropy and mutual information, we follow the notation in [34]. In particular, when the joint distribution of $(X,Y)$ is $P_{XY}\in\mathcal{P}(\mathcal{X}\times\mathcal{Y})$, we use $I(P_X,P_{Y|X})$ and $I(X;Y)$ interchangeably.

2.2. Problem Formulation

Let $k\in\mathbb{N}$ be a fixed finite integer and let $P_{XY^k}$ be a joint probability mass function (pmf) on the finite alphabet $\mathcal{X}\times(\prod_{j\in[k]}\mathcal{Y}_j)$ with its marginals denoted in the customary way, e.g., $P_X$, $P_{XY_1}$. Throughout the paper, we consider memoryless sources $(X^n,Y_1^n,\ldots,Y_k^n)$, which are generated i.i.d. according to $P_{XY^k}$. Let a finite alphabet $\hat{\mathcal{X}}_j$ be the alphabet of the reproduced source symbol for user $j\in[k]$. Recall the encoder-decoder system model for the $k$-user causal successive refinement problem in Figure 1.
A formal definition of a code for the causal successive refinement problem is as follows.
Definition 1.
An $(n,M_1,\ldots,M_k)$-code for the causal successive refinement problem consists of
  • $k$ encoding functions
$$f_j:\mathcal{X}^n\to\mathcal{M}_j:=\{1,\ldots,M_j\},\quad j\in[k],\tag{1}$$
  • and $kn$ decoding functions: for each $i\in[n]$,
$$\phi_{j,i}:\Big(\prod_{l\in[j]}\mathcal{M}_l\Big)\times(\mathcal{Y}_j)^i\to\hat{\mathcal{X}}_j,\quad j\in[k].\tag{2}$$
For $j\in[k]$, let $d_j:\mathcal{X}\times\hat{\mathcal{X}}_j\to[0,\infty)$ be a distortion measure. Given the source sequence $x^n$ and a reproduced version $\hat{x}_j^n$, we measure the distortion between them using the additive distortion measure $d_j(x^n,\hat{x}_j^n):=\frac{1}{n}\sum_{i\in[n]}d_j(x_i,\hat{x}_{j,i})$. To evaluate the performance of an $(n,M_1,\ldots,M_k)$-code for the causal successive refinement problem, given specified distortion levels $(D_1,\ldots,D_k)$, we consider the following joint excess-distortion probability:
$$P_{\mathrm{e}}^{(n)}(D_1,\ldots,D_k):=\Pr\big\{\exists\,j\in[k]\ \mathrm{s.t.}\ d_j(X^n,\hat{X}_j^n)>D_j\big\}.\tag{3}$$
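For instance, the joint excess-distortion probability in (3) can be estimated by Monte Carlo simulation. The sketch below is ours (the function and argument names are hypothetical), assuming an `encode_decode` routine that maps a source block and the side information sequences to the $k$ reproductions:

```python
def excess_distortion_prob(encode_decode, source, distortions, D, n,
                           trials=10_000):
    """Monte Carlo estimate of P_e^{(n)}(D_1,...,D_k) in (3): the probability
    that some user j incurs average distortion exceeding D_j."""
    errors = 0
    for _ in range(trials):
        x, ys = source()                 # draw (x^n, y_1^n, ..., y_k^n)
        x_hats = encode_decode(x, ys)    # k reproduced sequences
        # Additive distortion: d_j(x^n, xhat_j^n) = (1/n) sum_i d_j(x_i, xhat_{j,i})
        if any(sum(d(a, b) for a, b in zip(x, xh)) / n > Dj
               for d, xh, Dj in zip(distortions, x_hats, D)):
            errors += 1
    return errors / trials
```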
For ease of notation, throughout the paper, we use $D^k$ to denote $(D_1,\ldots,D_k)$, $M^k$ to denote $(M_1,\ldots,M_k)$ and $R^k$ to denote $(R_1,\ldots,R_k)$.
Given $\varepsilon\in(0,1)$, the $\varepsilon$-rate-distortion region for the $k$-user causal successive refinement problem is defined as follows.
Definition 2.
Given any $\varepsilon\in(0,1)$, a rate-distortion tuple $(R^k,D^k)$ is said to be $\varepsilon$-achievable if there exists a sequence of $(n,M^k)$-codes such that
$$\limsup_{n\to\infty}\frac{1}{n}\log M_1\le R_1,\tag{4}$$
$$\limsup_{n\to\infty}\frac{1}{n}\log M_j\le R_j-\sum_{l\in[j-1]}R_l,\quad j\in[2:k],\tag{5}$$
$$\limsup_{n\to\infty}P_{\mathrm{e}}^{(n)}(D^k)\le\varepsilon.\tag{6}$$
The closure of the set of all $\varepsilon$-achievable rate-distortion tuples is called the $\varepsilon$-rate-distortion region and is denoted as $\mathcal{R}(\varepsilon)$.
Please note that in Definition 2, $R_j$ is the sum rate of the first $j$ decoders. Using Definition 2, the rate-distortion region for the problem is defined as
$$\mathcal{R}:=\bigcap_{\varepsilon\in(0,1)}\mathcal{R}(\varepsilon).\tag{7}$$

2.3. Existing Results

For the two-user causal successive refinement problem, the rate-distortion region was fully characterized by Maor and Merhav (Theorem 1 in [1]). With slight generalization, the result can be extended to the $k$-user case.
For $j\in[k]$, let $W_j$ be a random variable taking values in a finite alphabet $\mathcal{W}_j$. For simplicity, throughout the paper, we let
$$T:=(X,Y^k,W^k,\hat{X}^k),\tag{8}$$
and let $(t,\mathcal{T})$ be a particular realization of $T$ and its alphabet set, respectively.
Define the following set of joint distributions:
$$\mathcal{P}:=\Big\{Q_T\in\mathcal{P}(\mathcal{T}):\ Q_{XY^k}=P_{XY^k},\ W^k-X-Y^k,\ |\mathcal{W}_1|\le|\mathcal{X}|+3,\ \text{and}\ \forall\,j\in[2:k]:\ |\mathcal{W}_j|\le|\mathcal{X}|\prod_{l\in[j-1]}|\mathcal{W}_l|+1,\ \hat{X}_j=\phi_j(W^j,Y_j)\ \text{for some}\ \phi_j:\prod_{l\in[j]}\mathcal{W}_l\times\mathcal{Y}_j\to\hat{\mathcal{X}}_j\Big\}.\tag{9}$$
Given any joint distribution $Q_T\in\mathcal{P}(\mathcal{T})$, define the following set of rate-distortion tuples:
$$\mathcal{R}(Q_T):=\Big\{(R^k,D^k):\ R_1\ge I(Q_X,Q_{W_1|X}),\ D_1\ge\mathbb{E}[d_1(X,\phi_1(W_1,Y_1))],\ \text{and}\ \forall\,j\in[2:k]:\ R_j-\sum_{l\in[j-1]}R_l\ge I(Q_{X|W^{j-1}},Q_{W_j|XW^{j-1}}|Q_{W^{j-1}}),\ D_j\ge\mathbb{E}[d_j(X,\phi_j(W^j,Y_j))]\Big\}.\tag{10}$$
Paralleling the definitions of Maor and Merhav [1] for $k=2$, define the following information-theoretic set of rate-distortion tuples:
$$\mathcal{R}^*:=\bigcup_{Q_T\in\mathcal{P}}\mathcal{R}(Q_T).\tag{11}$$
Theorem 1.
The rate-distortion region for the causal successive refinement problem satisfies
$$\mathcal{R}=\mathcal{R}^*.\tag{12}$$
We remark that in [1], Maor and Merhav considered the average distortion criterion for $k=2$, i.e.,
$$\limsup_{n\to\infty}\mathbb{E}[d_j(X^n,\hat{X}_j^n)]\le D_j,\quad j\in[k],\tag{13}$$
instead of the vanishing joint excess-distortion probability criterion (see (6)) in Definition 2. However, with slight modification to the proof of [1], it can be verified (see Appendix A) that the rate-distortion region $\mathcal{R}$ under the vanishing joint excess-distortion probability criterion is identical to the rate-distortion region $\mathcal{R}^*$ derived by Maor and Merhav under the average distortion criterion.
Theorem 1 implies that if a rate-distortion tuple falls outside the rate-distortion region, i.e., $(R^k,D^k)\notin\mathcal{R}$, then the joint excess-distortion probability $P_{\mathrm{e}}^{(n)}(D^k)$ is bounded away from zero. We strengthen the converse part of Theorem 1 by showing that if $(R^k,D^k)\notin\mathcal{R}$, then the joint excess-distortion probability $P_{\mathrm{e}}^{(n)}(D^k)$ approaches one exponentially fast as the blocklength $n$ tends to infinity.

3. Main Results

3.1. Preliminaries

In this subsection, we present necessary definitions and a key lemma before stating our main result.
Define the following set of distributions:
$$\mathcal{Q}:=\Big\{Q_T\in\mathcal{P}(\mathcal{T}):\ |\mathcal{W}_j|\le\Big(|\mathcal{X}|\prod_{l\in[k]}|\mathcal{Y}_l|\prod_{l\in[k]}|\hat{\mathcal{X}}_l|\Big)^j,\ j\in[k]\Big\}.\tag{14}$$
Throughout the paper, we use $\alpha^k$ to denote $(\alpha_1,\ldots,\alpha_k)$ and use $\beta^k$ similarly. Given any $(\mu,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}$ such that
$$\sum_{i\in[k]}(\alpha_i+\beta_i)=1,\tag{15}$$
for any $Q_T\in\mathcal{Q}$, define the following linear combination of log-likelihoods:
$$\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(t):=\log\frac{Q_X(x)}{P_X(x)}+\log\frac{Q_{Y^k|XW^k}(y^k|x,w^k)}{P_{Y^k|X}(y^k|x)}+\log\frac{Q_{XY^{k\setminus1}W^{k\setminus1}|Y_1W_1\hat{X}_1}(x,y^{k\setminus1},w^{k\setminus1}|y_1,w_1,\hat{x}_1)}{Q_{XY^{k\setminus1}W^{k\setminus1}|Y_1W_1}(x,y^{k\setminus1},w^{k\setminus1}|y_1,w_1)}+\sum_{j\in[2:k]}\log\frac{Q_{\hat{X}_j|XY^kW^k\hat{X}^{j-1}}(\hat{x}_j|x,y^k,w^k,\hat{x}^{j-1})}{Q_{\hat{X}_j|Y_jW^j}(\hat{x}_j|y_j,w^j)}+\mu\alpha_1\log\frac{Q_{X|W_1}(x|w_1)}{P_X(x)}+\sum_{j\in[2:k]}\mu\alpha_j\log\frac{Q_{X|W^j}(x|w^j)}{Q_{X|W^{j-1}}(x|w^{j-1})}+\sum_{j\in[k]}\mu\beta_jd_j(x,\hat{x}_j).\tag{16}$$
Given any $\theta\in\mathbb{R}_+$ and any $Q_T\in\mathcal{Q}$, define the negative cumulant generating function of $\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(\cdot)$ as
$$\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T):=-\log\mathbb{E}_{Q_T}\Big[\exp\Big(-\theta\,\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(T)\Big)\Big].\tag{17}$$
Furthermore, define the minimal negative cumulant generating function over distributions in $\mathcal{Q}$ as
$$\Omega^{(\theta,\mu,\alpha^k,\beta^k)}:=\min_{Q_T\in\mathcal{Q}}\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T).\tag{18}$$
Finally, given any rate-distortion tuple $(R^k,D^k)$, define
$$\kappa^{(\alpha^k,\beta^k)}(R^k,D^k):=\alpha_1R_1+\beta_1D_1+\sum_{j\in[2:k]}\Big(\alpha_j\Big(R_j-\sum_{l\in[j-1]}R_l\Big)+\beta_jD_j\Big),\tag{19}$$
$$F^{(\theta,\mu,\alpha^k,\beta^k)}(R^k,D^k):=\frac{\Omega^{(\theta,\mu,\alpha^k,\beta^k)}-\theta\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+(2k+2)\theta+\sum_{j\in[k]}2\theta\mu\alpha_j},\tag{20}$$
$$F(R^k,D^k):=\sup_{(\theta,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}:\ \sum_{i\in[k]}(\alpha_i+\beta_i)=1}F^{(\theta,\mu,\alpha^k,\beta^k)}(R^k,D^k).\tag{21}$$
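As a numerical illustration (ours, not from the paper), the negative cumulant generating function in (17) can be evaluated directly on a finite alphabet. The toy below uses a scalar log-likelihood ratio in place of the full linear combination $\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}$ in (16); it exhibits the generic behavior exploited later: the function vanishes at $\theta=0$ and is concave in $\theta$.

```python
import math

def neg_cgf(theta, Q, omega):
    """Omega(theta) = -log E_Q[ exp(-theta * omega(t)) ] on a finite alphabet;
    Q is a dict t -> probability and omega is a real-valued function of t."""
    return -math.log(sum(q * math.exp(-theta * omega(t)) for t, q in Q.items()))

# Toy stand-in for (16): omega(t) = log(Q(t)/P(t)) for two distributions.
P = {0: 0.5, 1: 0.5}
Q = {0: 0.8, 1: 0.2}
omega = lambda t: math.log(Q[t] / P[t])
for theta in (0.0, 0.5, 1.0):
    print(theta, neg_cgf(theta, Q, omega))   # 0 at theta = 0, concave in theta
```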
With the above definitions, we have the following lemma establishing the properties of the exponent function $F(R^k,D^k)$.
Lemma 1.
The following holds.
(i) For any rate-distortion tuple outside the rate-distortion region, i.e., $(R^k,D^k)\notin\mathcal{R}$, we have
$$F(R^k,D^k)>0.\tag{22}$$
(ii) For any rate-distortion tuple inside the rate-distortion region, i.e., $(R^k,D^k)\in\mathcal{R}$, we have
$$F(R^k,D^k)=0.\tag{23}$$
The proof of Lemma 1 is inspired by Property 4 in [25] and Lemma 2 in [31] and is given in Section 5. As will be shown in Theorem 2, the exponent function $F(R^k,D^k)$ is a lower bound on the exponent of the probability of correct decoding for the $k$-user causal successive refinement problem. Thus, Claim (i) in Lemma 1 is crucial to establish the exponential strong converse theorem, which states that the joint excess-distortion probability (see (3)) approaches one exponentially fast with respect to the blocklength of the source sequences.

3.2. Main Result

Define the probability of correct decoding as
$$P_{\mathrm{c}}^{(n)}(D^k):=1-P_{\mathrm{e}}^{(n)}(D^k)=\Pr\big\{\forall\,j\in[k],\ d_j(X^n,\hat{X}_j^n)\le D_j\big\}.\tag{24}$$
Theorem 2.
Given any $(n,M^k)$-code for the $k$-user causal successive refinement problem such that
$$\log M_1\le nR_1,\quad\text{and}\quad\forall\,j\in[2:k],\ \log M_j\le n\Big(R_j-\sum_{l\in[j-1]}R_l\Big),\tag{25}$$
we have the following non-asymptotic upper bound on the probability of correct decoding:
$$P_{\mathrm{c}}^{(n)}(D^k)\le(2k+3)\exp\big(-nF(R^k,D^k)\big).\tag{26}$$
The proof of Theorem 2 is given in Section 4. Several remarks are in order.
First, our result is non-asymptotic, i.e., the bound in (26) holds for any $n\in\mathbb{N}$. To prove Theorem 2, we adapt the recently proposed strong converse technique by Oohama [25] to analyze the probability of correct decoding. We first obtain a non-asymptotic upper bound using the information spectrum of the log-likelihoods involved in the definition of $\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}$ (see (16)) and then apply Cramér's bound on large deviations (see, e.g., Lemma 13 in [31]) to obtain an exponential type non-asymptotic upper bound. Subsequently, we apply the recursive method [25] and proceed similarly as in [31] to obtain the desired result. Our method can also be used to establish similar results for other source coding problems with causal decoder side information [16,19,21].
Second, we do not believe that classical strong converse techniques, including the image size characterization [34] and the perturbation approach [35], can be used to obtain a strong converse theorem for the causal successive refinement problem (e.g., Theorem 3). The main obstacle is that the side information is available causally, which complicates the decoding analysis significantly.
Invoking Lemma 1 and Theorem 2, we conclude that the exponent on the right hand side of (26) is positive if and only if the rate-distortion tuple is outside the rate-distortion region, which implies the following exponential strong converse theorem.
Theorem 3.
For any sequence of $(n,M^k)$-codes satisfying the rate constraints in (25), given any distortion levels $D^k$, we have that if $(R^k,D^k)\notin\mathcal{R}$, then the probability of correct decoding $P_{\mathrm{c}}^{(n)}(D^k)$ decays exponentially fast to zero as the blocklength of the source sequences tends to infinity.
As a result of Theorem 3, we conclude that for every $\varepsilon\in(0,1)$, the $\varepsilon$-rate-distortion region (see Definition 2) satisfies
$$\mathcal{R}(\varepsilon)=\mathcal{R},\tag{27}$$
i.e., a strong converse holds for the $k$-user causal successive refinement problem. Using the strong converse theorem and Marton's change-of-measure technique [41], similarly to Theorem 5 in [31], we can also derive an upper bound on the exponent of the joint excess-distortion probability. Furthermore, applying the one-shot techniques in [42], we can also establish a non-asymptotic achievability bound. Applying the Berry-Esseen theorem to the achievability bound and analyzing the non-asymptotic converse bound in Theorem 2, similarly to [25], we conclude that the backoff from the rate-distortion region at finite blocklength scales on the order of $\Theta(\frac{1}{\sqrt{n}})$. However, nailing down the exact second-order asymptotics [43,44] is challenging and is left for future work.
Our main results in Lemma 1, Theorems 2 and 3 can be specialized to the settings in [14] and [1], which correspond to $k=1$ and $k=2$ decoders (users), respectively.

4. Proof of the Non-Asymptotic Converse Bound (Theorem 2)

4.1. Preliminaries

Given any ( n , M k ) -code with encoding functions ( f 1 , , f k ) and and decoding functions { ( ϕ 1 , i , , ϕ k , i ) } i [ n ] , we define the following induced conditional distributions on the encoders and decoders: for each j [ k ] ,
P S j | X n ( s j | x n ) : = 1 { s j = f j ( x n ) } ,
P X ^ j n | S j Y j n ( x ^ j n | s j , y j n ) : = i [ n ] 1 { x ^ j , i = ϕ j , i ( s j , y j , 1 , , y j , i ) } .
For simplicity, in the following, we define
G : = ( X n , Y 1 n , , Y k n , S k , X ^ 1 n , , X ^ k n ) ,
and let ( g , G ) be a particular realization and the alphabet of G respectively. With above definitions, we have that the distribution P G satisfies that for any g G ,
P G ( g ) : = P X Y k n ( x n , y 1 n , , y k n ) j [ k ] P S j | X n ( s j | x n ) j [ k ] P X ^ j n | S j Y j n ( x ^ j n | s j , y j n ) .
In the remaining part of this section, all distributions denoted by $P$ are induced by the joint distribution $P_G$.
To simplify the notation, given any $(i,j)\in[n]\times[k]$, we use $Y_{j,1}^{j,i}$ to denote $(Y_{j,1},\ldots,Y_{j,i})$ and we use $Y_{1,i}^{k,i}$ to denote $(Y_{1,i},\ldots,Y_{k,i})$. Similarly, we use $W_{1,i}^{k,i}$ and $\hat{X}_{1,i}^{k,i}$. For each $i\in[n]$, let the auxiliary random variables be $W_{1,i}:=(X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1},S_1)$ and $W_{j,i}:=S_j$ for all $j\in[2:k]$. Please note that for each $i\in[n]$, the Markov chain $(W_{1,i}^{k,i})-X_i-(Y_{1,i}^{k,i})$ holds under $P_G$. Throughout the paper, for each $i\in[n]$, we let
$$T_i:=(X_i,Y_{1,i}^{k,i},W_{1,i}^{k,i},\hat{X}_{1,i}^{k,i}),\tag{32}$$
and let $(t_i,\mathcal{T}_i)$ be a particular realization of $T_i$ and the alphabet of $T_i$, respectively.
For each $i\in[n]$, let $Q_{C_i|D_i}$ be arbitrary conditional distributions, where $C_i$ and $D_i$ are subsets of the random variables in $T_i$. Given any positive real number $\eta$ and rate-distortion tuple $(R^k,D^k)$, define the following subsets of $\mathcal{G}$:
$$\mathcal{B}_1:=\Big\{g:\ \frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{X_i}(x_i)}{P_X(x_i)}\le\eta\Big\},\tag{33}$$
$$\mathcal{B}_2:=\Big\{g:\ \frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{Y_{1,i}^{k,i}|X_iW_{1,i}^{k,i}}(y_{1,i}^{k,i}|x_i,w_{1,i}^{k,i})}{P_{Y^k|X}(y_{1,i}^{k,i}|x_i)}\le\eta\Big\},\tag{34}$$
$$\mathcal{B}_3:=\Big\{g:\ \frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}\hat{X}_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i},\hat{x}_{1,i})}{P_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i})}\le\eta\Big\},\tag{35}$$
$$\mathcal{B}_4:=\Big\{g:\ \forall\,j\in[2:k],\ \frac{1}{n}\sum_{i\in[n]}\log\frac{Q_{\hat{X}_{j,i}|X_iY_{1,i}^{k,i}W_{1,i}^{k,i}\hat{X}_{1,i}^{j-1,i}}(\hat{x}_{j,i}|x_i,y_{1,i}^{k,i},w_{1,i}^{k,i},\hat{x}_{1,i}^{j-1,i})}{P_{\hat{X}_{j,i}|Y_{j,i}W_{1,i}^{j,i}}(\hat{x}_{j,i}|y_{j,i},w_{1,i}^{j,i})}\le\eta\Big\},\tag{36}$$
$$\mathcal{B}_5:=\Big\{g:\ R_1\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{P_{X_i|W_{1,i}}(x_i|w_{1,i})}{P_X(x_i)}-\eta\Big\},\tag{37}$$
$$\mathcal{B}_6:=\Big\{g:\ \forall\,j\in[2:k],\ R_j-\sum_{l\in[j-1]}R_l\ge\frac{1}{n}\sum_{i\in[n]}\log\frac{P_{X_i|W_{1,i}^{j,i}}(x_i|w_{1,i}^{j,i})}{P_{X_i|W_{1,i}^{j-1,i}}(x_i|w_{1,i}^{j-1,i})}-\eta\Big\},\tag{38}$$
$$\mathcal{B}_7:=\Big\{g:\ \forall\,j\in[k],\ D_j\ge\frac{1}{n}\sum_{i\in[n]}\log\exp\big(d_j(x_i,\hat{x}_{j,i})\big)\Big\}.\tag{39}$$
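Each of the sets $\mathcal{B}_1$ to $\mathcal{B}_7$ compares a single-letter empirical average against a threshold. As a small illustration (ours, with toy distributions), the statistic in $\mathcal{B}_1$ is the normalized empirical log-likelihood ratio between the auxiliary and true source distributions:

```python
import math

def avg_log_likelihood_ratio(xs, Q, P):
    """(1/n) * sum_i log(Q(x_i) / P(x_i)), the statistic appearing in B_1;
    the realization lies in B_1 when this average is at most eta."""
    return sum(math.log(Q[x] / P[x]) for x in xs) / len(xs)

P = {0: 0.5, 1: 0.5}
Q = {0: 0.8, 1: 0.2}
xs = [0, 0, 1, 0, 1, 0, 0, 0]
eta = 0.1
print(avg_log_likelihood_ratio(xs, Q, P) <= eta)   # membership in B_1
```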

4.2. Proof Steps of Theorem 2

We first present the following non-asymptotic upper bound on the probability of correct decoding using the information spectrum method.
Lemma 2.
For any $(n,M^k)$-code satisfying (25), given any distortion levels $D^k$, we have
$$P_{\mathrm{c}}^{(n)}(D^k)\le\Pr\Big\{\bigcap_{i\in[7]}\mathcal{B}_i\Big\}+(2k+2)\exp(-n\eta).\tag{40}$$
The proof of Lemma 2 is given in Appendix B and is divided into two steps. First, we derive an $n$-letter non-asymptotic upper bound which holds for certain arbitrary $n$-letter auxiliary distributions. Subsequently, we single-letterize the derived bound by a proper choice of auxiliary distributions and a careful decomposition of the induced distributions of $P_G$.
Subsequently, we apply Cramér's bound to Lemma 2 to obtain an exponential type non-asymptotic upper bound on the probability of correct decoding. For simplicity, we use $P_i$ to denote $P_{T_i}$ and $Q_i$ to denote $Q_{T_i}$. To present our next result, we need the following definitions. Given any $\mu\in\mathbb{R}_+$ and any $(\alpha^k,\beta^k)\in[0,1]^{2k}$ satisfying (15), let $f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}(t_i)$ be the weighted sum of the log-likelihood terms appearing in the inequalities defining $\{\mathcal{B}_i\}_{i\in[7]}$, i.e.,
$$f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}(t_i):=\log\frac{Q_{X_i}(x_i)}{P_X(x_i)}+\log\frac{Q_{Y_{1,i}^{k,i}|X_iW_{1,i}^{k,i}}(y_{1,i}^{k,i}|x_i,w_{1,i}^{k,i})}{P_{Y^k|X}(y_{1,i}^{k,i}|x_i)}+\log\frac{Q_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}\hat{X}_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i},\hat{x}_{1,i})}{P_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i})}+\sum_{j\in[2:k]}\log\frac{Q_{\hat{X}_{j,i}|X_iY_{1,i}^{k,i}W_{1,i}^{k,i}\hat{X}_{1,i}^{j-1,i}}(\hat{x}_{j,i}|x_i,y_{1,i}^{k,i},w_{1,i}^{k,i},\hat{x}_{1,i}^{j-1,i})}{P_{\hat{X}_{j,i}|Y_{j,i}W_{1,i}^{j,i}}(\hat{x}_{j,i}|y_{j,i},w_{1,i}^{j,i})}+\mu\alpha_1\log\frac{P_{X_i|W_{1,i}}(x_i|w_{1,i})}{P_X(x_i)}+\sum_{j\in[2:k]}\mu\alpha_j\log\frac{P_{X_i|W_{1,i}^{j,i}}(x_i|w_{1,i}^{j,i})}{P_{X_i|W_{1,i}^{j-1,i}}(x_i|w_{1,i}^{j-1,i})}+\sum_{j\in[k]}\mu\beta_jd_j(x_i,\hat{x}_{j,i}).\tag{41}$$
Furthermore, given any non-negative real number $\lambda\in\mathbb{R}_+$, define the following negative cumulant generating function:
$$\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]}):=-\log\mathbb{E}\Big[\exp\Big(-\lambda\sum_{i\in[n]}f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}(T_i)\Big)\Big].\tag{42}$$
Recall the definition of $\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)$ in (19). Please note that $\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)$ is a linear combination of the rate-distortion tuple. Using Lemma 2 and Cramér's bound (Lemma 13 in [31]), we obtain the following non-asymptotic exponential type upper bound on the probability of correct decoding, whose proof is given in Appendix D.
Lemma 3.
For any $(n,M^k)$-code satisfying the conditions in Lemma 2, given any distortion levels $D^k$, we have
$$P_{\mathrm{c}}^{(n)}(D^k)\le(2k+3)\exp\Bigg(-n\,\frac{\frac{1}{n}\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})-\lambda\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+\lambda\big(k+2+\sum_{j\in[k]}\mu\alpha_j\big)}\Bigg).\tag{43}$$
For subsequent analyses, let $\underline{\Omega}^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i\}_{i\in[n]})$ be the lower bound on the negative cumulant generating function $\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})$ obtained by optimizing over the choice of auxiliary distributions $\{Q_i\}_{i\in[n]}$, i.e.,
$$\underline{\Omega}^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i\}_{i\in[n]}):=\inf_{n\in\mathbb{N}}\sup_{\{Q_i\}_{i\in[n]}}\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]}).\tag{44}$$
Here the supremum over $\{Q_i\}_{i\in[n]}$ is taken since we want the bound to hold for favorable auxiliary distributions, and the infimum over $n\in\mathbb{N}$ is taken to yield a non-asymptotic bound.
In the following, we derive a relationship between $\underline{\Omega}^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i\}_{i\in[n]})$ and $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}$ (cf. (18)), which, as we shall see later, is a crucial step in proving Theorem 2. For this purpose, given any $(\lambda,\mu,\alpha^k)\in\mathbb{R}_+^2\times[0,1]^k$ such that
$$\lambda\Big(k+\sum_{j\in[k]}\mu\alpha_j\Big)<1,\tag{45}$$
let
$$\theta:=\frac{\lambda}{1-k\lambda-\sum_{j\in[k]}\lambda\mu\alpha_j}.\tag{46}$$
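For clarity, we record the elementary inversion of (46) (a short verification of ours), which explains the denominator appearing in (47) and again in (49):

```latex
% Inverting (46): let \Psi := 1 - k\lambda - \sum_{j\in[k]} \lambda\mu\alpha_j,
% so that \theta = \lambda/\Psi. Then
\[
  1 + k\theta + \sum_{j\in[k]}\theta\mu\alpha_j
  = 1 + \frac{k\lambda + \sum_{j\in[k]}\lambda\mu\alpha_j}{\Psi}
  = \frac{\Psi + (1-\Psi)}{\Psi}
  = \frac{1}{\Psi},
\]
% and hence
\[
  \lambda = \theta\,\Psi = \frac{\theta}{1 + k\theta + \sum_{j\in[k]}\theta\mu\alpha_j}.
\]
```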
Then we have the following lemma, which shows that $\underline{\Omega}^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i\}_{i\in[n]})$ in Equation (44) can be lower bounded by a scaled version of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}$ in Equation (18).
Lemma 4.
Given any $(\lambda,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}$ satisfying (15) and (45), for $\theta$ defined in (46), we have
$$\underline{\Omega}^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i\}_{i\in[n]})\ge\frac{n\,\Omega^{(\theta,\mu,\alpha^k,\beta^k)}}{1+k\theta+\sum_{j\in[k]}\theta\mu\alpha_j}.\tag{47}$$
The proof of Lemma 4 uses Hölder's inequality and the recursive method in [25] and is given in Appendix E.
Combining Lemmas 3 and 4, we conclude that for any $(n,M^k)$-code satisfying the conditions in Lemma 2 and for any $(\mu,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}$ satisfying (15), given any $\lambda\in\mathbb{R}_+$ satisfying (45), we have
$$P_{\mathrm{c}}^{(n)}(D^k)\le(2k+3)\exp\Bigg(-n\,\frac{\frac{1}{n}\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})-\lambda\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+\lambda\big(k+2+\sum_{j\in[k]}\mu\alpha_j\big)}\Bigg)\tag{48}$$
$$\le(2k+3)\exp\Bigg(-n\,\frac{\Omega^{(\theta,\mu,\alpha^k,\beta^k)}-\theta\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+(2k+2)\theta+\sum_{j\in[k]}2\theta\mu\alpha_j}\Bigg)\tag{49}$$
$$=(2k+3)\exp\Big(-n\,F^{(\theta,\mu,\alpha^k,\beta^k)}(R^k,D^k)\Big),\tag{50}$$
where (49) follows from Lemma 4 together with the definitions of $\kappa^{(\alpha^k,\beta^k)}(\cdot)$ in (19) and $\theta$ in (46), and (50) is simply due to the definition of $F^{(\theta,\mu,\alpha^k,\beta^k)}(\cdot)$ in (20). Since (50) holds for all valid choices of $(\theta,\mu,\alpha^k,\beta^k)$, taking the supremum as in (21) yields Theorem 2.

5. Proof of Properties of Strong Converse Exponent: Proof of Lemma 1

5.1. Alternative Expressions for the Rate-Distortion Region

In this section, we present preliminaries for the proof of Lemma 1, including several definitions and two alternative characterizations of the rate-distortion region $\mathcal{R}$ (cf. (7)).
Recall that we use $Y^{k\setminus j}$ to denote $(Y_1,\ldots,Y_{j-1},Y_{j+1},\ldots,Y_k)$. First, paralleling (9), we define the following set of joint distributions:
$$\mathcal{P}_{\mathrm{ran}}:=\big\{Q_T\in\mathcal{P}(\mathcal{T}):\ Q_{XY^k}=P_{XY^k},\ W^k-X-Y^k,\ \text{and}\ \forall\,j\in[k]:\ |\mathcal{W}_j|\le(|\mathcal{X}|+1)^j,\ \hat{X}_j-(W^j,Y_j)-(X,Y^{k\setminus j},W_{j+1}^k,\hat{X}^{j-1})\big\}.\tag{51}$$
Please note that, compared with (9), the deterministic decoding functions $\phi_j$ are now replaced by stochastic functions, which are characterized by transition matrices and induce Markov chains, and the cardinality bounds on the auxiliary random variables are changed accordingly. Using the definitions of $\mathcal{P}_{\mathrm{ran}}$ and $\mathcal{R}(Q_T)$ (cf. (10)), we can define the following rate-distortion region, denoted by $\mathcal{R}_{\mathrm{ran}}$, where the subscript "ran" refers to the randomness of the stochastic functions in the definition of $\mathcal{P}_{\mathrm{ran}}$:
$$\mathcal{R}_{\mathrm{ran}}:=\bigcup_{Q_T\in\mathcal{P}_{\mathrm{ran}}}\mathcal{R}(Q_T).\tag{52}$$
As we shall see later, $\mathcal{R}_{\mathrm{ran}}=\mathcal{R}$.
To present the alternative characterization of the rate-distortion region using supporting hyperplanes, we need the following definitions. First, we let $\mathcal{P}_{\mathrm{sh}}$ be the following set of joint distributions:
$$\mathcal{P}_{\mathrm{sh}}:=\big\{Q_T\in\mathcal{P}(\mathcal{T}):\ Q_{XY^k}=P_{XY^k},\ W^k-X-Y^k,\ \text{and}\ \forall\,j\in[k]:\ |\mathcal{W}_j|\le|\mathcal{X}|^j,\ \hat{X}_j-(W^j,Y_j)-(X,Y^{k\setminus j},W_{j+1}^k,\hat{X}^{j-1})\big\}.\tag{53}$$
Please note that $\mathcal{P}_{\mathrm{sh}}$ is the same as $\mathcal{P}_{\mathrm{ran}}$ (cf. (51)) except that the cardinality bounds are reduced. Given any $(\alpha^k,\beta^k)\in[0,1]^{2k}$ satisfying (15), define the following minimal linear combination of achievable rate-distortion tuples:
$$R^{(\alpha^k,\beta^k)}:=\min_{Q_T\in\mathcal{P}_{\mathrm{sh}}}\Big(\alpha_1I(Q_X,Q_{W_1|X})+\sum_{j\in[2:k]}\alpha_jI(Q_{X|W^{j-1}},Q_{W_j|XW^{j-1}}|Q_{W^{j-1}})+\sum_{j\in[k]}\beta_j\mathbb{E}[d_j(X,\hat{X}_j)]\Big).\tag{54}$$
Recall the definition of the linear combination of rate-distortion tuples $\kappa^{(\cdot)}(\cdot)$ in (19) and let $\mathcal{R}_{\mathrm{sh}}$ be the following collection of rate-distortion tuples defined using the supporting hyperplanes $R^{(\alpha^k,\beta^k)}$:
$$\mathcal{R}_{\mathrm{sh}}:=\bigcap_{(\alpha^k,\beta^k)\in[0,1]^{2k}:\ \sum_{i\in[k]}(\alpha_i+\beta_i)=1}\big\{(R^k,D^k):\ \kappa^{(\alpha^k,\beta^k)}(R^k,D^k)\ge R^{(\alpha^k,\beta^k)}\big\}.\tag{55}$$
Finally, recall the definitions of the rate-distortion region $\mathcal{R}$ in (7) and the characterization $\mathcal{R}^*$ in (11). Similarly to Properties 2 and 3 in [25], one can establish the following lemma, which states that: (i) the rate-distortion region $\mathcal{R}$ for the $k$-user causal successive refinement problem remains unchanged even if one uses stochastic decoding functions; and (ii) the rate-distortion region $\mathcal{R}$ has the alternative characterization $\mathcal{R}_{\mathrm{sh}}$ in terms of the supporting hyperplanes in (54).
Lemma 5.
The rate-distortion region for the causal successive refinement problem satisfies
$$\mathcal{R}=\mathcal{R}^*=\mathcal{R}_{\mathrm{ran}}=\mathcal{R}_{\mathrm{sh}}.\tag{56}$$

5.2. Proof of Claim (i)

Recall that we use $T$ (cf. (8)) to denote the collection of random variables $(X,Y^k,W^k,\hat{X}^k)$ and use $(t,\mathcal{T})$ similarly to denote a realization of $T$ and its alphabet, respectively. For any $P_T\in\mathcal{P}_{\mathrm{sh}}$ (recall (53)), any $(\alpha^k,\beta^k)\in[0,1]^{2k}$ satisfying (15) and any $\lambda\in\mathbb{R}_+$, for any $t\in\mathcal{T}$, paralleling (16) and (17), define the following linear combination of log-likelihoods and its negative cumulant generating function:
$$\tilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(t):=\alpha_1\log\frac{P_{X|W_1}(x|w_1)}{P_X(x)}+\sum_{j\in[2:k]}\alpha_j\log\frac{P_{X|W^j}(x|w^j)}{P_{X|W^{j-1}}(x|w^{j-1})}+\sum_{j\in[k]}\beta_jd_j(x,\hat{x}_j),\tag{57}$$
$$\tilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T):=-\log\mathbb{E}_{P_T}\big[\exp\big(-\lambda\tilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\big)\big].\tag{58}$$
For simplicity, we let
$$\alpha_+:=\max_{j\in[k]}\alpha_j.\tag{59}$$
Furthermore, paralleling the steps used to go from (18) to (21) and recalling the definition of $\kappa^{(\alpha^k,\beta^k)}(\cdot)$ in (19), let
$$\tilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}:=\min_{P_T\in\mathcal{P}_{\mathrm{sh}}}\tilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T),\tag{60}$$
$$\tilde{F}^{(\lambda,\alpha^k,\beta^k)}(R^k,D^k):=\frac{\tilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}-\lambda\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{2k+3+\lambda\alpha_++\sum_{j\in[2:k]}\lambda(2k+3)\alpha_j+\sum_{l\in[k]}2\lambda\alpha_l},\tag{61}$$
$$\tilde{F}(R^k,D^k):=\sup_{(\lambda,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}:\ \sum_{i\in[k]}(\alpha_i+\beta_i)=1}\tilde{F}^{(\lambda,\alpha^k,\beta^k)}(R^k,D^k).\tag{62}$$
To prove Claim (i), we need the following two definitions of the tilted distribution and the dispersion function:
$$P_T^{(\lambda,\alpha^k,\beta^k)}(t):=\frac{P_T(t)\exp\big(-\lambda\tilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(t)\big)}{\mathbb{E}_{P_T}\big[\exp\big(-\lambda\tilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\big)\big]},\tag{63}$$
$$\rho:=\sup_{P_T\in\mathcal{P}_{\mathrm{sh}}}\ \sup_{(\lambda,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}:\ \sum_{i\in[k]}(\alpha_i+\beta_i)=1}\mathrm{Var}_{P_T^{(\lambda,\alpha^k,\beta^k)}}\big[\tilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\big].\tag{64}$$
Please note that $\rho$ is positive and finite.
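As a small numerical illustration (ours, with a toy scalar $\tilde{\omega}$ in place of (57)), the tilted distribution in (63) and the variance appearing in (64) can be computed directly on a finite alphabet:

```python
import math

def tilt(P, omega, lam):
    """Tilted distribution (63): P^(lam)(t) = P(t)exp(-lam*omega(t)) / E_P[exp(-lam*omega(T))]."""
    w = {t: p * math.exp(-lam * omega(t)) for t, p in P.items()}
    z = sum(w.values())
    return {t: v / z for t, v in w.items()}

def variance(P, f):
    mean = sum(p * f(t) for t, p in P.items())
    return sum(p * (f(t) - mean) ** 2 for t, p in P.items())

P = {0: 0.3, 1: 0.7}
omega = lambda t: float(t)   # toy stand-in for omega-tilde in (57)
print(variance(tilt(P, omega, 0.5), omega))   # positive and finite, cf. rho in (64)
```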
The proof of Claim (i) in Lemma 1 is completed by the following lemma, which relates $F(R^k,D^k)$ in Equation (21) to $\tilde{F}(R^k,D^k)$ in Equation (62).
Lemma 6.
The following holds.
(i) For any rate-distortion tuple $(R^k,D^k)$,
$$F(R^k,D^k)\ge\tilde{F}(R^k,D^k).\tag{65}$$
(ii) For any rate-distortion tuple $(R^k,D^k)$ outside the rate-distortion region, i.e., $(R^k,D^k)\notin\mathcal{R}$, there exists $\delta\in(0,\rho]$ such that
$$\tilde{F}(R^k,D^k)\ge\frac{\delta^2}{2(2k+9)\rho}>0.\tag{66}$$
The proof of Lemma 6 is inspired by [25,31] and given in Appendix F. To prove Lemma 6, we use the alternative characterizations of the rate-distortion region $\mathcal{R}$ in Lemma 5 and analyze the connections between the two exponent functions $F(R^k,D^k)$ and $\tilde{F}(R^k,D^k)$.

5.3. Proof of Claim (ii)

Recall the definition of the linear combination of rate-distortion tuples $\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)$ in Equation (19). If a rate-distortion tuple falls inside the rate-distortion region, i.e., $(R^k,D^k)\in\mathcal{R}$, then there exists a distribution $Q_T\in\mathcal{P}_{\mathrm{sh}}$ (see (53)) such that for any $(\alpha^k,\beta^k)\in[0,1]^{2k}$ satisfying (15), we have the following lower bound on $\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)$:
$$\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)\ge\alpha_1I(Q_X,Q_{W_1|X})+\beta_1\mathbb{E}[d_1(X,\hat{X}_1)]+\sum_{j\in[2:k]}\big(\alpha_jI(Q_{X|W^{j-1}},Q_{W_j|XW^{j-1}}|Q_{W^{j-1}})+\beta_j\mathbb{E}[d_j(X,\hat{X}_j)]\big).\tag{67}$$
Recall the definition of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)$ in (17). A simple calculation establishes
$$\Omega^{(0,\mu,\alpha^k,\beta^k)}(Q_T)=0,\tag{68}$$
$$\frac{\partial\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)}{\partial\theta}\bigg|_{\theta=0}=\mathbb{E}_{Q_T}\big[\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(T)\big].\tag{69}$$
Combining (68) and (69), by the concavity of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)$ in $\theta$, it follows that for any $(\theta,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}$,
$$\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)\le\theta\,\mathbb{E}_{Q_T}\big[\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(T)\big].\tag{70}$$
Using the definition of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}$ in (18), it follows that
$$\Omega^{(\theta,\mu,\alpha^k,\beta^k)}\le\min_{Q_T\in\mathcal{P}_{\mathrm{sh}}}\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)\tag{71}$$
$$\le\min_{Q_T\in\mathcal{P}_{\mathrm{sh}}}\theta\,\mathbb{E}_{Q_T}\big[\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(T)\big]\tag{72}$$
$$=\min_{Q_T\in\mathcal{P}_{\mathrm{sh}}}\theta\mu\Big(\alpha_1I(Q_X,Q_{W_1|X})+\beta_1\mathbb{E}[d_1(X,\hat{X}_1)]+\sum_{j\in[2:k]}\big(\alpha_jI(Q_{X|W^{j-1}},Q_{W_j|XW^{j-1}}|Q_{W^{j-1}})+\beta_j\mathbb{E}[d_j(X,\hat{X}_j)]\big)\Big)\tag{73}$$
$$\le\theta\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k),\tag{74}$$
where (71) follows from $\mathcal{P}_{\mathrm{sh}}\subseteq\mathcal{Q}$ (recall (14)), (72) follows from the result in (70), (73) follows from the definitions of $\omega_{Q_T}^{(\mu,\alpha^k,\beta^k)}(t)$ in (16) and $\mathcal{P}_{\mathrm{sh}}$ in (53) (under any $Q_T\in\mathcal{P}_{\mathrm{sh}}$, the first four terms of (16) have zero expectation due to the distribution and Markov chain constraints), and (74) follows from the result in (67).
Using the definition of $F^{(\theta,\mu,\alpha^k,\beta^k)}(R^k,D^k)$ in (20) and the result in (74), we conclude that for any $(R^k,D^k)\in\mathcal{R}$,
$$F^{(\theta,\mu,\alpha^k,\beta^k)}(R^k,D^k)\le0.\tag{75}$$
The proof of Claim (ii) is completed by noting that
$$\lim_{\theta\downarrow0}F^{(\theta,\mu,\alpha^k,\beta^k)}(R^k,D^k)=0.\tag{76}$$

6. Conclusions

We considered the $k$-user causal successive refinement problem [1] and established an exponential strong converse theorem using the strong converse technique proposed by Oohama [25]. Our work appears to be the first to derive a strong converse theorem for a lossy source coding problem with causal decoder side information. The methods we adopted can also be used to obtain exponential strong converse theorems for other source coding problems with causal decoder side information. This paper further illustrates the usefulness and generality of Oohama's information spectrum method in deriving exponential strong converse theorems. The duality discovered in [45] between source coding with decoder side information [46] and channel coding with encoder state information [47] suggests that Oohama's techniques [25] can also be used to establish strong converse theorems for channel coding with causal encoder state information, e.g., [48,49,50].
There are several natural directions for future research. In Theorem 2, we presented only a lower bound on the strong converse exponent. It would be worthwhile to obtain an exact expression for the strong converse exponent and thus characterize the speed at which the probability of correct decoding decays to zero with respect to the blocklength of the source sequences when the rate-distortion tuple falls outside the rate-distortion region. Furthermore, one can explore whether the methods in this paper can be used to establish strong converse theorems for causal successive refinement under the logarithmic loss [51,52], which corresponds to soft decoding of each source symbol. Finally, one can also explore extensions to continuous alphabets by considering Gaussian memoryless sources under bounded distortion measures and derive second-order asymptotics [44,53,54,55,56] for the causal successive refinement problem.

Author Contributions

Formal analysis, L.Z.; Funding acquisition, A.H.; Supervision, A.H.; Writing—original draft, L.Z.; Writing—review & editing, A.H.

Funding

This work was partially supported by ARO grant W911NF-15-1-0479.

Acknowledgments

The authors acknowledge anonymous reviewers for helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

Replacing (6) in Definition 2 with the average distortion criterion (13), we can define the $\varepsilon$-rate-distortion region $\mathcal{R}_{\mathrm{ad}}(\varepsilon)$ under the average distortion criterion. Furthermore, let
$$\mathcal{R}_{\mathrm{ad}}:=\bigcap_{\varepsilon\in[0,1)}\mathcal{R}_{\mathrm{ad}}(\varepsilon).\tag{A1}$$
Maor and Merhav [1] showed that for $k=2$,
$$\mathcal{R}_{\mathrm{ad}}=\mathcal{R}^*.\tag{A2}$$
Actually, in Section 7 of [1], in order to prove that $\mathcal{R}^*\subseteq\mathcal{R}_{\mathrm{ad}}$, it was already shown that $\mathcal{R}^*\subseteq\mathcal{R}$. Furthermore, it is straightforward to show that the above results hold for any finite $k\in\mathbb{N}$. Thus, to prove Theorem 1, it suffices to show
$$\mathcal{R}\subseteq\mathcal{R}^*=\mathcal{R}_{\mathrm{ad}}.\tag{A3}$$
For this purpose, given any $j\in[k]$, let
$$\bar{d}_j:=\max_{(x,\hat{x}_j)\in\mathcal{X}\times\hat{\mathcal{X}}_j}d_j(x,\hat{x}_j).\tag{A4}$$
From the problem formulation, we know that $\bar{d}_j<\infty$ for all $j\in[k]$. Now consider any rate-distortion tuple $(R^k,D^k)\in\mathcal{R}$; then we have (4) to (6) for every $\varepsilon\in(0,1)$. Therefore, for any $j\in[k]$,
$$\limsup_{n\to\infty}\mathbb{E}[d_j(X^n,\hat{X}_j^n)]\le\limsup_{n\to\infty}\Big(\mathbb{E}\big[d_j(X^n,\hat{X}_j^n)\mathbb{1}\{d_j(X^n,\hat{X}_j^n)\le D_j\}\big]+\bar{d}_j\Pr\{d_j(X^n,\hat{X}_j^n)>D_j\}\Big)\tag{A5}$$
$$\le D_j.\tag{A6}$$
As a result, we have $(R^k,D^k)\in\mathcal{R}_{\mathrm{ad}}$. This establishes that $\mathcal{R}\subseteq\mathcal{R}_{\mathrm{ad}}=\mathcal{R}^*$.

Appendix B. Proof of Lemma 2

Recall the definition of $G$ and $\mathcal{G}$ in (30). Given any subsets of random variables $C\subseteq G$ and $D\subseteq G$, let $Q_{C|D}$ be arbitrary conditional distributions. For simplicity, given each $j\in[k]$, we use $\mathbf{Y}^j$ to denote $(Y_1^n,\ldots,Y_j^n)$ and use $\mathbf{Y}^{j\setminus l}$ to denote $(Y_1^n,\ldots,Y_{l-1}^n,Y_{l+1}^n,\ldots,Y_j^n)$, where $l\in[j]$. Similarly, we use $\hat{\mathbf{X}}^j$ and $\hat{\mathbf{X}}^{j\setminus l}$.
Given any positive real number $\eta$, define the following sets:
$$\mathcal{A}_1:=\Big\{g:\ \frac{1}{n}\log\frac{P_{X^n}(x^n)}{Q_{X^n}(x^n)}\ge-\eta\Big\},\tag{A7}$$
$$\mathcal{A}_2:=\Big\{g:\ \frac{1}{n}\log\frac{P_{\mathbf{Y}^k|X^n}(\mathbf{y}^k|x^n)}{Q_{\mathbf{Y}^k|X^nS^k}(\mathbf{y}^k|x^n,s^k)}\ge-\eta\Big\},\tag{A8}$$
$$\mathcal{A}_3:=\Big\{g:\ \frac{1}{n}\log\frac{P_{X^n\mathbf{Y}^{k\setminus1}S^{k\setminus1}|Y_1^nS_1}(x^n,\mathbf{y}^{k\setminus1},s^{k\setminus1}|y_1^n,s_1)}{Q_{X^n\mathbf{Y}^{k\setminus1}S^{k\setminus1}|Y_1^nS_1\hat{X}_1^n}(x^n,\mathbf{y}^{k\setminus1},s^{k\setminus1}|y_1^n,s_1,\hat{x}_1^n)}\ge-\eta\Big\},\tag{A9}$$
$$\mathcal{A}_4:=\Big\{g:\ \forall\,j\in[2:k],\ \frac{1}{n}\log\frac{P_{\hat{X}_j^n|Y_j^nS^j}(\hat{x}_j^n|y_j^n,s^j)}{Q_{\hat{X}_j^n|X^n\mathbf{Y}^kS^k\hat{\mathbf{X}}^{j-1}}(\hat{x}_j^n|x^n,\mathbf{y}^k,s^k,\hat{\mathbf{x}}^{j-1})}\ge-\eta\Big\},\tag{A10}$$
$$\mathcal{A}_5:=\Big\{g:\ R_1\ge\frac{1}{n}\log\frac{P_{X^n|S_1}(x^n|s_1)}{P_{X^n}(x^n)}-\eta\Big\},\tag{A11}$$
$$\mathcal{A}_6:=\Big\{g:\ \forall\,j\in[2:k],\ R_j-\sum_{l\in[j-1]}R_l\ge\frac{1}{n}\log\frac{P_{X^n|S^j}(x^n|s^j)}{P_{X^n|S^{j-1}}(x^n|s^{j-1})}-\eta\Big\},\tag{A12}$$
$$\mathcal{A}_7:=\big\{g:\ \forall\,j\in[k],\ D_j\ge d_j(x^n,\hat{x}_j^n)\big\}=\Big\{g:\ \forall\,j\in[k],\ D_j\ge\frac{1}{n}\sum_{i\in[n]}d_j(x_i,\hat{x}_{j,i})\Big\}.\tag{A13}$$
Then we have the following non-asymptotic upper bound on the probability of correct decoding.
Lemma A1.
Given any $(n,M^k)$-code satisfying (25) and any distortion levels $D^k$, we have
$$P_{\mathrm{c}}^{(n)}(D^k)\le\Pr\Big\{\bigcap_{i\in[7]}\mathcal{A}_i\Big\}+(2k+2)\exp(-n\eta).\tag{A14}$$
The proof of Lemma A1 is given in Appendix C.
In the remainder of this subsection, we single-letterize the bound in Lemma A1. Recall that given any $(i,j)\in[n]\times[k]$, we use $Y_{j,1}^{j,i}$ to denote $(Y_{j,1},\ldots,Y_{j,i})$. Recalling that the distributions starting with $P$ are all induced by the joint distribution $P_G$ in (31) and using the choice of auxiliary random variables $(W_{1,i},\ldots,W_{k,i})$, we have
$$P_{X^n\mathbf{Y}^{k\setminus1}S^{k\setminus1}|Y_1^nS_1}(x^n,\mathbf{y}^{k\setminus1},s^{k\setminus1}|y_1^n,s_1)$$
$$=\prod_{i\in[n]}P_{X_iY_{2,i}^{k,i}S^{k\setminus1}|X^{i-1},Y_{2,1}^{2,i-1},\ldots,Y_{k,1}^{k,i-1},Y_1^n,S_1}(x_i,y_{2,i}^{k,i},s^{k\setminus1}|x^{i-1},y_{2,1}^{2,i-1},\ldots,y_{k,1}^{k,i-1},y_1^n,s_1)\tag{A15}$$
$$=\prod_{i\in[n]}P_{X_iY_{2,i}^{k,i}S^{k\setminus1}|X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1},Y_{1,i},S_1}(x_i,y_{2,i}^{k,i},s^{k\setminus1}|x^{i-1},y_{1,1}^{1,i-1},\ldots,y_{k,1}^{k,i-1},y_{1,i},s_1)\tag{A16}$$
$$=\prod_{i\in[n]}P_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i}),\tag{A17}$$
$$P_{\hat{X}_j^n|Y_j^nS^j}(\hat{x}_j^n|y_j^n,s^j)=\prod_{i\in[n]}P_{\hat{X}_{j,i}|Y_{j,1}^{j,i}S^j}(\hat{x}_{j,i}|y_{j,1}^{j,i},s^j)\tag{A18}$$
$$=\prod_{i\in[n]}P_{\hat{X}_{j,i}|X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1},Y_{j,i},S^j}(\hat{x}_{j,i}|x^{i-1},y_{1,1}^{1,i-1},\ldots,y_{k,1}^{k,i-1},y_{j,i},s^j)\tag{A19}$$
$$=\prod_{i\in[n]}P_{\hat{X}_{j,i}|Y_{j,i}W_{1,i}^{j,i}}(\hat{x}_{j,i}|y_{j,i},w_{1,i}^{j,i}),\tag{A20}$$
$$P_{X^n|S_1}(x^n|s_1)=\prod_{i\in[n]}P_{X_i|X^{i-1}S_1}(x_i|x^{i-1},s_1)\tag{A21}$$
$$=\prod_{i\in[n]}P_{X_i|X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1},S_1}(x_i|x^{i-1},y_{1,1}^{1,i-1},\ldots,y_{k,1}^{k,i-1},s_1)\tag{A22}$$
$$=\prod_{i\in[n]}P_{X_i|W_{1,i}}(x_i|w_{1,i}),\tag{A23}$$
$$P_{X^n|S^{j-1}}(x^n|s^{j-1})=\prod_{i\in[n]}P_{X_i|X^{i-1}S^{j-1}}(x_i|x^{i-1},s^{j-1})\tag{A24}$$
$$=\prod_{i\in[n]}P_{X_i|X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1},S^{j-1}}(x_i|x^{i-1},y_{1,1}^{1,i-1},\ldots,y_{k,1}^{k,i-1},s^{j-1})\tag{A25}$$
$$=\prod_{i\in[n]}P_{X_i|W_{1,i}^{j-1,i}}(x_i|w_{1,i}^{j-1,i}),\tag{A26}$$
$$P_{X^n|S^j}(x^n|s^j)=\prod_{i\in[n]}P_{X_i|W_{1,i}^{j,i}}(x_i|w_{1,i}^{j,i}),\tag{A27}$$
where (A16) follows from the Markov chain $(X_i,Y_{2,i}^{k,i},S_2^k)-(X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1},Y_{1,i},S_1)-Y_{1,i+1}^{1,n}$, (A19) follows from the Markov chain $\hat{X}_{j,i}-(Y_{j,1}^{j,i},S^j)-(X^{i-1},Y_{1,1}^{1,i-1},\ldots,Y_{j-1,1}^{j-1,i-1},Y_{j+1,1}^{j+1,i-1},\ldots,Y_{k,1}^{k,i-1})$, (A22) follows from the Markov chain $X_i-(X^{i-1},S_1)-(Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1})$, and (A25) follows from the Markov chain $X_i-(X^{i-1},S^{j-1})-(Y_{1,1}^{1,i-1},\ldots,Y_{k,1}^{k,i-1})$.
Furthermore, recall that for $i\in[n]$, $Q_{C_i|D_i}$ are arbitrary conditional distributions, where $C_i$ and $D_i$ are subsets of the random variables in $T_i$. Please note that Lemma A1 holds for arbitrary choices of the distributions $Q_{C|D}$, where $C\subseteq G$ and $D\subseteq G$. The proof of Lemma 2 is completed by using Lemma A1 with the following choices of auxiliary distributions and noting that $\mathcal{B}_7=\mathcal{A}_7$:
$$Q_{X^n}(x^n):=\prod_{i\in[n]}Q_{X_i}(x_i),\tag{A28}$$
$$Q_{\mathbf{Y}^k|X^nS^k}(\mathbf{y}^k|x^n,s^k):=\prod_{i\in[n]}Q_{Y_{1,i}^{k,i}|X_iW_{1,i}^{k,i}}(y_{1,i}^{k,i}|x_i,w_{1,i}^{k,i}),\tag{A29}$$
$$Q_{X^n\mathbf{Y}^{k\setminus1}S_2^k|Y_1^nS_1\hat{X}_1^n}(x^n,\mathbf{y}^{k\setminus1},s_2^k|y_1^n,s_1,\hat{x}_1^n):=\prod_{i\in[n]}Q_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}\hat{X}_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i},\hat{x}_{1,i}),\tag{A30}$$
$$Q_{\hat{X}_j^n|X^n\mathbf{Y}^kS^k\hat{\mathbf{X}}^{j-1}}(\hat{x}_j^n|x^n,\mathbf{y}^k,s^k,\hat{\mathbf{x}}^{j-1}):=\prod_{i\in[n]}Q_{\hat{X}_{j,i}|X_iY_{1,i}^{k,i}W_{1,i}^{k,i}\hat{X}_{1,i}^{j-1,i}}(\hat{x}_{j,i}|x_i,y_{1,i}^{k,i},w_{1,i}^{k,i},\hat{x}_{1,i}^{j-1,i}).\tag{A31}$$

Appendix C. Proof of Lemma A1

Recall the definition of the probability of correct decoding $P_{\mathrm{c}}^{(n)}(D^k)$ in (24) and the definitions of the sets $\{\mathcal{A}_j\}_{j\in[7]}$ in (A7) to (A13). For any $(n,M^k)$-code, we have
$$P_{\mathrm{c}}^{(n)}(D^k)=\Pr\{\mathcal{A}_7\}\tag{A32}$$
$$=\Pr\Big\{\mathcal{A}_7\cap\Big(\bigcap_{j\in[6]}\mathcal{A}_j\Big)\Big\}+\Pr\Big\{\mathcal{A}_7\cap\Big(\bigcup_{j\in[6]}\mathcal{A}_j^{\mathrm{c}}\Big)\Big\}\tag{A33}$$
$$\le\Pr\Big\{\bigcap_{j\in[7]}\mathcal{A}_j\Big\}+\sum_{j\in[6]}\Pr\{\mathcal{A}_j^{\mathrm{c}}\},\tag{A34}$$
where (A34) follows from the union bound and the fact that $\Pr\{\mathcal{A}\cap\mathcal{B}\}\le\Pr\{\mathcal{B}\}$ for any two sets $\mathcal{A}$ and $\mathcal{B}$. The proof of Lemma A1 is completed by showing that
$$\sum_{j\in[6]}\Pr\{\mathcal{A}_j^{\mathrm{c}}\}\le(2k+2)\exp(-n\eta).\tag{A35}$$
In the remainder of this subsection, we show that (A35) holds. Recall the joint distribution $P_G$ in (31). In the following, when we use a (conditional) distribution starting with $P$, we mean that the (conditional) distribution is induced by the joint distribution $P_G$ in (31).
Using the definition of $\mathcal{A}_1$ in (A7),
$$\Pr\{\mathcal{A}_1^{\mathrm{c}}\}=\sum_{x^n\in\mathcal{X}^n}P_{X^n}(x^n)\mathbb{1}\{P_{X^n}(x^n)\le\exp(-n\eta)Q_{X^n}(x^n)\}\tag{A36}$$
$$\le\exp(-n\eta).\tag{A37}$$
Similarly to (A37), it follows that
$$\Pr\{\mathcal{A}_2^{\mathrm{c}}\}=\sum_{g\in\mathcal{A}_2^{\mathrm{c}}}P_G(g)\tag{A38}$$
$$=\sum_{x^n,s^k,\mathbf{y}^k}P_{XY^k}^n(x^n,\mathbf{y}^k)\prod_{j\in[k]}P_{S_j|X^n}(s_j|x^n)\mathbb{1}\{P_{\mathbf{Y}^k|X^n}(\mathbf{y}^k|x^n)\le\exp(-n\eta)Q_{\mathbf{Y}^k|X^nS^k}(\mathbf{y}^k|x^n,s^k)\}\tag{A39}$$
$$\le\exp(-n\eta)\sum_{x^n,s^k,\mathbf{y}^k}P_{X^n}(x^n)Q_{\mathbf{Y}^k|X^nS^k}(\mathbf{y}^k|x^n,s^k)\prod_{j\in[k]}P_{S_j|X^n}(s_j|x^n)\tag{A40}$$
$$\le\exp(-n\eta),\tag{A41}$$
$$\Pr\{\mathcal{A}_3^{\mathrm{c}}\}=\sum_{g\in\mathcal{A}_3^{\mathrm{c}}}P_G(g)\tag{A42}$$
$$\le\exp(-n\eta)\sum_{x^n,\mathbf{y}^k,s^k,\hat{x}_1^n}P_{Y_1^nS_1}(y_1^n,s_1)P_{\hat{X}_1^n|Y_1^nS_1}(\hat{x}_1^n|y_1^n,s_1)Q_{X^n\mathbf{Y}^{k\setminus1}S_2^k|Y_1^nS_1\hat{X}_1^n}(x^n,\mathbf{y}^{k\setminus1},s_2^k|y_1^n,s_1,\hat{x}_1^n)\tag{A43}$$
$$\le\exp(-n\eta).\tag{A44}$$
Furthermore, using the definition of $\mathcal{A}_4$ in (A10) and the union bound,
$$\Pr\{\mathcal{A}_4^{\mathrm{c}}\}\le\sum_{j\in[2:k]}\exp(-n\eta)\sum_{x^n,\mathbf{y}^k,s^k,\hat{\mathbf{x}}^j}P_{XY^k}^n(x^n,\mathbf{y}^k)\prod_{l\in[k]}P_{S_l|X^n}(s_l|x^n)\prod_{l\in[j-1]}P_{\hat{X}_l^n|Y_l^nS^l}(\hat{x}_l^n|y_l^n,s^l)\tag{A45}$$
$$\qquad\times Q_{\hat{X}_j^n|X^n\mathbf{Y}^kS^k\hat{\mathbf{X}}^{j-1}}(\hat{x}_j^n|x^n,\mathbf{y}^k,s^k,\hat{\mathbf{x}}^{j-1})\tag{A46}$$
$$\le(k-1)\exp(-n\eta).\tag{A47}$$
Furthermore, using the definition of $\mathcal{A}_5$ in (A11),
$$\Pr\{\mathcal{A}_5^{\mathrm{c}}\}\le\sum_{x^n,s_1}P_{S_1|X^n}(s_1|x^n)\exp(-n(R_1+\eta))P_{X^n|S_1}(x^n|s_1)\tag{A48}$$
$$\le\sum_{x^n,s_1}\exp(-n(R_1+\eta))P_{X^n|S_1}(x^n|s_1)\tag{A49}$$
$$=\sum_{s_1}\exp(-n(\eta+R_1))\tag{A50}$$
$$\le\exp(-n\eta),\tag{A51}$$
where (A49) follows since $P_{S_1|X^n}(s_1|x^n)\le1$ for all $(x^n,s_1)$, and (A51) follows since $\sum_{s_1}1=|\mathcal{M}_1|=M_1\le\exp(nR_1)$.
Using the definition of $\mathcal{A}_6$ in (A12) and the union bound similarly to (A47), it follows that
$$\Pr\{\mathcal{A}_6^{\mathrm{c}}\}\le\sum_{j\in[2:k]}\sum_{x^n,s^j}P_{S^{j-1}}(s^{j-1})\exp(-n\eta)P_{X^n|S^j}(x^n|s^j)\exp\Big(-n\Big(R_j-\sum_{l\in[j-1]}R_l\Big)\Big)P_{S_j|X^n}(s_j|x^n)\tag{A52}$$
$$\le\sum_{j\in[2:k]}\exp(-n\eta)\sum_{x^n,s^j}P_{S^{j-1}}(s^{j-1})P_{X^n|S^j}(x^n|s^j)\exp\Big(-n\Big(R_j-\sum_{l\in[j-1]}R_l\Big)\Big)\tag{A53}$$
$$\le\sum_{j\in[2:k]}\exp(-n\eta)\sum_{s_j}\exp\Big(-n\Big(R_j-\sum_{l\in[j-1]}R_l\Big)\Big)\tag{A54}$$
$$\le(k-1)\exp(-n\eta),\tag{A55}$$
where (A53) follows since $P_{S_j|X^n}(s_j|x^n)\le1$ for all $(x^n,s^j)$ and (A55) follows since $\sum_{s_j}1=|\mathcal{M}_j|=M_j\le\exp\big(n\big(R_j-\sum_{l\in[j-1]}R_l\big)\big)$.

Appendix D. Proof of Lemma 3

For any $(\mu,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}$ satisfying (15), for $i\in[4]$, define $\mathcal{F}_i=\mathcal{B}_i$ (cf. (33) to (36)), and for $i\in[5:7]$, define
$$\mathcal{F}_5:=\Big\{g:\ \mu\alpha_1R_1\ge\frac{\mu\alpha_1}{n}\sum_{i\in[n]}\log\frac{P_{X_i|W_{1,i}}(x_i|w_{1,i})}{P_X(x_i)}-\mu\alpha_1\eta\Big\},\tag{A56}$$
$$\mathcal{F}_6:=\Big\{g:\ \forall\,j\in[2:k],\ \mu\alpha_j\Big(R_j-\sum_{l\in[j-1]}R_l\Big)\ge\frac{\mu\alpha_j}{n}\sum_{i\in[n]}\log\frac{P_{X_i|W_{1,i}^{j,i}}(x_i|w_{1,i}^{j,i})}{P_{X_i|W_{1,i}^{j-1,i}}(x_i|w_{1,i}^{j-1,i})}-\mu\alpha_j\eta\Big\},\tag{A57}$$
$$\mathcal{F}_7:=\Big\{g:\ \forall\,j\in[k],\ \mu\beta_jD_j\ge\frac{\mu\beta_j}{n}\sum_{i\in[n]}\log\exp\big(d_j(x_i,\hat{x}_{j,i})\big)\Big\}.\tag{A58}$$
Furthermore, let
$$c(\mu,\alpha^k):=k+2+\sum_{j\in[k]}\mu\alpha_j.\tag{A59}$$
Using Lemma 2 and the definitions in (A56) to (A59), we obtain
$$P_{\mathrm{c}}^{(n)}(D^k)-(2k+2)\exp(-n\eta)\le\Pr\Big\{\bigcap_{i\in[7]}\mathcal{F}_i\Big\}\tag{A60}$$
$$\le\Pr\Big\{\sum_{i\in[n]}f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}(T_i)\le n\big(\mu\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)+c(\mu,\alpha^k)\eta\big)\Big\}\tag{A61}$$
$$\le\exp\Big(n\lambda\big(\mu\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)+c(\mu,\alpha^k)\eta\big)+\log\mathbb{E}\Big[\exp\Big(-\lambda\sum_{i\in[n]}f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}(T_i)\Big)\Big]\Big)\tag{A62}$$
$$=\exp\Big(-n\Big(\frac{1}{n}\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})-\lambda\mu\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)-\lambda c(\mu,\alpha^k)\eta\Big)\Big),\tag{A63}$$
where (A62) follows from Cramér's bound in Lemma 13 of [31] and (A63) follows from the definition of $\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})$ in (42).
Choose $\eta$ such that
$$\eta=\frac{1}{n}\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})-\lambda\mu\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)-\lambda c(\mu,\alpha^k)\eta,\tag{A64}$$
i.e.,
$$\eta=\frac{\frac{1}{n}\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})-\lambda\mu\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+\lambda c(\mu,\alpha^k)}.\tag{A65}$$
The proof of Lemma 3 is completed by combining (A63) and (A65).

Appendix E. Proof of Lemma 4

Recall that for each $i\in[n]$, we use $t_i$ to denote $(x_i,y_{1,i}^{k,i},w_{1,i}^{k,i},\hat{x}_{1,i}^{k,i})$ and use $T_i$ similarly. Recall also that the auxiliary random variables are chosen as $w_{1,i}=(x^{i-1},y_{1,1}^{1,i-1},\ldots,y_{k,1}^{k,i-1},s_1)$ and $w_{j,i}=s_j$ for all $j\in[2:k]$. Using the definition of $f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}$ in (41), define
$$h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i):=\exp\big(-\lambda f_{Q_i,P_i}^{(\mu,\alpha^k,\beta^k)}(t_i)\big).\tag{A66}$$
Recall the joint distribution $P_G$ in (31). For each $j\in[n]$, define
$$\tilde{C}_j:=\sum_{g}P_G(g)\prod_{i\in[j]}h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i),\tag{A67}$$
$$P_{G}^{(\lambda,\mu,\alpha^k,\beta^k)|j}(g):=\frac{P_G(g)\prod_{i\in[j]}h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i)}{\tilde{C}_j},\tag{A68}$$
$$\Lambda_j^{(\lambda,\mu,\alpha^k,\beta^k)}(\{Q_i,P_i\}_{i\in[n]}):=\frac{\tilde{C}_j}{\tilde{C}_{j-1}}.\tag{A69}$$
Combining (42) and (A69),
$$\exp\big(-\Omega^{(\lambda,\mu,\alpha^k,\beta^k)}(\{P_i,Q_i\}_{i\in[n]})\big)=\mathbb{E}\Big[\prod_{i\in[n]}h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(T_i)\Big]\tag{A70}$$
$$=\sum_{g\in\mathcal{G}}P_G(g)\prod_{i\in[n]}h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i)\tag{A71}$$
$$=\prod_{i\in[n]}\Lambda_i^{(\lambda,\mu,\alpha^k,\beta^k)}(\{Q_i,P_i\}_{i\in[n]}).\tag{A72}$$
Furthermore, similar to Lemma 5 of [25], we obtain the following lemma, which is critical in the proof of Lemma 4.
Lemma A2.
For each $j\in[n]$,
$$\Lambda_j^{(\lambda,\mu,\alpha^k,\beta^k)}(\{Q_i,P_i\}_{i\in[n]})=\sum_{g\in\mathcal{G}}P_{G}^{(\lambda,\mu,\alpha^k,\beta^k)|j-1}(g)\,h_{Q_j,P_j}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_j).\tag{A73}$$
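A toy numerical check (ours, on a hypothetical two-letter alphabet) of the structure behind Lemma A2: the ratio $\tilde{C}_j/\tilde{C}_{j-1}$ coincides with the expectation of $h_j$ under the reweighted law proportional to $P\prod_{i\in[j-1]}h_i$:

```python
import math
from itertools import product

# Joint pmf of (T_1, T_2) on {0,1}^2 and two positive weight functions
# standing in for h_1, h_2 in (A66).
P = {t: p for t, p in zip(product([0, 1], repeat=2), [0.1, 0.2, 0.3, 0.4])}
h = [lambda t: math.exp(-0.5 * t), lambda t: math.exp(-1.5 * t)]

def C(j):
    """C_j = E_P[ prod_{i<=j} h_i(T_i) ], cf. (A67)."""
    return sum(p * math.prod(h[i](t[i]) for i in range(j)) for t, p in P.items())

lam2 = C(2) / C(1)                                      # Lambda_2, cf. (A69)
P1 = {t: p * h[0](t[0]) / C(1) for t, p in P.items()}   # reweighted law, cf. (A68)
print(lam2, sum(p * h[1](t[1]) for t, p in P1.items())) # the two values agree
```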
Furthermore, for each $j\in[n]$, define
$$P^{(\lambda,\mu,\alpha^k,\beta^k)}(t_j):=\sum_{x_{j+1}^n,\,y_{1,j+1}^{1,n},\ldots,y_{k,j+1}^{k,n},\,\hat{x}_1^{j-1},\ldots,\hat{x}_k^{j-1},\,\hat{x}_{1,j+1}^{1,n},\ldots,\hat{x}_{k,j+1}^{k,n}}P_{G}^{(\lambda,\mu,\alpha^k,\beta^k)|j-1}(g).\tag{A74}$$
Using Lemma A2 and (A74), it follows that for each $j\in[n]$,
$$\Lambda_j^{(\lambda,\mu,\alpha^k,\beta^k)}(\{Q_i,P_i\}_{i\in[n]})=\sum_{t_j}P^{(\lambda,\mu,\alpha^k,\beta^k)}(t_j)\,h_{Q_j,P_j}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_j).\tag{A75}$$
Recall that the auxiliary distributions $\{Q_i\}_{i\in[n]}$ can be arbitrary. Following the recursive method in [25], for each $i\in[n]$, we choose $Q_i$ such that
$$Q_i(t_i)=P^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i).\tag{A76}$$
Let $Q_{C_i|D_i}$, where $C_i$ and $D_i$ are subsets of the random variables in $T_i$, be induced by $Q_i$. Using the definition of $h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i)$ in (A66), we define
$$\xi_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i):=h_{Q_i,P_i}^{(\lambda,\mu,\alpha^k,\beta^k)}(t_i)\Bigg(\frac{P_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i})}{Q_{X_iY_{2,i}^{k,i}W_{2,i}^{k,i}|Y_{1,i}W_{1,i}}(x_i,y_{2,i}^{k,i},w_{2,i}^{k,i}|y_{1,i},w_{1,i})}\Bigg)^{\lambda}\times\prod_{j\in[2:k]}\Bigg(\frac{P_{\hat{X}_{j,i}|Y_{j,i}W_{1,i}^{j,i}}(\hat{x}_{j,i}|y_{j,i},w_{1,i}^{j,i})}{Q_{\hat{X}_{j,i}|Y_{j,i}W_{1,i}^{j,i}}(\hat{x}_{j,i}|y_{j,i},w_{1,i}^{j,i})}\Bigg)^{\lambda}\Bigg(\frac{P_{X_i|W_{1,i}}(x_i|w_{1,i})}{Q_{X_i|W_{1,i}}(x_i|w_{1,i})}\Bigg)^{\lambda\mu\alpha_1}\times\prod_{j\in[2:k]}\Bigg(\frac{P_{X_i|W_{1,i}^{j-1,i}}(x_i|w_{1,i}^{j-1,i})}{Q_{X_i|W_{1,i}^{j-1,i}}(x_i|w_{1,i}^{j-1,i})}\Bigg)^{\lambda\mu\alpha_j}.\tag{A77}$$
In the following, for simplicity, we let $\Psi:=1-k\lambda-\sum_{j\in[k]}\lambda\mu\alpha_j$. Combining (A74) and (A75), we obtain that for each $l\in[n]$,
$$\Lambda_l^{(\lambda,\mu,\alpha^k,\beta^k)}(\{Q_i,P_i\}_{i\in[n]})=\mathbb{E}_{Q_l}\big[h_{Q_l,P_l}^{(\lambda,\mu,\alpha^k,\beta^k)}(T_l)\big]\tag{A78}$$
$$=\mathbb{E}_{Q_l}\Bigg[\xi_{Q_l,P_l}^{(\lambda,\mu,\alpha^k,\beta^k)}(T_l)\Bigg(\frac{Q_{X_lY_{2,l}^{k,l}W_{2,l}^{k,l}|Y_{1,l}W_{1,l}}}{P_{X_lY_{2,l}^{k,l}W_{2,l}^{k,l}|Y_{1,l}W_{1,l}}}\Bigg)^{\lambda}\prod_{j\in[2:k]}\Bigg(\frac{Q_{\hat{X}_{j,l}|Y_{j,l}W_{1,l}^{j,l}}}{P_{\hat{X}_{j,l}|Y_{j,l}W_{1,l}^{j,l}}}\Bigg)^{\lambda}\Bigg(\frac{Q_{X_l|W_{1,l}}}{P_{X_l|W_{1,l}}}\Bigg)^{\lambda\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{Q_{X_l|W_{1,l}^{j-1,l}}}{P_{X_l|W_{1,l}^{j-1,l}}}\Bigg)^{\lambda\mu\alpha_j}\Bigg]\tag{A79}$$
$$\le\Big(\mathbb{E}_{Q_l}\Big[\big(\xi_{Q_l,P_l}^{(\lambda,\mu,\alpha^k,\beta^k)}(T_l)\big)^{\frac{1}{\Psi}}\Big]\Big)^{\Psi}\Bigg(\mathbb{E}\Bigg[\frac{Q_{X_lY_{2,l}^{k,l}W_{2,l}^{k,l}|Y_{1,l}W_{1,l}}}{P_{X_lY_{2,l}^{k,l}W_{2,l}^{k,l}|Y_{1,l}W_{1,l}}}\Bigg]\Bigg)^{\lambda}\prod_{j\in[2:k]}\Bigg(\mathbb{E}\Bigg[\frac{Q_{\hat{X}_{j,l}|Y_{j,l}W_{1,l}^{j,l}}}{P_{\hat{X}_{j,l}|Y_{j,l}W_{1,l}^{j,l}}}\Bigg]\Bigg)^{\lambda}\Bigg(\mathbb{E}\Bigg[\frac{Q_{X_l|W_{1,l}}}{P_{X_l|W_{1,l}}}\Bigg]\Bigg)^{\lambda\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\mathbb{E}\Bigg[\frac{Q_{X_l|W_{1,l}^{j-1,l}}}{P_{X_l|W_{1,l}^{j-1,l}}}\Bigg]\Bigg)^{\lambda\mu\alpha_j}\tag{A80}$$
$$\le\exp\big(-\Psi\,\Omega^{(\frac{\lambda}{\Psi},\mu,\alpha^k,\beta^k)}(Q_l)\big)\tag{A81}$$
$$=\exp\Bigg(-\frac{\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_l)}{1+k\theta+\sum_{j\in[k]}\theta\mu\alpha_j}\Bigg)\tag{A82}$$
$$\le\exp\Bigg(-\frac{\min_{Q_l\in\mathcal{P}(\mathcal{T}_l)}\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_l)}{1+k\theta+\sum_{j\in[k]}\theta\mu\alpha_j}\Bigg)\tag{A83}$$
$$=\exp\Bigg(-\frac{\Omega^{(\theta,\mu,\alpha^k,\beta^k)}}{1+k\theta+\sum_{j\in[k]}\theta\mu\alpha_j}\Bigg),\tag{A84}$$
where (A80) results from Hölder's inequality, (A81) follows from the definitions of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(\cdot)$ in (17) and $\xi_{Q_l,P_l}^{(\lambda,\mu,\alpha^k,\beta^k)}(\cdot)$ in (A77), (A82) follows from the result in (46), and (A84) follows from the definition of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}$ in (18) and the fact that it suffices to consider distributions $Q_l$ satisfying the cardinality bounds in the definition of $\mathcal{Q}$ in (14) for the optimization problem in (A83) (the proof of this fact is similar to Property 4(a) in [25] and is thus omitted).
The proof of Lemma 4 is completed by combining (A72) and (A84).

Appendix F. Proof of Lemma 6

Appendix F.1. Proof of Claim (i)

For any $Q_T\in\mathcal{Q}$ (see (14)), let $P_T\in\mathcal{P}_{\mathrm{sh}}$ (see (53)) be chosen such that $P_{W^k|X}=Q_{W^k|X}$ and $P_{\hat{X}_j|Y_jW^j}=Q_{\hat{X}_j|Y_jW^j}$ for all $j\in[k]$.
In the following, we drop the subscripts of distributions when there is no confusion. Consider any $(\theta,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}$ satisfying (15) and
$$\sum_{j\in[2:k]}\mu\alpha_j\le1\quad\text{and}\quad\forall\,l\in[k],\ \theta(1+\mu\alpha_l)\le1.\tag{A85}$$
Using the definition of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)$ in (17), we obtain
$$\exp\big(-\Omega^{(\theta,\mu,\alpha^k,\beta^k)}(Q_T)\big)=\mathbb{E}_{Q_T}\Bigg[\Bigg(\frac{P(X,Y^k)\,Q(X,Y^{k\setminus1},W^{k\setminus1}|Y_1,W_1)\prod_{j\in[2:k]}Q(\hat{X}_j|Y_j,W^j)}{Q(X)\,Q(Y^k|X,W^k)\,Q(X,Y^{k\setminus1},W^{k\setminus1}|Y_1,W_1,\hat{X}_1)\prod_{j\in[2:k]}Q(\hat{X}_j|X,Y^k,W^k,\hat{X}^{j-1})}\Bigg)^{\theta}$$
$$\qquad\times\Bigg(\frac{P(X)}{Q(X|W_1)}\Bigg)^{\theta\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{Q(X|W^{j-1})}{Q(X|W^j)}\Bigg)^{\theta\mu\alpha_j}\exp\Big(-\theta\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\Bigg]\tag{A86}$$
$$=\mathbb{E}_{Q_T}\Bigg[\Bigg(\frac{P(T)}{Q(T)}\Bigg)^{\theta}\Bigg(\frac{P(X)}{Q(X|W_1)}\Bigg)^{\theta\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{Q(X|W^{j-1})}{Q(X|W^j)}\Bigg)^{\theta\mu\alpha_j}\exp\Big(-\theta\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\Bigg]\tag{A87}$$
$$=\mathbb{E}_{Q_T}\Bigg[\Bigg(\frac{P(T)}{Q(T)}\Bigg)^{\theta}\Bigg(\frac{P(X)}{P(X|W_1)}\Bigg)^{\theta\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{Q(X|W^{j-1})}{P(X|W^j)}\Bigg)^{\theta\mu\alpha_j}\exp\Big(-\theta\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\times\prod_{j\in[k]}\Bigg(\frac{P(X|W^j)}{Q(X|W^j)}\Bigg)^{\theta\mu\alpha_j}\Bigg]\tag{A88}$$
$$\le\Bigg(\mathbb{E}_{Q_T}\Bigg[\frac{P(T)}{Q(T)}\Bigg(\frac{P(X)}{P(X|W_1)}\Bigg)^{\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{Q(X|W^{j-1})}{P(X|W^j)}\Bigg)^{\mu\alpha_j}\exp\Big(-\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\Bigg]\Bigg)^{\theta}\times\prod_{j\in[k]}\Bigg(\mathbb{E}_{Q_T}\Bigg[\Bigg(\frac{P(X|W^j)}{Q(X|W^j)}\Bigg)^{\frac{\theta\mu\alpha_j}{1-\theta}}\Bigg]\Bigg)^{1-\theta}\tag{A89}$$
$$\le\Bigg(\mathbb{E}_{P_T}\Bigg[\Bigg(\frac{P(X)}{P(X|W_1)}\Bigg)^{\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{Q(X|W^{j-1})}{P(X|W^j)}\Bigg)^{\mu\alpha_j}\exp\Big(-\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\Bigg]\Bigg)^{\theta}\tag{A90}$$
$$=\Bigg(\mathbb{E}_{P_T}\Bigg[\Bigg(\frac{P(X)}{P(X|W_1)}\Bigg)^{\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{P(X|W^{j-1})}{P(X|W^j)}\Bigg)^{\mu\alpha_j}\exp\Big(-\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\times\prod_{j\in[2:k]}\Bigg(\frac{Q(X|W^{j-1})}{P(X|W^{j-1})}\Bigg)^{\mu\alpha_j}\Bigg]\Bigg)^{\theta}\tag{A91}$$
$$\le\Bigg(\mathbb{E}_{P_T}\Bigg[\Bigg(\Bigg(\frac{P(X)}{P(X|W_1)}\Bigg)^{\mu\alpha_1}\prod_{j\in[2:k]}\Bigg(\frac{P(X|W^{j-1})}{P(X|W^j)}\Bigg)^{\mu\alpha_j}\exp\Big(-\mu\sum_{j\in[k]}\beta_jd_j(X,\hat{X}_j)\Big)\Bigg)^{\frac{1}{1-\sum_{j\in[2:k]}\mu\alpha_j}}\Bigg]\Bigg)^{\theta\big(1-\sum_{j\in[2:k]}\mu\alpha_j\big)}\times\prod_{j\in[2:k]}\Bigg(\mathbb{E}_{P_T}\Bigg[\frac{Q(X|W^{j-1})}{P(X|W^{j-1})}\Bigg]\Bigg)^{\theta\mu\alpha_j}\tag{A92}$$
$$\le\exp\Bigg(-\theta\Big(1-\sum_{j\in[2:k]}\mu\alpha_j\Big)\tilde{\Omega}^{\big(\frac{\mu}{1-\sum_{j\in[2:k]}\mu\alpha_j},\alpha^k,\beta^k\big)}\Bigg),\tag{A93}$$
where (A87) follows since (i) with our choice of $P_T\in\mathcal{P}_{\mathrm{sh}}$, we have
$$
P(T) = P(X,Y^k)\,P(W^k|X)\prod_{j\in[k]}P(\hat{X}_j|Y_j,W^j),
\tag{A94}
$$
and (ii) the following equality holds:
$$
\frac{Q(X,Y^{k\setminus 1},W^{k\setminus 1}|Y_1,W_1)}{Q(X,Y^{k\setminus 1},W^{k\setminus 1}|Y_1,W_1,\hat{X}_1)} = \frac{Q(\hat{X}_1|Y_1,W_1)}{Q(\hat{X}_1|X,Y^k,W^k)}.
\tag{A95}
$$
Equation (A89) follows from Hölder's inequality; (A90) follows from the concavity of $x\mapsto x^a$ for $a\in[0,1]$, which yields $\mathbb{E}_{Q_T}[(P(X|W^j)/Q(X|W^j))^a]\le 1$, together with the choice of $\theta$, which ensures $\frac{\theta\mu\alpha_j}{1-\theta}\le 1$ for all $j\in[k]$; (A92) follows by applying Hölder's inequality and recalling that $\sum_{j\in[2:k]}\mu\alpha_j\le 1$; and (A93) follows from the definition of $\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T)$ in (58) and the fact that $\mathbb{E}_{P_T}[Q(X|W^{j-1})/P(X|W^{j-1})]=1$ for each $j\in[2:k]$.
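Step (A90) rests on the elementary fact that $\mathbb{E}_Q[(P/Q)^a]\le(\mathbb{E}_Q[P/Q])^a=1$ for any pmfs $P,Q$ on a common support and any exponent $a\in[0,1]$, by Jensen's inequality applied to the concave map $x\mapsto x^a$. A minimal numerical sketch (ours; the alphabet size and trial counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def fractional_moment(a, alphabet=12):
    """E_Q[(P/Q)^a] for independently drawn random pmfs P and Q."""
    p = rng.dirichlet(np.ones(alphabet))
    q = rng.dirichlet(np.ones(alphabet))
    return np.sum(q * (p / q) ** a)

for a in (0.0, 0.3, 0.7, 1.0):
    vals = [fractional_moment(a) for _ in range(5_000)]
    assert max(vals) <= 1.0 + 1e-9   # never exceeds 1 for a in [0, 1]
    print(f"a = {a:.1f}: max E_Q[(P/Q)^a] = {max(vals):.6f}")
```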
Therefore, for any $(\theta,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}$ satisfying (15) and (A85), using the definition of $\Omega^{(\theta,\mu,\alpha^k,\beta^k)}$ in (18) and the result in (A93), we have that
$$
\Omega^{(\theta,\mu,\alpha^k,\beta^k)} \ge \theta\Big(1-\sum_{j\in[2:k]}\mu\alpha_j\Big)\,\widetilde{\Omega}^{\big(\frac{\mu}{1-\sum_{j\in[2:k]}\mu\alpha_j},\,\alpha^k,\beta^k\big)}.
\tag{A96}
$$
Recalling the definition of $F(R^k,D^k)$ in (21) and using the result in (A96), we have
$$
\begin{aligned}
&F(R^k,D^k)\\
&= \sup_{\substack{(\theta,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}:\\ \sum_{i\in[k]}(\alpha_i+\beta_i)=1}} \frac{\Omega^{(\theta,\mu,\alpha^k,\beta^k)}-\theta\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+(2k+2)\theta+\sum_{j\in[k]}2\theta\mu\alpha_j} \quad\text{(A97)}\\
&\ge \sup_{\substack{(\theta,\mu,\alpha^k,\beta^k)\in\mathbb{R}_+^2\times[0,1]^{2k}:\\ \text{(15) and (A85)}}} \frac{\theta\big(1-\sum_{j\in[2:k]}\mu\alpha_j\big)\widetilde{\Omega}^{\big(\frac{\mu}{1-\sum_{j\in[2:k]}\mu\alpha_j},\,\alpha^k,\beta^k\big)}-\theta\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+(2k+2)\theta+\sum_{j\in[k]}2\theta\mu\alpha_j} \quad\text{(A98)}\\
&= \sup_{\substack{(\mu,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}:\\ \text{(15) and }\sum_{j\in[2:k]}\mu\alpha_j\le 1}}\ \sup_{\substack{\theta\in\mathbb{R}_+:\\ \max_{j\in[k]}\theta(1+\mu\alpha_j)\le 1}} \frac{\theta\big(1-\sum_{j\in[2:k]}\mu\alpha_j\big)\widetilde{\Omega}^{\big(\frac{\mu}{1-\sum_{j\in[2:k]}\mu\alpha_j},\,\alpha^k,\beta^k\big)}-\theta\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{1+(2k+2)\theta+\sum_{j\in[k]}2\theta\mu\alpha_j} \quad\text{(A99)}\\
&= \sup_{\substack{(\mu,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}:\\ \text{(15) and }\sum_{j\in[2:k]}\mu\alpha_j\le 1}} \frac{\big(1-\sum_{j\in[2:k]}\mu\alpha_j\big)\widetilde{\Omega}^{\big(\frac{\mu}{1-\sum_{j\in[2:k]}\mu\alpha_j},\,\alpha^k,\beta^k\big)}-\mu\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{2k+3+\mu\alpha_+ +\sum_{l\in[k]}2\mu\alpha_l} \quad\text{(A100)}\\
&= \sup_{\substack{(\lambda,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}:\\ \text{(15)}}} \frac{\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}-\lambda\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{2k+3+\lambda\alpha_+ +\sum_{j\in[2:k]}\lambda(2k+3)\alpha_j+\sum_{l\in[k]}2\lambda\alpha_l} \quad\text{(A101)}\\
&= \widetilde{F}(R^k,D^k), \quad\text{(A102)}
\end{aligned}
$$
where $\alpha_+ := \max_{j\in[k]}\alpha_j$, and (A100) follows since
$$
\sup_{\theta\in\mathbb{R}_+:\ \max_{j\in[k]}\theta(1+\mu\alpha_j)\le 1}\ \frac{\theta}{1+(2k+2)\theta+\sum_{j\in[k]}2\theta\mu\alpha_j} = \min_{j\in[k]}\ \frac{1}{2k+3+\mu\alpha_j+\sum_{l\in[k]}2\mu\alpha_l}
\tag{A103}
$$
$$
= \frac{1}{2k+3+\mu\alpha_+ +\sum_{l\in[k]}2\mu\alpha_l},
\tag{A104}
$$
and (A101) follows by choosing $\lambda=\frac{\mu}{1-\sum_{j\in[2:k]}\mu\alpha_j}$, and (A102) follows from the definition of $\widetilde{F}(R^k,D^k)$ in (62).
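The inner optimization over $\theta$ in (A103) and (A104) is elementary: the map $\theta\mapsto\theta/(1+(2k+2)\theta+\sum_l 2\theta\mu\alpha_l)$ is strictly increasing, so the supremum over the admissible range $\theta\le 1/(1+\mu\alpha_+)$ is attained at the endpoint. The short sketch below confirms the closed form for one assumed set of parameter values ($k$, $\mu$ and the $\alpha_j$'s are example values we chose, not values from the paper):

```python
import numpy as np

k, mu = 3, 0.8
alpha = np.array([0.20, 0.35, 0.15])          # assumed example values
alpha_plus = alpha.max()

# Binding constraint: theta * (1 + mu * alpha_j) <= 1 for all j.
theta_max = 1.0 / (1.0 + mu * alpha_plus)
theta = np.linspace(1e-9, theta_max, 100_000)
objective = theta / (1 + (2 * k + 2) * theta + 2 * theta * mu * alpha.sum())

closed_form = 1.0 / (2 * k + 3 + mu * alpha_plus + 2 * mu * alpha.sum())
assert abs(objective.max() - closed_form) < 1e-9
print(f"grid supremum {objective.max():.8f} vs closed form {closed_form:.8f}")
```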

Appendix F.2. Proof of Claim (ii)

Recall the definitions of $\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T)$ in (57) and of the tilted distribution $P_T^{(\lambda,\alpha^k,\beta^k)}$ in (63). By direct calculation, one can verify that
$$
\frac{\partial\,\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T)}{\partial\lambda} = \mathbb{E}_{P_T^{(\lambda,\alpha^k,\beta^k)}}\Big[\widetilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\Big],
\tag{A105}
$$
$$
\frac{\partial^2\,\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T)}{\partial\lambda^2} = -\mathrm{Var}_{P_T^{(\lambda,\alpha^k,\beta^k)}}\Big[\widetilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\Big].
\tag{A106}
$$
Applying a Taylor expansion to $\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T)$ around $\lambda=0$ and combining (A105) and (A106), we have that for any $P_T\in\mathcal{P}_{\mathrm{sh}}$ and any $\lambda\in[0,\,1+\sum_{j\in[k]}\alpha_j]$, there exists $\tau\in[0,\lambda]$ such that
$$
\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T) = \widetilde{\Omega}^{(0,\alpha^k,\beta^k)}(P_T) + \lambda\,\mathbb{E}_{P_T^{(0,\alpha^k,\beta^k)}}\Big[\widetilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\Big] - \frac{\lambda^2}{2}\mathrm{Var}_{P_T^{(\tau,\alpha^k,\beta^k)}}\Big[\widetilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\Big]
\tag{A107}
$$
$$
\ge \lambda\,\mathbb{E}_{P_T}\Big[\widetilde{\omega}_{P_T}^{(\alpha^k,\beta^k)}(T)\Big] - \frac{\lambda^2\rho}{2},
\tag{A108}
$$
where (A108) follows from the definitions in (57), (63) and (64); in particular, $\widetilde{\Omega}^{(0,\alpha^k,\beta^k)}(P_T)=0$, $P_T^{(0,\alpha^k,\beta^k)}=P_T$, and $\rho$ upper bounds the variance term.
Using the definitions in (54), (57) and (60) and the result in (A108), we have that for any $\lambda\in[0,\,1+\sum_{j\in[k]}\alpha_j]$,
$$
\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)} = \min_{P_T\in\mathcal{P}_{\mathrm{sh}}} \widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}(P_T)
\tag{A109}
$$
$$
\ge \lambda R^{(\alpha^k,\beta^k)} - \frac{\lambda^2\rho}{2}.
\tag{A110}
$$
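The quadratic bound in (A107)–(A110) is the standard second-order Taylor estimate for a cumulant-generating-type function. The sketch below is our own toy construction, not the paper's $\widetilde{\Omega}$: it takes $\Lambda(\lambda)=-\log\mathbb{E}_P[e^{-\lambda\omega}]$ for a random $\omega$ on a finite alphabet, replaces the variance proxy $\rho$ of (64) by the cruder but guaranteed Popoviciu bound $(\max\omega-\min\omega)^2/4$, and checks the resulting lower bound on a grid:

```python
import numpy as np

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(10))        # toy stand-in for P_T on a 10-letter alphabet
omega = rng.uniform(0.0, 3.0, 10)     # toy stand-in for omega_tilde_{P_T}(T)

def Lambda(lam):
    """Cumulant-type function Lambda(lam) = -log E_P[exp(-lam * omega)].
    Then Lambda(0) = 0, Lambda'(0) = E_P[omega], Lambda''(tau) = -Var_tilted(omega)."""
    return -np.log(np.sum(p * np.exp(-lam * omega)))

mean = np.sum(p * omega)
# Every tilted variance of omega is at most (max - min)^2 / 4 (Popoviciu's
# inequality), so this rho dominates the variance bound playing the role of (64).
rho = (omega.max() - omega.min()) ** 2 / 4

for lam in np.linspace(0.0, 1.0, 201):
    assert Lambda(lam) >= lam * mean - lam ** 2 * rho / 2 - 1e-12
print("Lambda(lam) >= lam * E[omega] - lam^2 * rho / 2 holds on [0, 1]")
```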
For any rate-distortion tuple outside the rate-distortion region, i.e., $(R^k,D^k)\notin\mathcal{R}$, we conclude from Lemma 5 that there exists $(\alpha^{k,*},\beta^{k,*})\in[0,1]^{2k}$ satisfying (15) such that, for some positive $\delta\in(0,\rho]$,
$$
\kappa^{(\alpha^{k,*},\beta^{k,*})}(R^k,D^k) \le R^{(\alpha^{k,*},\beta^{k,*})} - \delta.
\tag{A111}
$$
Using the definition of $\widetilde{F}(R^k,D^k)$ in (62), we have
$$
\begin{aligned}
\widetilde{F}(R^k,D^k) &= \sup_{\substack{(\lambda,\alpha^k,\beta^k)\in\mathbb{R}_+\times[0,1]^{2k}:\\ \text{(15)}}} \frac{\widetilde{\Omega}^{(\lambda,\alpha^k,\beta^k)}-\lambda\,\kappa^{(\alpha^k,\beta^k)}(R^k,D^k)}{2k+3+\lambda\alpha_+ +\sum_{j\in[2:k]}\lambda(2k+3)\alpha_j+\sum_{l\in[k]}2\lambda\alpha_l} \quad\text{(A112)}\\
&\ge \sup_{\lambda\in[0,1]} \frac{\widetilde{\Omega}^{(\lambda,\alpha^{k,*},\beta^{k,*})}-\lambda\,\kappa^{(\alpha^{k,*},\beta^{k,*})}(R^k,D^k)}{2k+3+\lambda\max_{j\in[k]}\alpha_j^* +\sum_{j\in[2:k]}\lambda(2k+3)\alpha_j^*+\sum_{l\in[k]}2\lambda\alpha_l^*} \quad\text{(A113)}\\
&\ge \sup_{\lambda\in[0,1]} \frac{\lambda\delta-\frac{\lambda^2\rho}{2}}{2k+9} \quad\text{(A114)}\\
&= \frac{\delta^2}{2(2k+9)\rho}, \quad\text{(A115)}
\end{aligned}
$$
where (A114) follows from the results in (A110) and (A111) and the inequality
$$
2k+3+\lambda\max_{j\in[k]}\alpha_j^* +\sum_{j\in[2:k]}\lambda(2k+3)\alpha_j^*+\sum_{l\in[k]}2\lambda\alpha_l^* \le 2k+9,
\tag{A116}
$$
which results from the constraints that $(\alpha^{k,*},\beta^{k,*})\in[0,1]^{2k}$ satisfies (15) and that $\lambda\in[0,1]$, and (A115) follows since the supremum in (A114) is attained at $\lambda=\delta/\rho\le 1$.
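The last two steps amount to a single-variable optimization: for $\delta\in(0,\rho]$, the concave function $\lambda\mapsto\lambda\delta-\lambda^2\rho/2$ on $[0,1]$ peaks at $\lambda^*=\delta/\rho\le 1$ with value $\delta^2/(2\rho)$, which after division by the uniform denominator bound $2k+9$ gives (A115). A brief numerical confirmation with assumed toy values (the choice $\rho=2.0$, $\delta=1.3$ is ours):

```python
import numpy as np

rho, delta = 2.0, 1.3                       # assumed values with 0 < delta <= rho
lam = np.linspace(0.0, 1.0, 1_000_001)
sup_val = np.max(lam * delta - lam ** 2 * rho / 2)
assert abs(sup_val - delta ** 2 / (2 * rho)) < 1e-9
print(sup_val, delta ** 2 / (2 * rho))      # both equal delta^2/(2*rho) = 0.4225
```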

References

1. Maor, A.; Merhav, N. On Successive Refinement with Causal Side Information at the Decoders. IEEE Trans. Inf. Theory 2008, 54, 332–343.
2. Tian, C.; Diggavi, S.N. On multistage successive refinement for Wyner–Ziv source coding with degraded side informations. IEEE Trans. Inf. Theory 2007, 53, 2946–2960.
3. Steinberg, Y.; Merhav, N. On successive refinement for the Wyner–Ziv problem. IEEE Trans. Inf. Theory 2004, 50, 1636–1654.
4. Equitz, W.H.; Cover, T.M. Successive refinement of information. IEEE Trans. Inf. Theory 1991, 37, 269–275.
5. Koshelev, V. Estimation of mean error for a discrete successive-approximation scheme. Probl. Peredachi Inf. 1981, 17, 20–33.
6. Rimoldi, B. Successive refinement of information: Characterization of the achievable rates. IEEE Trans. Inf. Theory 1994, 40, 253–259.
7. Kanlis, A.; Narayan, P. Error exponents for successive refinement by partitioning. IEEE Trans. Inf. Theory 1996, 42, 275–282.
8. No, A.; Ingber, A.; Weissman, T. Strong Successive Refinability and Rate-Distortion-Complexity Tradeoff. IEEE Trans. Inf. Theory 2016, 62, 3618–3635.
9. Zhou, L.; Tan, V.Y.F.; Motani, M. Second-Order and Moderate Deviation Asymptotics for Successive Refinement. IEEE Trans. Inf. Theory 2017, 63, 2896–2921.
10. Tuncel, E.; Rose, K. Additive successive refinement. IEEE Trans. Inf. Theory 2003, 49, 1983–1991.
11. Chow, J.; Berger, T. Failure of successive refinement for symmetric Gaussian mixtures. IEEE Trans. Inf. Theory 1997, 43, 350–352.
12. Tuncel, E.; Rose, K. Error exponents in scalable source coding. IEEE Trans. Inf. Theory 2003, 49, 289–296.
13. Effros, M. Distortion-rate bounds for fixed- and variable-rate multiresolution source codes. IEEE Trans. Inf. Theory 1999, 45, 1887–1910.
14. Weissman, T.; El Gamal, A. Source Coding With Limited-Look-Ahead Side Information at the Decoder. IEEE Trans. Inf. Theory 2006, 52, 5218–5239.
15. El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011.
16. Timo, R.; Vellambi, B.N. Two lossy source coding problems with causal side-information. In Proceedings of the 2009 IEEE International Symposium on Information Theory, Seoul, South Korea, 28 June–3 July 2009; pp. 1040–1044.
17. Gu, W.H.; Effros, M. Source coding for a simple multi-hop network. In Proceedings of the International Symposium on Information Theory (ISIT 2005), Adelaide, SA, Australia, 4–9 September 2005.
18. Gray, R.; Wyner, A. Source coding for a simple network. Bell Syst. Tech. J. 1974, 53, 1681–1721.
19. Maor, A.; Merhav, N. On Successive Refinement for the Kaspi/Heegard–Berger Problem. IEEE Trans. Inf. Theory 2010, 56, 3930–3945.
20. Heegard, C.; Berger, T. Rate distortion when side information may be absent. IEEE Trans. Inf. Theory 1985, 31, 727–734.
21. Chia, Y.K.; Weissman, T. Cascade and Triangular source coding with causal side information. In Proceedings of the 2011 IEEE International Symposium on Information Theory, St. Petersburg, Russia, 31 July–5 August 2011; pp. 1683–1687.
22. Fong, S.L.; Tan, V.Y.F. A Proof of the Strong Converse Theorem for Gaussian Multiple Access Channels. IEEE Trans. Inf. Theory 2016, 62, 4376–4394.
23. Oohama, Y. Exponent Function for One Helper Source Coding Problem at Rates outside the Rate Region. arXiv 2015, arXiv:1504.05891.
24. Oohama, Y. New Strong Converse for Asymmetric Broadcast Channels. arXiv 2016, arXiv:1604.02901.
25. Oohama, Y. Exponential Strong Converse for Source Coding with Side Information at the Decoder. Entropy 2018, 20, 352.
26. Ahlswede, R.; Körner, J. Source coding with side information and a converse for degraded broadcast channels. IEEE Trans. Inf. Theory 1975, 21, 629–637.
27. Wyner, A.D. On source coding with side information at the decoder. IEEE Trans. Inf. Theory 1975, 21, 294–300.
28. Körner, J.; Marton, K. General broadcast channels with degraded message sets. IEEE Trans. Inf. Theory 1977, 23, 60–64.
29. Wyner, A.D.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10.
30. Tuncel, E.; Gündüz, D. Identification and Lossy Reconstruction in Noisy Databases. IEEE Trans. Inf. Theory 2014, 60, 822–831.
31. Zhou, L.; Tan, V.Y.F.; Motani, M. Exponential Strong Converse for Content Identification with Lossy Recovery. IEEE Trans. Inf. Theory 2018, 64, 5879–5897.
32. Wyner, A. The common information of two dependent random variables. IEEE Trans. Inf. Theory 1975, 21, 163–179.
33. Yu, L.; Tan, V.Y.F. Wyner's Common Information Under Rényi Divergence Measures. IEEE Trans. Inf. Theory 2018, 64, 3616–3632.
34. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011.
35. Gu, W.; Effros, M. A strong converse for a collection of network source coding problems. In Proceedings of the 2009 IEEE International Symposium on Information Theory, Seoul, South Korea, 28 June–3 July 2009; pp. 2316–2320.
36. Liu, J.; van Handel, R.; Verdú, S. Beyond the Blowing-Up Lemma: Optimal Second-Order Converses via Reverse Hypercontractivity. Preprint. Available online: http://web.mit.edu/jingbo/www/preprints/msl-blup.pdf (accessed on 17 April 2019).
37. Tyagi, H.; Watanabe, S. Strong Converse using Change of Measure. arXiv 2018, arXiv:1805.04625.
38. Ahlswede, R.; Csiszár, I. Hypothesis testing with communication constraints. IEEE Trans. Inf. Theory 1986, 32, 533–542.
39. Salehkalaibar, S.; Wigger, M.; Wang, L. Hypothesis testing in multi-hop networks. arXiv 2017, arXiv:1708.05198.
40. Cao, D.; Zhou, L.; Tan, V.Y.F. Strong Converse for Hypothesis Testing Against Independence over a Two-Hop Network. arXiv 2018, arXiv:1808.05366.
41. Marton, K. Error exponent for source coding with a fidelity criterion. IEEE Trans. Inf. Theory 1974, 20, 197–199.
42. Yassaee, M.H.; Aref, M.R.; Gohari, A. A technique for deriving one-shot achievability results in network information theory. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 1287–1291.
43. Kostina, V.; Verdú, S. Fixed-length lossy compression in the finite blocklength regime. IEEE Trans. Inf. Theory 2012, 58, 3309–3338.
44. Tan, V.Y.F. Asymptotic Estimates in Information Theory with Non-Vanishing Error Probabilities. Found. Trends Commun. Inf. Theory 2014, 11, 1–184.
45. Pradhan, S.S.; Chou, J.; Ramchandran, K. Duality between source coding and channel coding and its extension to the side information case. IEEE Trans. Inf. Theory 2003, 49, 1181–1203.
46. Slepian, D.; Wolf, J.K. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19, 471–480.
47. Gelfand, S. Coding for channel with random parameters. Probl. Contr. Inf. Theory 1980, 9, 19–31.
48. Shannon, C.E. Channels with side information at the transmitter. IBM J. Res. Dev. 1958, 2, 289–293.
49. Sigurjónsson, S.; Kim, Y.H. On multiple user channels with state information at the transmitters. In Proceedings of the International Symposium on Information Theory (ISIT 2005), Adelaide, SA, Australia, 4–9 September 2005; pp. 72–76.
50. Zaidi, A.; Shitz, S.S. On cooperative multiple access channels with delayed CSI at transmitters. IEEE Trans. Inf. Theory 2014, 60, 6204–6230.
51. Shkel, Y.Y.; Verdú, S. A single-shot approach to lossy source coding under logarithmic loss. IEEE Trans. Inf. Theory 2018, 64, 129–147.
52. Courtade, T.A.; Weissman, T. Multiterminal source coding under logarithmic loss. IEEE Trans. Inf. Theory 2014, 60, 740–761.
53. Strassen, V. Asymptotische Abschätzungen in Shannons Informationstheorie. In Transactions of the Third Prague Conference on Information Theory; Czechoslovak Academy of Sciences: Prague, Czech Republic, 1962; pp. 689–723.
54. Hayashi, M. Information spectrum approach to second-order coding rate in channel coding. IEEE Trans. Inf. Theory 2009, 55, 4947–4966.
55. Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
56. Kostina, V. Lossy Data Compression: Non-Asymptotic Fundamental Limits. Ph.D. Thesis, Department of Electrical Engineering, Princeton University, Princeton, NJ, USA, 2013.
Figure 1. Encoder-decoder system model for the k-user successive refinement problem with causal decoder side information at time $i\in[n]$. Each encoder $f_j$, $j\in[k]$, compresses the source sequence into a codeword $S_j$. Given the accumulated side information $(Y_{j,1},\ldots,Y_{j,i})$ and the codewords $(S_1,\ldots,S_j)$, decoder $\phi_{j,i}$ reproduces the $i$-th source symbol as $\hat{X}_{j,i}$. At time $n$, for each $j\in[k]$, the estimate $\hat{X}_j^n$ for user $j$ is required to satisfy the distortion constraint $D_j$ under the distortion measure $d_j$.
