1. Introduction
For single or multi-terminal source coding systems, the converse coding theorems state that, at any data compression rates below the fundamental theoretical limit of the system, the error probability of decoding cannot go to zero as the block length n of the codes tends to infinity. On the other hand, the strong converse theorems state that, at any rates below the fundamental theoretical limit, the error probability of decoding must go to one as n tends to infinity. The former converse theorems are sometimes called the weak converse theorems to distinguish them from the strong converse theorems.
In this paper, we study the strong converse theorem for the rate distortion problem with side information at the decoder posed and investigated by Wyner and Ziv [1]. We call this source coding system the Wyner–Ziv source coding system (the WZ system). The WZ system is shown in Figure 1 and corresponds to the case where the switch is closed. In Figure 1, the sequence $\{(X_t, Y_t)\}_{t=1}^{n}$ represents independent copies of a pair of dependent random variables $(X, Y)$ which take values in the finite sets $\mathcal{X}$ and $\mathcal{Y}$, respectively. We assume that $(X, Y)$ has a probability distribution denoted by $p_{XY}$. The encoder $\varphi^{(n)}$ outputs a binary sequence which appears at a rate of $R$ bits per input symbol. The decoder function $\psi^{(n)}$ observes the codeword $\varphi^{(n)}(X^n)$ and the side information $Y^n$ to output a sequence $\hat{X}^n$. The $t$-th component $\hat{X}_t$ of $\hat{X}^n$ for $t = 1, 2, \ldots, n$ takes values in the finite reproduction alphabet $\hat{\mathcal{X}}$. Let $d: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be an arbitrary distortion measure on $\mathcal{X} \times \hat{\mathcal{X}}$. The distortion between $x^n \in \mathcal{X}^n$ and $\hat{x}^n \in \hat{\mathcal{X}}^n$ is defined by $d(x^n, \hat{x}^n) = \sum_{t=1}^{n} d(x_t, \hat{x}_t)$.
In general, we have two criteria on the quality of the reproduction. One is the excess-distortion probability of decoding, defined as the probability that the distortion per source symbol $\frac{1}{n} d(X^n, \hat{X}^n)$ exceeds the prescribed level $D$. The other is the average distortion, defined as the expectation of $\frac{1}{n} d(X^n, \hat{X}^n)$.
A pair $(R, D)$ is $\varepsilon$-achievable for $p_{XY}$ if there exists a sequence of pairs $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \geq 1}$ such that, for any $\delta > 0$ and any sufficiently large $n$, the rate satisfies $\frac{1}{n} \log \|\varphi^{(n)}\| \leq R + \delta$ and the excess-distortion probability does not exceed $\varepsilon$, where $\|\varphi^{(n)}\|$ stands for the cardinality of the range of $\varphi^{(n)}$. The rate distortion region $\mathcal{R}_{\mathrm{WZ}}(\varepsilon|p_{XY})$ is defined as the set of all $\varepsilon$-achievable pairs $(R, D)$.
On the other hand, we can define a rate distortion region based on the average distortion criterion; its formal definition is the following. A pair $(R, D)$ is achievable for $p_{XY}$ if there exists a sequence of pairs $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \geq 1}$ such that, for any $\delta > 0$ and any sufficiently large $n$, the rate satisfies $\frac{1}{n} \log \|\varphi^{(n)}\| \leq R + \delta$ and the average distortion does not exceed $D + \delta$. The rate distortion region $\mathcal{R}_{\mathrm{WZ}}(p_{XY})$ is defined as the set of all achievable pairs $(R, D)$.
If the switch is open, then the side information is not available to the decoder. In this case, the communication system corresponds to source coding for the discrete memoryless source (DMS) $X$ specified by $p_X$. For this system, we define rate distortion regions in a manner analogous to the definitions given above for the WZ system.
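The two distortion criteria defined above can be estimated empirically. The following Monte Carlo sketch is illustrative only: the binary source, the crossover probability, the trivial copy-the-side-information decoder, and the Hamming distortion are all assumptions made here, not the coding scheme of the paper.

```python
import random

# Toy Monte Carlo estimate of the excess-distortion probability and the
# average distortion for a WZ-style setup. All modeling choices below
# (binary source, BSC side information, copy decoder) are illustrative.
random.seed(0)

n = 200          # block length
trials = 2000    # number of simulated blocks
D = 0.15         # prescribed distortion level
p_flip = 0.1     # P(Y_t != X_t): correlation between source and side info

excess_count = 0
total_distortion = 0.0
for _ in range(trials):
    x = [random.randint(0, 1) for _ in range(n)]
    # Side information: X observed through a binary symmetric channel.
    y = [xt ^ (random.random() < p_flip) for xt in x]
    # Trivial decoder: output the side information as the reproduction.
    x_hat = y
    d = sum(xt != ht for xt, ht in zip(x, x_hat))  # Hamming distortion
    total_distortion += d / n
    if d / n > D:
        excess_count += 1

excess_prob = excess_count / trials         # empirical excess-distortion prob.
avg_distortion = total_distortion / trials  # empirical average distortion
print(excess_prob, avg_distortion)
```

For this toy decoder the average distortion concentrates near p_flip, while the excess-distortion probability is the small tail probability that the per-symbol Hamming distortion exceeds D.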
Previous works on the characterizations of these rate distortion regions are shown in Table 1. Shannon [2] determined the rate distortion region of the DMS. Subsequently, Wolfowitz [3] proved that the rate distortion region under the excess-distortion criterion coincides with it. Furthermore, he proved the strong converse theorem. That is, if $(R, D)$ lies outside the rate distortion region, then for any sequence $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \geq 1}$ of encoder and decoder functions satisfying the rate condition, the excess-distortion probability must tend to one. The above strong converse theorem implies that, for any $\varepsilon \in (0, 1)$, the $\varepsilon$-rate distortion region coincides with the rate distortion region. Csiszár and Körner proved that, in Equation (3), the probability converges to one exponentially, and determined the optimal exponent as a function of the rate distortion pair.
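Shannon's rate distortion function for a DMS, referenced above, can be computed numerically with the Blahut-Arimoto algorithm. The sketch below is not from the paper; the binary uniform source and Hamming distortion matrix are illustrative assumptions, chosen because the closed form R(D) = log 2 - h(D) (in nats) is then available as a sanity check.

```python
import math

# Numerical sketch: a point (R, D) on Shannon's rate distortion curve for a
# DMS, computed by the Blahut-Arimoto iteration with slope parameter s <= 0.

def blahut_arimoto(p_x, dist, s, iters=500):
    """Return (R, D) in nats on the R(D) curve for slope parameter s <= 0."""
    nx, nz = len(p_x), len(dist[0])
    q_z = [1.0 / nz] * nz                      # reproduction distribution
    for _ in range(iters):
        # Update the test channel q(z|x) proportional to q(z) * e^{s d(x,z)}.
        q_zx = []
        for x in range(nx):
            row = [q_z[z] * math.exp(s * dist[x][z]) for z in range(nz)]
            tot = sum(row)
            q_zx.append([v / tot for v in row])
        # Update the reproduction distribution as the induced marginal.
        q_z = [sum(p_x[x] * q_zx[x][z] for x in range(nx)) for z in range(nz)]
    R = sum(p_x[x] * q_zx[x][z] * math.log(q_zx[x][z] / q_z[z])
            for x in range(nx) for z in range(nz) if q_zx[x][z] > 0)
    D = sum(p_x[x] * q_zx[x][z] * dist[x][z]
            for x in range(nx) for z in range(nz))
    return R, D

p_x = [0.5, 0.5]                   # uniform binary source (assumption)
hamming = [[0.0, 1.0], [1.0, 0.0]] # Hamming distortion (assumption)
R, D = blahut_arimoto(p_x, hamming, s=-2.0)
print(R, D)
```

For the slope s = -2, the parametric point lands at D = e^{-2}/(1 + e^{-2}) and R = log 2 - h(D) in nats, matching the closed form for this source.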
The previous works on the coding theorems for the WZ system are summarized in Table 1. The rate distortion region under the average distortion criterion was determined by Wyner and Ziv [1]. Csiszár and Körner [4] proved that the two regions coincide. On the other hand, there has so far been no result on the strong converse theorem for the WZ system.
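As a concrete point of reference for the region determined by Wyner and Ziv [1], their classical doubly symmetric binary example admits a closed-form description. The sketch below is illustrative and not part of the paper (the crossover probability is an assumption): it evaluates the curve g(D) = h(p0 * D) - h(D), whose lower convex envelope together with the point (p0, 0) gives the Wyner-Ziv function, and compares it with the conditional rate distortion function h(p0) - h(D) that is attainable when the side information is available at the encoder as well. The convex-envelope step is omitted for brevity.

```python
import math

# Illustrative evaluation of the binary Wyner-Ziv example: X uniform binary,
# Y = X xor N with N ~ Bernoulli(p0), Hamming distortion.

def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

p0 = 0.25  # assumed crossover probability between X and Y
for D in [0.01, 0.05, 0.10, 0.15, 0.20]:
    g = h(conv(p0, D)) - h(D)   # Wyner-Ziv curve before the convex envelope
    r_cond = h(p0) - h(D)       # side information at both terminals
    print(f"D={D:.2f}  g(D)={g:.4f}  R_X|Y(D)={r_cond:.4f}")
```

The printed values exhibit the well-known rate loss of decoder-only side information: g(D) dominates the conditional rate distortion function at every sampled distortion level.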
The main results of this paper are summarized in Table 1. For the WZ system, we prove that if $(R, D)$ is outside the rate distortion region, then for any sequence $\{(\varphi^{(n)}, \psi^{(n)})\}_{n \geq 1}$ of encoder and decoder functions satisfying the condition in Equation (2), the correct probability of decoding goes to zero exponentially, and we derive an explicit lower bound on the exponent. This result corresponds to Theorem 3 in Table 1. As a corollary of this theorem, we obtain the strong converse result stated in Corollary 2 in Table 1. This result states that we have an outer bound with an asymptotically vanishing gap from the rate distortion region.
To derive our result, we use a new method called the recursive method. This method is a general and powerful tool for proving strong converse theorems for several coding problems in information theory. In fact, the recursive method plays an important role in deriving the exponential strong converse results for the communication systems treated in [5,6,7,8].
2. Source Coding with Side Information at the Decoder
In the following argument, the operations $\mathrm{E}_p$ and $\mathrm{Var}_p$, respectively, stand for the expectation and the variance with respect to a probability distribution $p$. When the value of $p$ is obvious from the context, we omit the suffix $p$ in those operations, writing simply $\mathrm{E}$ and $\mathrm{Var}$. Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets and let $\{(X_t, Y_t)\}_{t=1}^{\infty}$ be a stationary discrete memoryless source. For each $t = 1, 2, \ldots$, the random pair $(X_t, Y_t)$ takes values in $\mathcal{X} \times \mathcal{Y}$ and has a probability distribution $p_{XY}$. We write $n$ independent copies of $X$ and $Y$, respectively, as $X^n = X_1 X_2 \cdots X_n$ and $Y^n = Y_1 Y_2 \cdots Y_n$.
We consider the communication system depicted in Figure 2. The data sequence $X^n$ is encoded to $\varphi^{(n)}(X^n)$ and is sent to the information processing center. At the center, the decoder function $\psi^{(n)}$ observes $\varphi^{(n)}(X^n)$ and $Y^n$ to output the estimation $\hat{X}^n$ of $X^n$. The encoder function $\varphi^{(n)}$ is a map defined on $\mathcal{X}^n$. Let $\hat{\mathcal{X}}$ be a reproduction alphabet. The decoder function $\psi^{(n)}$ maps the pair consisting of the codeword and the side information $Y^n$ into $\hat{\mathcal{X}}^n$. Let $d: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be an arbitrary distortion measure on $\mathcal{X} \times \hat{\mathcal{X}}$. The distortion between $x^n \in \mathcal{X}^n$ and $\hat{x}^n \in \hat{\mathcal{X}}^n$ is defined by $d(x^n, \hat{x}^n) = \sum_{t=1}^{n} d(x_t, \hat{x}_t)$.
The excess-distortion probability of decoding is the probability that the distortion per source symbol $\frac{1}{n} d(X^n, \hat{X}^n)$ exceeds the prescribed level $D$, where $\hat{X}^n = \psi^{(n)}(\varphi^{(n)}(X^n), Y^n)$. The average distortion between $X^n$ and its estimation $\hat{X}^n$ is defined by the expectation of $\frac{1}{n} d(X^n, \hat{X}^n)$.
In the previous section, we gave the formal definitions of the rate distortion regions. We can show that these rate distortion regions satisfy the following property.
Property 1. - (a)
The rate distortion regions defined above are closed convex sets. - (b)
The rate distortion region has another form using an auxiliary rate distortion region, the definition of which is as follows. We set the region below, which is called the auxiliary rate distortion region. Using it, the rate distortion region can be expressed as follows, where $\mathrm{cl}$ stands for the closure operation.
It is well known that the rate distortion region was determined by Wyner and Ziv [1]. To describe their result, we introduce auxiliary random variables $U$ and $Z$, respectively, taking values in the finite sets $\mathcal{U}$ and $\mathcal{Z}$. We assume that the joint distribution of $(U, X, Y, Z)$ is as follows. The above condition is equivalent to the corresponding Markov chain condition. Define the following set of probability distributions. By definition, the inclusion between these sets is obvious. Set the following:
We can show that the above functions and sets satisfy the following property:
Property 2. - (a)
The region is a closed convex set.
- (b)
Proof of Property 2 is given in Appendix C. In Property 2 Part (b), the right member is regarded as another expression of the rate distortion region. This expression is useful for deriving our main result. The rate distortion region was determined by Wyner and Ziv [1]. Their result is the following:
Csiszár and Körner [4] obtained the following result on the rate distortion region under the excess-distortion criterion.
Theorem 2 (Csiszár and Körner [4]).
We are interested in the asymptotic behavior of the error probability of decoding, which tends to one as $n \to \infty$ when $(R, D)$ lies outside the rate distortion region. To examine the rate of convergence, we define the following quantity. Set
By time sharing, we have the following. Choosing suitable values in Equation (7), we obtain the following subadditivity property, which, together with Fekete's lemma, yields that the limit exists and satisfies the following:
The exponent function
is a convex function of
. In fact, from Equation (
7), we have that for any
where
. The region
is also a closed convex set. Our main aim is to find an explicit characterization of
. In this paper, we derive an explicit outer bound of
whose section by the plane
coincides with
.
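The subadditivity-plus-Fekete argument used above can be illustrated numerically. The sequence below is a toy assumption (not the exponent function of the paper) chosen so that subadditivity can be verified directly and the normalized sequence visibly approaches its infimum.

```python
import math

# Illustration of Fekete's subadditive lemma: if a_{n+m} <= a_n + a_m, then
# lim a_n / n exists and equals inf_n a_n / n. The toy sequence
# a_n = c*n + log(n + 1) is subadditive because
# log(n + m + 1) <= log(n + 1) + log(m + 1) for n, m >= 1.

c = 0.5
def a(n):
    return c * n + math.log(n + 1)

# Check subadditivity on a range of index pairs.
subadditive = all(a(n + m) <= a(n) + a(m) + 1e-12
                  for n in range(1, 60) for m in range(1, 60))

# a_n / n decreases toward its infimum c as n grows.
ratios = [a(n) / n for n in (10, 100, 1000, 10000)]
print(subadditive, ratios)
```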
3. Main Results
In this section, we state our main results. We first explain that the rate distortion region can be expressed with two families of supporting hyperplanes. To describe this result, we define two sets of probability distributions as follows. Then, we have the following property:
Proof of Property 3 is given in Appendix D. For the pair of parameters introduced below, define the following. We next define a function serving as a lower bound of the exponent function. For each choice of parameters, define the following. We can show that the above functions satisfy the following properties:
Property 4. - (a)
The cardinality bound appearing in the first definition is sufficient to describe the corresponding quantity. Furthermore, the cardinality bound in the second definition is sufficient to describe its quantity.
- (b)
For any choice of parameters, we have the following. - (c)
Fix any pair of parameters. The limit below exists and is nonnegative. Next, define a probability distribution as follows. Then, the resulting function is twice differentiable. Furthermore, we have the two equalities below; the second equality implies that the function is concave.
- (d)
Define the quantity below and set the following. Then, we have the stated identity. Furthermore, for any choice of parameters, we have the following. - (e)
For every relevant pair, the stated condition implies the bound below, where $g$ is the inverse function of the function defined above.
Proof of Property 4 Part (a) is given in
Appendix B. Proof of Property 4 Part (b) is given in
Appendix E. Proofs of Property 4 Parts (c), (d), and (e) are given in
Appendix F.
Our main result is the following:
Theorem 3. For any choice of the parameters satisfying the stated condition, we have the bound below. It follows from Theorem 3 and Property 4 Part (d) that if $(R, D)$ is outside the rate distortion region, then the error probability of decoding goes to one exponentially, and the exponent is not below the derived lower bound.
It immediately follows from Theorem 3 that we have the following corollary.
Corollary 1. For any choice of the parameters, we have the first bound below. Furthermore, we have the second bound. Proof of Theorem 3 will be given in the next section. The exponent function in the lossless case can be obtained as a corollary of the result of Oohama and Han [9] for the separate source coding problem of correlated sources [10]. The technique they used is the method of types [4], which is not useful for proving Theorem 3. In fact, when we use this method, it is very hard to extract a condition related to the Markov chain condition which the auxiliary random variable must satisfy when $(R, D)$ is on the boundary of the rate distortion region. Some novel techniques based on the information spectrum method introduced by Han [11] are necessary to prove this theorem.
From Theorem 3 and Property 4 Part (e), we can obtain an explicit outer bound of the rate distortion region with an asymptotically vanishing deviation from it. The strong converse theorem immediately follows from this corollary. To describe this outer bound, we set the region below, which serves as an outer bound of the rate distortion region. For each fixed parameter, we define the following quantity.
Step (a) follows from the preceding definition. Since the deviation term vanishes as $n \to \infty$, there exists a smallest positive integer $n_0 = n_0(\varepsilon)$ such that the deviation is below the prescribed level for all $n \geq n_0$. From Theorem 3 and Property 4 Part (e), we have the following corollary.
Corollary 2. For each fixed $\varepsilon$, we choose the above positive integer $n_0(\varepsilon)$. Then, for any $n \geq n_0(\varepsilon)$, we have the inclusion below. The above result, together with the preceding relation, yields the stated conclusion for each fixed $\varepsilon$. Proof of this corollary will be given in the next section.
The direct part of the coding theorem, i.e., the first inclusion, was established by Csiszár and Körner [4]. They proved a weak converse theorem to obtain the reverse inclusion. Until now, there has been no result on the strong converse theorem. The above corollary, which states the strong converse theorem for the Wyner–Ziv source coding problem, implies that a long-standing open problem since Csiszár and Körner [4] has been resolved.
4. Proof of the Main Results
In this section, we prove Theorem 3 and Corollary 2. We first present a lemma which upper bounds the correct probability of decoding by information spectrum quantities. We set
Then, we have the following:
Lemma 1. For any choice of the distributions below satisfying the stated condition, we have the bound below. The probability distribution and the stochastic matrices appearing in the right members of Equation (18) have the property that we can select them arbitrarily. In Equation (14), we can choose any probability distribution. In Equation (15), we can choose any stochastic matrix. In Equation (16), we can choose any stochastic matrix. In Equation (17), we can choose any stochastic matrix.
Lemma 2. Suppose that, for each $t$, the joint distribution of the random vector below is a marginal distribution of the full joint distribution. Then, we have the following Markov chain:
or equivalently, the corresponding conditional mutual information is zero. Proof of this lemma is given in
Appendix H. For
, set
. Let
be a random vector taking values in
×
×
. From Lemmas 1 and 2, we have the following:
Lemma 3. For any choice of the parameters satisfying the stated condition, we have the following:
where, for each $t$, the following probability distribution and stochastic matrices:
appearing in the first term in the right members of Equation (21) have the property that we can choose their values arbitrarily. Proof. On the probability distributions appearing in the right members of Equation (18), we take the following choices. In Equation (14), we choose
so that
In Equation (15), we choose
so that
In Equation (16), we choose
so that
In Equation (16), we note that
Step (a) follows from Lemma 2. In Equation (17), we choose
so that
From Lemma 1 and Equations (21)–(25), we have the bound of Equation (21) in Lemma 3. ☐
To evaluate an upper bound of Equation (21) in Lemma 3, we use the following lemma, which is well known as Cramér's bound in large deviation theory.
Lemma 4. For any real-valued random variable A and any constant, we have the following. Here, we define a quantity which serves as an exponential upper bound of
. For each
, let
be a set of all
Let
be a set of all probability distributions
on
having the form:
For simplicity of notation, we use the notation
for
. We assume that
is a marginal distribution of
. For
, we simply write
. For
and
, we define
where, for each
, the following probability distribution and stochastic matrices:
appearing in the definition of
are chosen so that they are induced by the joint distribution
.
By Lemmas 3 and 4, we have the following proposition:
Proposition 1. For any choice of the parameters satisfying the stated condition, we have the following. Proof. When
, the bound we wish to prove is obvious. In the following argument, we assume that
We define five random variables
by
By Lemma 3, for any
satisfying
we have
where we set
Applying Lemma 4 to the first term in the right member of Equation (26), we have
Solving Equation (28) with respect to
, we have
For this choice of
and Equation (27), we have
completing the proof. ☐
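The large deviation step in the proof above rests on the Cramér (Chernoff) bound of Lemma 4. The following deterministic check is illustrative and not part of the paper; a binomial random variable is an assumption made so that both the exact tail and the bound can be computed in closed form.

```python
import math

# Check of the Chernoff form of Cramer's bound:
# Pr{A >= a} <= e^{-theta*a} E[e^{theta*A}] for any theta > 0,
# with A ~ Binomial(n, p) so both sides are exactly computable.

n, p, a = 100, 0.3, 50

# Exact upper tail Pr{A >= a}.
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))

# Chernoff bound with the optimizing theta: e^theta = a(1-p) / ((n-a)p).
theta = math.log(a * (1 - p) / ((n - a) * p))
mgf = (1 - p + p * math.exp(theta)) ** n      # E[e^{theta*A}]
bound = math.exp(-theta * a) * mgf

print(tail, bound)
```

The exact tail probability sits strictly below the exponential bound, as the lemma guarantees for every positive value of the tilting parameter.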
By Proposition 1, we have the following corollary.
Corollary 3. For any choice of the parameters satisfying the stated condition, we have the following. We shall call this quantity the communication potential. The above corollary implies that the analysis of the communication potential leads to the establishment of a strong converse theorem for the Wyner–Ziv source coding problem. In the following argument, we derive an explicit lower bound of this quantity. We use a new technique we call the recursive method. The recursive method is a powerful tool to derive a single-letterized exponent function for rates below the rate distortion function. This method is also applicable to proving the exponential strong converse theorems for other problems in network information theory [5,6,7]. Set
For each
, define a function of
by
For each
, we define the conditional probability distribution
by
where
are constants for normalization. For
, define
where we define
for
Then, we have the following lemma:
Lemma 5. For each index and any choice of the parameters, we have the three relations below. The equality in Equation (34) in Lemma 5 is obvious from Equations (29)–(31). Proofs of Equations (32) and (33) in this lemma are given in
Appendix I. Next, we define a probability distribution of the random pair
taking values in
by
where
is a constant for normalization given by
For
, define
where we define
. Set
Then, we have the following:
Proof. By the equality Equation (34) in Lemma 5, we have
Step (a) follows from the definition in Equation (36) of
We next prove Equation (39) in Lemma 6. Multiplying both sides of Equation (35) by the appropriate factor, we have
Taking summations of Equations (41) and (42) with respect to
, we have
Step (a) follows from Equation (33) in Lemma 5. Step (b) follows from the definition in Equation (37) of . ☐
The following proposition is a mathematical core to prove our main result.
Proposition 2. For the given parameters, we choose the parameter α such that the condition below holds. Then, for any choice of the remaining parameters, we have the following. Proof. By Lemma 6, we have
For each
, we recursively choose
so that
and choose
,
,
, and
appearing in
such that they are the distributions induced by
. Then, for each
⋯,
n, we have the following chain of inequalities:
Step (a) follows from Hölder’s inequality and the following identity:
Step (b) follows from Equation (43). Step (c) follows from the definition of
Step (d) follows from the fact that, by Property 4 Part (a), the cardinality bound is sufficient to describe the quantity in question. Hence, we have the following:
Step (a) follows from Equation (38) in Lemma 6. Step (b) follows from Equation (45). Since Equation (46) holds for any
and any
, we have
Thus, we have Equation (44) in Proposition 2. ☐
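Step (a) of the chain of inequalities in the proof above invokes Hölder's inequality. A minimal numerical check follows; the vectors and the exponent are illustrative assumptions, not quantities from the paper.

```python
import math

# Check of Holder's inequality:
# sum |a_i b_i| <= (sum |a_i|^p)^(1/p) * (sum |b_i|^q)^(1/q), 1/p + 1/q = 1.

a = [0.3, 1.2, 2.5, 0.7]
b = [1.1, 0.4, 0.9, 2.0]
p = 3.0
q = p / (p - 1)   # conjugate exponent, so 1/p + 1/q = 1

lhs = sum(abs(x * y) for x, y in zip(a, b))
rhs = ((sum(abs(x)**p for x in a))**(1 / p)
       * (sum(abs(y)**q for y in b))**(1 / q))
print(lhs, rhs)
```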
Proof of Theorem 3: Then, we have the following:
Step (a) follows from Corollary 3. Step (b) follows from Proposition 2 and Equation (47). Since the above bound holds for any positive
,
and
, we have
Thus, Equation (10) in Theorem 3 is proved. ☐
Proof of Corollary 2: Since
g is the inverse function of
, the definition in Equation (
13) of
is equivalent to
By the definition of
, we have that
for
. We assume that for
,
Then, there exists a sequence
such that for
, we have
Then, by Theorem 3, we have
for any
. We claim that for
, we have
∈
. To prove this claim, we suppose that
does not belong to
for some
. Then, we have the following chain of inequalities:
Step (a) follows from
and Property 4 Part (e). Step (b) follows from Equation (48). The bound of Equation (50) contradicts Equation (49). Hence, we have
∈
or equivalently,
for
, which implies that for
,
completing the proof. ☐