Article

On the Reliability Function of Variable-Rate Slepian-Wolf Coding †

1 College of Electronic Information and Automation, Tianjin University of Science and Technology, Tianjin 300222, China
2 Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4K1, Canada
3 Google, Mountain View, CA 94043, USA
4 IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in the 45th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 26–28 September 2007.
Entropy 2017, 19(8), 389; https://doi.org/10.3390/e19080389
Submission received: 13 June 2017 / Revised: 14 July 2017 / Accepted: 27 July 2017 / Published: 28 July 2017
(This article belongs to the Special Issue Multiuser Information Theory)

Abstract:
The reliability function of variable-rate Slepian-Wolf coding is linked to the reliability function of channel coding with constant composition codes, through which computable lower and upper bounds are derived. The bounds coincide at rates close to the Slepian-Wolf limit, yielding a complete characterization of the reliability function in that rate region. It is shown that variable-rate Slepian-Wolf codes can significantly outperform fixed-rate Slepian-Wolf codes in terms of rate-error tradeoff. Variable-rate Slepian-Wolf coding with rate below the Slepian-Wolf limit is also analyzed. In sharp contrast with fixed-rate Slepian-Wolf codes, for which the correct decoding probability decays to zero exponentially fast if the rate is below the Slepian-Wolf limit, the correct decoding probability of variable-rate Slepian-Wolf codes can be bounded away from zero.

1. Introduction

Consider the problem (see Figure 1) of compressing $X^n = (X_1, X_2, \ldots, X_n)$ with side information $Y^n = (Y_1, Y_2, \ldots, Y_n)$ available only at the decoder. Here $\{(X_i, Y_i)\}_{i=1}^{\infty}$ is a joint memoryless source with zero-order joint probability distribution $P_{XY}$ on finite alphabet $\mathcal{X} \times \mathcal{Y}$. Let $P_X$ and $P_Y$ be the marginal probability distributions of $X$ and $Y$ induced by the joint probability distribution $P_{XY}$. Without loss of generality, we shall assume $P_X(x) > 0$, $P_Y(y) > 0$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. This problem was first studied by Slepian and Wolf in their landmark paper [1]. They proved the surprising result that the minimum rate for reconstructing $X^n$ at the decoder with asymptotically zero error probability (as the block length $n$ goes to infinity) is $H(X|Y)$, which is the same as in the case where the side information $Y^n$ is also available at the encoder. The fundamental limit $H(X|Y)$ is often referred to as the Slepian-Wolf limit. We shall assume $H(X|Y) > 0$ throughout this paper.
Different from conventional lossless source coding, where most effort has been devoted to variable-rate coding schemes, research on Slepian-Wolf coding has almost exclusively focused on fixed-rate codes (see, e.g., [2,3,4,5] and the references therein). This phenomenon can be partly explained by the influence of channel coding. It is well known that there is an intimate connection between channel coding and Slepian-Wolf coding. Intuitively, one may view $Y^n$ as the channel output generated by channel input $X^n$ through the discrete memoryless channel $P_{Y|X}$, where $P_{Y|X}$ is the probability transition matrix from $X$ to $Y$ induced by the joint probability distribution $P_{XY}$. Since $Y^n$ is not available at the encoder, Slepian-Wolf coding is, in a certain sense, similar to channel coding without feedback. In a channel coding system, there is little incentive to use variable-rate coding schemes if no feedback link exists from the receiver to the transmitter. Therefore, it seems justifiable to focus on fixed-rate codes in Slepian-Wolf coding.
This viewpoint turns out to be misleading. We shall show that variable-rate Slepian-Wolf codes can significantly outperform fixed-rate codes in terms of rate-error tradeoff. Specifically, it is revealed that variable-rate Slepian-Wolf codes can beat the sphere-packing bound for fixed-rate Slepian-Wolf codes at rates close to the Slepian-Wolf limit. It is known [6] that the correct decoding probability of fixed-rate Slepian-Wolf codes decays to zero exponentially fast if the rate is below the Slepian-Wolf limit. Somewhat surprisingly, the decoding error probability of variable-rate Slepian-Wolf codes can be bounded away from one even when they are operated below the Slepian-Wolf limit, and the performance degrades gracefully as the rate goes to zero. Therefore, variable-rate Slepian-Wolf coding is considerably more robust.
The rest of this paper is organized as follows. In Section 2, we review the existing bounds on the reliability function of fixed-rate Slepian-Wolf coding, and point out the intimate connections with their counterparts in channel coding. In Section 3, we characterize the reliability function of variable-rate Slepian-Wolf coding by leveraging the reliability function of channel coding with constant composition codes. Computable lower and upper bounds are derived. The bounds coincide at rates close to the Slepian-Wolf limit. The correct decoding probability of variable-rate Slepian-Wolf coding with rate below the Slepian-Wolf limit is studied in Section 4. An illustrative example is given in Section 5. We conclude the paper in Section 6. Throughout this paper, logarithms are to base e unless specified otherwise.

2. Fixed-Rate Slepian-Wolf Coding and Channel Coding

To facilitate the comparisons between the performances of fixed-rate Slepian-Wolf coding and variable-rate coding, we shall briefly review the existing bounds on the reliability function of fixed-rate Slepian-Wolf coding. It turns out that the most instructive way is to first consider their counterparts in channel coding. The reason is two-fold. First, it provides the setup to introduce several important definitions. Second, and more importantly, it will be clear that the reliability function of fixed-rate Slepian-Wolf coding is closely related to that of channel coding; indeed, such a connection will be further explored in the context of variable-rate Slepian-Wolf coding.
For any probability distributions $P, Q$ on $\mathcal{X}$ and probability transition matrices $V, W: \mathcal{X} \to \mathcal{Y}$, we use $H(P)$, $I(P, V)$, $D(Q \| P)$, and $D(W \| V | P)$ to denote the standard entropy, mutual information, divergence, and conditional divergence functions; specifically, we have
$$H(P) = -\sum_x P(x) \log P(x), \quad I(P, V) = \sum_{x,y} P(x) V(y|x) \log \frac{V(y|x)}{\sum_{x'} P(x') V(y|x')},$$
$$D(Q \| P) = \sum_x Q(x) \log \frac{Q(x)}{P(x)}, \quad D(W \| V | P) = \sum_{x,y} P(x) W(y|x) \log \frac{W(y|x)}{V(y|x)}.$$
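To keep the notation concrete, here is a minimal Python/NumPy sketch of these four quantities (the function names, the channel-matrix convention V[x, y] = V(y|x), and the 0 log 0 = 0 guard are our own choices, not from the paper):

```python
import numpy as np

EPS = 1e-300  # guard so that terms with zero mass contribute exactly 0

def entropy(P):
    """H(P) = -sum_x P(x) log P(x), natural log."""
    P = np.asarray(P, dtype=float)
    return -np.sum(P * np.log(P + EPS))

def mutual_information(P, V):
    """I(P, V) for input distribution P and channel matrix V[x, y] = V(y|x)."""
    P, V = np.asarray(P, dtype=float), np.asarray(V, dtype=float)
    PY = P @ V                   # output distribution: sum_x P(x) V(y|x)
    joint = P[:, None] * V       # P(x) V(y|x)
    return np.sum(joint * (np.log(V + EPS) - np.log(PY + EPS)[None, :]))

def divergence(Q, P):
    """D(Q || P) = sum_x Q(x) log(Q(x)/P(x))."""
    Q, P = np.asarray(Q, dtype=float), np.asarray(P, dtype=float)
    return np.sum(Q * (np.log(Q + EPS) - np.log(P + EPS)))

def cond_divergence(W, V, P):
    """D(W || V | P) = sum_{x,y} P(x) W(y|x) log(W(y|x)/V(y|x))."""
    W, V, P = (np.asarray(a, dtype=float) for a in (W, V, P))
    return np.sum(P[:, None] * W * (np.log(W + EPS) - np.log(V + EPS)))
```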
The main technical tool we need is the method of types. First, we shall quote a few basic definitions from [7]. Let $\mathcal{P}(\mathcal{X})$ denote the set of all probability distributions on $\mathcal{X}$. The type of a sequence $x^n \in \mathcal{X}^n$, denoted as $P_{x^n}$, is the empirical probability distribution of $x^n$. Let $\mathcal{P}_n(\mathcal{X})$ denote the set consisting of the possible types of sequences $x^n \in \mathcal{X}^n$. For any $P \in \mathcal{P}_n(\mathcal{X})$, the type class $T^n(P)$ is the set of sequences in $\mathcal{X}^n$ of type $P$. We will make frequent use of the following elementary results:
$$|\mathcal{P}_n(\mathcal{X})| \leq (n+1)^{|\mathcal{X}|}, \tag{1}$$
$$\frac{1}{(n+1)^{|\mathcal{X}|}}\, e^{nH(P)} \leq |T^n(P)| \leq e^{nH(P)}, \quad P \in \mathcal{P}_n(\mathcal{X}), \tag{2}$$
$$\prod_{i=1}^{n} P(x_i) = e^{-n[D(Q \| P) + H(Q)]}, \quad x^n \in T^n(Q),\ Q \in \mathcal{P}_n(\mathcal{X}),\ P \in \mathcal{P}(\mathcal{X}). \tag{3}$$
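A quick numerical check of (1) and (2) for a small binary alphabet (a sketch; the helper names and the choice n = 12 are our own):

```python
import math
from collections import Counter
from itertools import product

def type_of(xs, alphabet):
    """Empirical distribution (type) of a sequence."""
    counts = Counter(xs)
    n = len(xs)
    return tuple(counts[a] / n for a in alphabet)

def entropy(P):
    return -sum(p * math.log(p) for p in P if p > 0)

n, alphabet = 12, (0, 1)
# enumerate all 2^n sequences and group them by type
classes = Counter(type_of(xs, alphabet) for xs in product(alphabet, repeat=n))
print(len(classes) <= (n + 1) ** len(alphabet))      # (1): at most (n+1)^|X| types
for P, size in classes.items():                      # (2): type class size bounds
    H = entropy(P)
    assert math.exp(n * H) / (n + 1) ** len(alphabet) <= size <= math.exp(n * H)
```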
A block code $\mathcal{C}_n$ is an ordered collection of sequences in $\mathcal{X}^n$. We allow $\mathcal{C}_n$ to contain identical sequences. Moreover, for any set $\mathcal{A} \subseteq \mathcal{X}^n$, we say that $\mathcal{C}_n \subseteq \mathcal{A}$ if $x^n \in \mathcal{A}$ for all $x^n \in \mathcal{C}_n$. Note that $\mathcal{C}_n \subseteq \mathcal{A}$ does not imply $|\mathcal{C}_n| \leq |\mathcal{A}|$. The rate of $\mathcal{C}_n$ is defined as
$$R(\mathcal{C}_n) = \frac{1}{n} \log |\mathcal{C}_n|.$$
Given a channel $W_{Y|X}: \mathcal{X} \to \mathcal{Y}$, a block code $\mathcal{C}_n \subseteq \mathcal{X}^n$, and channel output $Y^n \in \mathcal{Y}^n$, the output of the optimal maximum likelihood (ML) decoder is
$$\hat{X}^n = \arg\min_{x^n \in \mathcal{C}_n} \sum_{i=1}^{n} -\log W_{Y|X}(Y_i | x_i),$$
where the ties are broken in an arbitrary manner. The average decoding error probability of block code $\mathcal{C}_n$ over channel $W_{Y|X}$ is defined as
$$P_e(\mathcal{C}_n, W_{Y|X}) = \frac{1}{|\mathcal{C}_n|} \sum_{x^n \in \mathcal{C}_n} \Pr\{\hat{X}^n \neq x^n \,|\, x^n \text{ is transmitted}\}.$$
The maximum decoding error probability of block code $\mathcal{C}_n$ over channel $W_{Y|X}$ is defined as
$$P_{e,\max}(\mathcal{C}_n, W_{Y|X}) = \max_{x^n \in \mathcal{C}_n} \Pr\{\hat{X}^n \neq x^n \,|\, x^n \text{ is transmitted}\}.$$
The average correct decoding probability of block code $\mathcal{C}_n$ over channel $W_{Y|X}$ is defined as
$$P_c(\mathcal{C}_n, W_{Y|X}) = 1 - P_e(\mathcal{C}_n, W_{Y|X}).$$
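As an illustration of these definitions, here is a hedged sketch of ML decoding and the empirical average error probability for a toy block code over a binary symmetric channel (the code, channel, and trial count are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.array([[0.9, 0.1],          # W[x, y] = W(y|x): BSC with crossover 0.1
              [0.1, 0.9]])
code = np.array([[0, 0, 0, 0, 0],  # a 2-codeword length-5 block code
                 [1, 1, 1, 1, 1]])

def ml_decode(y, code, W):
    # pick the codeword minimizing sum_i -log W(y_i | x_i)
    neg_ll = -np.log(W[code, y]).sum(axis=1)
    return np.argmin(neg_ll)

trials, errors = 100_000, 0
for _ in range(trials):
    m = rng.integers(len(code))
    x = code[m]
    y = np.array([rng.choice(2, p=W[xi]) for xi in x])  # pass x through the DMC
    errors += ml_decode(y, code, W) != m
print("empirical P_e:", errors / trials)
```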
Definition 1.
Given a channel $W_{Y|X}: \mathcal{X} \to \mathcal{Y}$, we say that an error exponent $E \geq 0$ is achievable with block codes at rate $R$ if for any $\delta > 0$, there exists a sequence of block codes $\{\mathcal{C}_n\}$ such that
$$\liminf_{n\to\infty} R(\mathcal{C}_n) \geq R - \delta, \quad \limsup_{n\to\infty} \frac{1}{n} \log P_e(\mathcal{C}_n, W_{Y|X}) \leq -E + \delta. \tag{4}$$
The largest achievable error exponent at rate $R$ is denoted by $E(W_{Y|X}, R)$. The function $E(W_{Y|X}, \cdot)$ is referred to as the reliability function of channel $W_{Y|X}$.
Similarly, we say that a correct decoding exponent $E_c \geq 0$ is achievable with block channel codes at rate $R$ if for any $\delta > 0$, there exists a sequence of block codes $\{\mathcal{C}_n\}$ such that
$$\liminf_{n\to\infty} R(\mathcal{C}_n) \geq R - \delta, \quad \liminf_{n\to\infty} \frac{1}{n} \log P_c(\mathcal{C}_n, W_{Y|X}) \geq -(E_c + \delta).$$
The smallest achievable correct decoding exponent at rate $R$ is denoted by $E_c(W_{Y|X}, R)$. It will be seen that $E_c(W_{Y|X}, R)$ is positive if and only if $R > C(W_{Y|X})$, where $C(W_{Y|X}) \triangleq \max_{Q_X} I(Q_X, W_{Y|X})$ is the capacity of channel $W_{Y|X}$. Therefore, we shall refer to the function $E_c(W_{Y|X}, \cdot)$ as the reliability function of channel $W_{Y|X}$ above the capacity.
Remark 1.
Given any block code $\mathcal{C}_n$ of average decoding error probability $P_e(\mathcal{C}_n, W_{Y|X})$, we can expurgate the worst half of the codewords so that the maximum decoding error probability of the resulting code is bounded above by $2 P_e(\mathcal{C}_n, W_{Y|X})$. Therefore, the reliability function $E(W_{Y|X}, \cdot)$ is unaffected if we replace $P_e(\mathcal{C}_n, W_{Y|X})$ by $P_{e,\max}(\mathcal{C}_n, W_{Y|X})$ in (4).
Definition 2.
Given a probability distribution $Q_X \in \mathcal{P}(\mathcal{X})$ and a channel $W_{Y|X}: \mathcal{X} \to \mathcal{Y}$, we say that an error exponent $E \geq 0$ is achievable at rate $R$ with constant composition codes of type approximately $Q_X$ if for any $\delta > 0$, there exists a sequence of block codes $\{\mathcal{C}_n\}$ with $\mathcal{C}_n \subseteq T^n(P_n)$ for some $P_n \in \mathcal{P}_n(\mathcal{X})$ such that
$$\lim_{n\to\infty} \|P_n - Q_X\| = 0, \quad \liminf_{n\to\infty} R(\mathcal{C}_n) \geq R - \delta, \quad \limsup_{n\to\infty} \frac{1}{n} \log P_e(\mathcal{C}_n, W_{Y|X}) \leq -E + \delta, \tag{5}$$
where $\|\cdot\|$ is the $l_1$ norm.
The largest achievable error exponent at rate $R$ for constant composition codes of type approximately $Q_X$ is denoted by $E(Q_X, W_{Y|X}, R)$. The function $E(Q_X, W_{Y|X}, \cdot)$ is referred to as the reliability function of channel $W_{Y|X}$ for constant composition codes of type approximately $Q_X$.
Similarly, we say that a correct decoding exponent $E_c \geq 0$ is achievable at rate $R$ with constant composition codes of type approximately $Q_X$ if for any $\delta > 0$, there exists a sequence of block codes $\{\mathcal{C}_n\}$ with $\mathcal{C}_n \subseteq T^n(P_n)$ for some $P_n \in \mathcal{P}_n(\mathcal{X})$ such that
$$\lim_{n\to\infty} \|P_n - Q_X\| = 0, \quad \liminf_{n\to\infty} R(\mathcal{C}_n) \geq R - \delta, \quad \liminf_{n\to\infty} \frac{1}{n} \log P_c(\mathcal{C}_n, W_{Y|X}) \geq -(E_c + \delta).$$
The smallest achievable correct decoding exponent at rate $R$ for constant composition codes of type approximately $Q_X$ is denoted by $E_c(Q_X, W_{Y|X}, R)$.
Remark 2.
The reliability function $E(Q_X, W_{Y|X}, \cdot)$ is unaffected if we replace $P_e(\mathcal{C}_n, W_{Y|X})$ by $P_{e,\max}(\mathcal{C}_n, W_{Y|X})$ in (5).
Let $|t|^+ = \max\{0, t\}$ and $d_{W_{Y|X}}(x, \tilde{x}) = -\log \sum_y \sqrt{W_{Y|X}(y|x) W_{Y|X}(y|\tilde{x})}$. Define
$$E_{ex}(Q_X, W_{Y|X}, R) = \min_{Q_{\tilde{X}|X}:\, Q_{\tilde{X}} = Q_X,\, I(Q_X, Q_{\tilde{X}|X}) \leq R} \mathbb{E}_{Q_{X\tilde{X}}}\big[d_{W_{Y|X}}(X, \tilde{X})\big] + I(Q_X, Q_{\tilde{X}|X}) - R, \tag{6}$$
$$E_{rc}(Q_X, W_{Y|X}, R) = \min_{V_{Y|X}} D(V_{Y|X} \| W_{Y|X} | Q_X) + \big|I(Q_X, V_{Y|X}) - R\big|^+, \tag{7}$$
$$E_{sp}(Q_X, W_{Y|X}, R) = \min_{V_{Y|X}:\, I(Q_X, V_{Y|X}) \leq R} D(V_{Y|X} \| W_{Y|X} | Q_X), \tag{8}$$
where in (6), $Q_{\tilde{X}}$ and $Q_{X\tilde{X}}$ are respectively the marginal probability distribution of $\tilde{X}$ and the joint probability distribution of $X$ and $\tilde{X}$ induced by $Q_X$ and $Q_{\tilde{X}|X}$.
Let $R_{ex}(Q_X, W_{Y|X})$ be the smallest $R \geq 0$ with $E_{ex}(Q_X, W_{Y|X}, R) < \infty$. We have
$$R_{ex}(Q_X, W_{Y|X}) = \min_{Q_{\tilde{X}|X}:\, Q_{\tilde{X}} = Q_X,\, \mathbb{E}_{Q_{X\tilde{X}}}[d_{W_{Y|X}}(X, \tilde{X})] < \infty} I(Q_X, Q_{\tilde{X}|X}). \tag{9}$$
It is known ([7], Exercise 5.18) that $E_{ex}(Q_X, W_{Y|X}, R)$ is a decreasing convex function of $R$ for $R \geq R_{ex}(Q_X, W_{Y|X})$; moreover, the minimum in (9) is achieved at $Q_{X\tilde{X}}$ if and only if
$$Q_{X\tilde{X}}(x, \tilde{x}) = \begin{cases} c\, Q(x) Q(\tilde{x}) & \text{if } d_{W_{Y|X}}(x, \tilde{x}) < \infty, \\ 0 & \text{otherwise}, \end{cases} \tag{10}$$
where the probability distribution $Q$ and the constant $c$ are uniquely determined by the condition $Q_{\tilde{X}} = Q_X$.
It is shown in ([8], Lemma 3) that, for some $R^*(Q_X, W_{Y|X}) \in [0, I(Q_X, W_{Y|X})]$, we have
$$\max\big\{E_{ex}(Q_X, W_{Y|X}, R), E_{rc}(Q_X, W_{Y|X}, R)\big\} = \begin{cases} E_{ex}(Q_X, W_{Y|X}, R) & \text{if } R \leq R^*(Q_X, W_{Y|X}), \\ E_{rc}(Q_X, W_{Y|X}, R) & \text{if } R > R^*(Q_X, W_{Y|X}). \end{cases}$$
It is also known ([7], Corollary 5.4) that
$$E_{rc}(Q_X, W_{Y|X}, R) = \begin{cases} E_{sp}(Q_X, W_{Y|X}, R) & \text{if } R \geq R_{cr}(Q_X, W_{Y|X}), \\ E_{sp}(Q_X, W_{Y|X}, R_{cr}) + R_{cr} - R & \text{if } 0 \leq R \leq R_{cr}(Q_X, W_{Y|X}), \end{cases} \tag{11}$$
where $R_{cr} \triangleq R_{cr}(Q_X, W_{Y|X})$ is the smallest $R$ at which the convex curve $E_{sp}(Q_X, W_{Y|X}, R)$ meets its supporting line of slope $-1$. It is obvious that $R_{cr}(Q_X, W_{Y|X}) \leq I(Q_X, W_{Y|X})$.
Proposition 1.
$R_{cr}(Q_X, W_{Y|X}) = I(Q_X, W_{Y|X})$ if and only if the value of
$$\frac{W_{Y|X}(y|x)}{\sum_{x'} Q_X(x') W_{Y|X}(y|x')}$$
does not depend on $y$ for all $x, y$ such that $Q_X(x) W_{Y|X}(y|x) > 0$.
Proof. 
See Appendix A. ☐
Define $R_{sp}(Q_X, W_{Y|X}) = \inf\{R > 0: E_{sp}(Q_X, W_{Y|X}, R) < \infty\}$. It is known ([7], Exercise 5.3) that
$$R_{sp}(Q_X, W_{Y|X}) = \min I(Q_X, V_{Y|X}), \tag{12}$$
where the minimum is taken over those $V_{Y|X}$'s for which $V_{Y|X}(y|x) = 0$ whenever $W_{Y|X}(y|x) = 0$; in particular, $R_{sp}(Q_X, W_{Y|X}) > 0$ if and only if for every $y \in \mathcal{Y}$ there exists an $x \in \mathcal{X}$ with $Q_X(x) > 0$ and $W_{Y|X}(y|x) = 0$.
Proposition 2.
The minimum in (12) is achieved at $V_{Y|X} = W_{Y|X}$ if and only if the value of
$$\frac{W_{Y|X}(y|x)}{\sum_{x'} Q_X(x') W_{Y|X}(y|x')}$$
does not depend on $y$ for all $x, y$ such that $Q_X(x) W_{Y|X}(y|x) > 0$.
Proof. 
The proof is similar to that of Proposition 1. The details are omitted. ☐
One can readily prove the following result by combining Propositions 1 and 2.
Proposition 3.
The following statements are equivalent:
  • $R_{cr}(Q_X, W_{Y|X}) = I(Q_X, W_{Y|X})$;
  • $R_{sp}(Q_X, W_{Y|X}) = I(Q_X, W_{Y|X})$;
  • the value of
    $$\frac{W_{Y|X}(y|x)}{\sum_{x'} Q_X(x') W_{Y|X}(y|x')}$$
    does not depend on $y$ for all $x, y$ such that $Q_X(x) W_{Y|X}(y|x) > 0$.
Proposition 4.
  • $E(Q_X, W_{Y|X}, R) \geq \max\{E_{ex}(Q_X, W_{Y|X}, R), E_{rc}(Q_X, W_{Y|X}, R)\}$;
  • $E(Q_X, W_{Y|X}, R) \leq E_{sp}(Q_X, W_{Y|X}, R)$ with the possible exception of $R = R_{sp}(Q_X, W_{Y|X})$, at which point the inequality does not necessarily hold;
  • $E_c(Q_X, W_{Y|X}, R) = \min_{V_{Y|X}} D(V_{Y|X} \| W_{Y|X} | Q_X) + |R - I(Q_X, V_{Y|X})|^+$. (13)
Remark 3.
$E_{ex}(Q_X, W_{Y|X}, R)$, $E_{rc}(Q_X, W_{Y|X}, R)$, and $E_{sp}(Q_X, W_{Y|X}, R)$ are respectively the expurgated exponent, the random coding exponent, and the sphere packing exponent of channel $W_{Y|X}$ for constant composition codes of type approximately $Q_X$. The results in Proposition 4 are well known [7,9]. However, bounding the decoding error probability of constant composition codes often serves as an intermediate step in characterizing the reliability function for general block codes; as a consequence, the reliability function for constant composition codes is rarely explicitly defined. Moreover, $E_{ex}(Q_X, W_{Y|X}, R)$, $E_{rc}(Q_X, W_{Y|X}, R)$, and $E_{sp}(Q_X, W_{Y|X}, R)$ are commonly used to bound the decoding error probability of constant composition codes for a fixed block length $n$; therefore, it is implicitly assumed that $Q_X$ is taken from $\mathcal{P}_n(\mathcal{X})$ (see, e.g., [7]). In contrast, we consider a sequence of constant composition codes with block length increasing to infinity and type converging to $Q_X$ for some $Q_X \in \mathcal{P}(\mathcal{X})$ (see Definition 2). A continuity argument is required for passing $Q_X$ from $\mathcal{P}_n(\mathcal{X})$ to $\mathcal{P}(\mathcal{X})$. For completeness, we supply the proof in Appendix B. Note that, unlike $E(Q_X, W_{Y|X}, \cdot)$, the function $E_c(Q_X, W_{Y|X}, \cdot)$ has been completely characterized.
Proposition 5.
  • $E(W_{Y|X}, R) = \sup_{Q_X} E(Q_X, W_{Y|X}, R)$;
  • $E_c(W_{Y|X}, R) = \inf_{Q_X} E_c(Q_X, W_{Y|X}, R)$.
Remark 4.
In view of the fact that $E_c(Q_X, W_{Y|X}, R)$ is a continuous function of $Q_X$ defined on a compact set, we can replace "inf" with "min" in the above equation, i.e.,
$$E_c(W_{Y|X}, R) = \min_{Q_X} E_c(Q_X, W_{Y|X}, R).$$
Proof. 
It is obvious that $E(W_{Y|X}, R) \geq \sup_{Q_X} E(Q_X, W_{Y|X}, R)$; the other direction follows from the fact that every block code $\mathcal{C}_n$ contains a constant composition code $\mathcal{C}'_n$ with $P_{e,\max}(\mathcal{C}'_n, W_{Y|X}) \leq P_{e,\max}(\mathcal{C}_n, W_{Y|X})$ and $R(\mathcal{C}'_n) \geq R(\mathcal{C}_n) - |\mathcal{X}| \frac{\log(n+1)}{n}$. Similarly, it is clear that $E_c(W_{Y|X}, R) \leq \inf_{Q_X} E_c(Q_X, W_{Y|X}, R)$; the other direction follows from the fact that given any block code $\mathcal{C}_n$, one can construct a constant composition code $\mathcal{C}'_n$ with $P_c(\mathcal{C}'_n, W_{Y|X}) \geq (n+1)^{-|\mathcal{X}|} P_c(\mathcal{C}_n, W_{Y|X})$ and $R(\mathcal{C}'_n) = R(\mathcal{C}_n)$ [9]. ☐
The expurgated exponent, random coding exponent, and sphere packing exponent of channel $W_{Y|X}$ for general block codes are defined as follows:
  • expurgated exponent
    $$E_{ex}(W_{Y|X}, R) = \max_{Q_X} E_{ex}(Q_X, W_{Y|X}, R), \tag{14}$$
  • random coding exponent
    $$E_{rc}(W_{Y|X}, R) = \max_{Q_X} E_{rc}(Q_X, W_{Y|X}, R), \tag{15}$$
  • sphere packing exponent
    $$E_{sp}(W_{Y|X}, R) = \max_{Q_X} E_{sp}(Q_X, W_{Y|X}, R). \tag{16}$$
Let $R_{sp}(W_{Y|X})$ be the smallest $R$ to the right of which $E_{sp}(W_{Y|X}, R)$ is finite. It is known ([7], Exercise 5.3; see also [10]) that
$$R_{sp}(W_{Y|X}) = \max_{Q_X} R_{sp}(Q_X, W_{Y|X}) = -\log \min_{Q_X} \max_{y} \sum_{x \in \mathcal{X}:\, W_{Y|X}(y|x) > 0} Q_X(x).$$
By Propositions 4 and 5, we recover the following well-known result [7,10]:
$$\max\{E_{ex}(W_{Y|X}, R), E_{rc}(W_{Y|X}, R)\} \leq E(W_{Y|X}, R) \leq E_{sp}(W_{Y|X}, R) \tag{17}$$
with the possible exception of $R = R_{sp}(W_{Y|X})$, at which point the second inequality in (17) does not necessarily hold.
Now we proceed to review the results on the reliability function of fixed-rate Slepian-Wolf coding. A fixed-rate Slepian-Wolf code $\phi_n(\cdot)$ is a mapping from $\mathcal{X}^n$ to a set $\mathcal{A}_n$. The rate of $\phi_n(\cdot)$ is defined as
$$R(\phi_n) = \frac{1}{n} \log |\mathcal{A}_n|.$$
Given $\phi_n(X^n)$ and $Y^n$, the output of the optimal maximum a posteriori (MAP) decoder is
$$\hat{X}^n = \arg\min_{x^n:\, \phi_n(x^n) = \phi_n(X^n)} \sum_{i=1}^{n} -\log P_{X|Y}(x_i | Y_i) = \arg\min_{x^n:\, \phi_n(x^n) = \phi_n(X^n)} \sum_{i=1}^{n} -\log P_{XY}(x_i, Y_i),$$
where the ties are broken in an arbitrary manner. The decoding error probability of Slepian-Wolf code $\phi_n(\cdot)$ is defined as
$$P_e(\phi_n, P_{XY}) = \Pr\{\hat{X}^n \neq X^n\}.$$
The correct decoding probability of Slepian-Wolf code $\phi_n(\cdot)$ is defined as
$$P_c(\phi_n, P_{XY}) = 1 - P_e(\phi_n, P_{XY}).$$
Definition 3.
Given a joint probability distribution $P_{XY}$, we say that an error exponent $E \geq 0$ is achievable with fixed-rate Slepian-Wolf codes at rate $R$ if for any $\delta > 0$, there exists a sequence of fixed-rate Slepian-Wolf codes $\{\phi_n\}$ such that
$$\limsup_{n\to\infty} R(\phi_n) \leq R + \delta, \quad \limsup_{n\to\infty} \frac{1}{n} \log P_e(\phi_n, P_{XY}) \leq -E + \delta.$$
The largest achievable error exponent at rate $R$ is denoted by $E_f(P_{XY}, R)$. The function $E_f(P_{XY}, \cdot)$ is referred to as the reliability function of fixed-rate Slepian-Wolf coding.
Similarly, we say that a correct decoding exponent $E_c \geq 0$ is achievable with fixed-rate Slepian-Wolf codes at rate $R$ if for any $\delta > 0$, there exists a sequence of fixed-rate Slepian-Wolf codes $\{\phi_n\}$ such that
$$\limsup_{n\to\infty} R(\phi_n) \leq R + \delta, \quad \liminf_{n\to\infty} \frac{1}{n} \log P_c(\phi_n, P_{XY}) \geq -(E_c + \delta).$$
The smallest achievable correct decoding exponent at rate $R$ is denoted by $E_f^c(P_{XY}, R)$. It will be seen that $E_f^c(P_{XY}, R)$ is positive if and only if $R < H(X|Y)$. Therefore, we shall refer to the function $E_f^c(P_{XY}, \cdot)$ as the reliability function of fixed-rate Slepian-Wolf coding below the Slepian-Wolf limit.
The expurgated exponent, random coding exponent, and sphere packing exponent of fixed-rate Slepian-Wolf coding are defined as follows:
  • expurgated exponent
    $$E_{f,ex}(P_{XY}, R) = \min_{Q_X} D(Q_X \| P_X) + E_{ex}(Q_X, P_{Y|X}, H(Q_X) - R), \tag{18}$$
  • random coding exponent
    $$E_{f,rc}(P_{XY}, R) = \min_{Q_X} D(Q_X \| P_X) + E_{rc}(Q_X, P_{Y|X}, H(Q_X) - R), \tag{19}$$
  • sphere packing exponent
    $$E_{f,sp}(P_{XY}, R) = \min_{Q_X} D(Q_X \| P_X) + E_{sp}(Q_X, P_{Y|X}, H(Q_X) - R). \tag{20}$$
Equivalently, the random coding exponent and sphere packing exponent of fixed-rate Slepian-Wolf coding can be written as [11]:
$$E_{f,rc}(P_{XY}, R) = \max_{0 \leq \rho \leq 1} \rho R - \log \sum_y \Big[\sum_x P_{XY}(x,y)^{\frac{1}{1+\rho}}\Big]^{1+\rho}, \quad E_{f,sp}(P_{XY}, R) = \sup_{\rho > 0}\, \rho R - \log \sum_y \Big[\sum_x P_{XY}(x,y)^{\frac{1}{1+\rho}}\Big]^{1+\rho}.$$
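These Gallager-style forms are directly computable. The sketch below (our own scaffolding) evaluates $E_{f,rc}$ by a grid search over $\rho \in [0, 1]$ and approximates $E_{f,sp}$ by truncating the supremum over $\rho > 0$ to a finite grid, so near $R_{f,sp}(P_{XY})$, where the true supremum is infinite, it returns only a finite lower approximation. The parameter values are our own illustrative choices:

```python
import numpy as np

def E0(rho, Pxy):
    """log sum_y [ sum_x P(x,y)^{1/(1+rho)} ]^{1+rho}, natural log."""
    inner = np.sum(Pxy ** (1.0 / (1.0 + rho)), axis=0)  # sum over x for each y
    return np.log(np.sum(inner ** (1.0 + rho)))

def E_f_rc(R, Pxy, grid=np.linspace(0.0, 1.0, 2001)):
    return max(rho * R - E0(rho, Pxy) for rho in grid)

def E_f_sp(R, Pxy, grid=np.linspace(1e-4, 50.0, 20001)):
    return max(rho * R - E0(rho, Pxy) for rho in grid)  # truncated sup over rho > 0

# binary example in the spirit of Section 5 with p = 0.1, tau = 0.4 (our numbers)
p, tau = 0.1, 0.4
Pxy = np.array([[tau * (1 - p), (1 - tau) * p],
                [tau * p, (1 - tau) * (1 - p)]])        # Pxy[x, y]
for R in (0.5, 0.6, 0.69):
    print(R, E_f_rc(R, Pxy), E_f_sp(R, Pxy))
```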
To see the connection between the random coding exponent and the sphere packing exponent, we shall write them in the following parametric forms [11]:
$$R = H(X^{(\rho)} | Y^{(\rho)}), \quad E_{f,sp}(P_{XY}, R) = D(P_{X^{(\rho)} Y^{(\rho)}} \| P_{XY}),$$
and
$$E_{f,rc}(P_{XY}, R) = \begin{cases} D(P_{X^{(\rho)} Y^{(\rho)}} \| P_{XY}) & \text{if } H(X|Y) \leq R \leq H(X^{(\rho)} | Y^{(\rho)})\big|_{\rho=1}, \\ R - \log \sum_y \big[\sum_x \sqrt{P_{XY}(x,y)}\big]^2 & \text{if } R > H(X^{(\rho)} | Y^{(\rho)})\big|_{\rho=1}, \end{cases}$$
where the joint distribution of $(X^{(\rho)}, Y^{(\rho)})$ is $P_{X^{(\rho)} Y^{(\rho)}}$, which is specified by
$$P_{Y^{(\rho)}}(y) = \frac{P_Y(y) \big[\sum_x P_{X|Y}(x|y)^{\frac{1}{1+\rho}}\big]^{1+\rho}}{\sum_{y'} P_Y(y') \big[\sum_x P_{X|Y}(x|y')^{\frac{1}{1+\rho}}\big]^{1+\rho}}, \quad y \in \mathcal{Y}, \tag{21}$$
$$P_{X^{(\rho)} | Y^{(\rho)}}(x|y) = \frac{P_{X|Y}(x|y)^{\frac{1}{1+\rho}}}{\sum_{x'} P_{X|Y}(x'|y)^{\frac{1}{1+\rho}}}, \quad x \in \mathcal{X},\ y \in \mathcal{Y}. \tag{22}$$
Define the critical rate
$$R_{f,cr}(P_{XY}) = H(X^{(\rho)} | Y^{(\rho)})\big|_{\rho=1}.$$
Note that $E_{f,rc}(P_{XY}, R)$ and $E_{f,sp}(P_{XY}, R)$ coincide when $R \in [H(X|Y), R_{f,cr}(P_{XY})]$. Let $R_{f,sp}(P_{XY}) = \sup\{R: E_{f,sp}(P_{XY}, R) < \infty\}$. It is shown in [12] that
$$R_{f,sp}(P_{XY}) = \max_y \log \big|\{x \in \mathcal{X}: P_{X|Y}(x|y) > 0\}\big|.$$
It is well known [8,11,13] that the reliability function $E_f(P_{XY}, \cdot)$ is upper-bounded by $E_{f,sp}(P_{XY}, \cdot)$ and lower-bounded by $E_{f,rc}(P_{XY}, \cdot)$ and $E_{f,ex}(P_{XY}, \cdot)$, i.e.,
$$\max\{E_{f,rc}(P_{XY}, R), E_{f,ex}(P_{XY}, R)\} \leq E_f(P_{XY}, R) \leq E_{f,sp}(P_{XY}, R) \tag{23}$$
with the possible exception of $R = R_{f,sp}(P_{XY})$, at which point the second inequality in (23) does not necessarily hold. Note that $E_f(P_{XY}, R)$ is completely characterized for $R \in [H(X|Y), R_{f,cr}(P_{XY})]$.
Unlike $E_f(P_{XY}, \cdot)$, the function $E_f^c(P_{XY}, \cdot)$ has been characterized for all $R$. Specifically, it is shown in [6,14] that
$$E_f^c(P_{XY}, R) = \min_{Q_X} D(Q_X \| P_X) + E_c(Q_X, P_{Y|X}, H(Q_X) - R). \tag{24}$$
Comparing (14) with (18), (15) with (19), (16) with (20), and (13) with (24), one can easily see that there exists an intimate connection between fixed-rate Slepian-Wolf coding for source distribution $P_{XY}$ and channel coding over channel $P_{Y|X}$. This connection can be roughly interpreted as a manifestation of the following facts [15].
  • Given, for each type $Q_X \in \mathcal{P}_n(\mathcal{X})$, a constant composition code $\mathcal{C}_n(Q_X) \subseteq T^n(Q_X)$ with $R(\mathcal{C}_n(Q_X)) \approx H(Q_X) - R$ and $P_{e,\max}(\mathcal{C}_n(Q_X), P_{Y|X}) \approx e^{-nE(Q_X)}$, one can use $\mathcal{C}_n(Q_X)$ to partition the type class $T^n(Q_X)$ into approximately $e^{nR}$ disjoint subsets such that each subset is a constant composition code of type $Q_X$ whose maximum decoding error probability over channel $P_{Y|X}$ is approximately equal to or less than that of $\mathcal{C}_n(Q_X)$. Note that these partitions, one for each type class, yield a fixed-rate Slepian-Wolf code of rate approximately $R$ with $\Pr\{\hat{X}^n \neq X^n | X^n \in T^n(Q_X)\} \lesssim e^{-nE(Q_X)}$. Since $\Pr\{X^n \in T^n(Q_X)\} \approx e^{-nD(Q_X \| P_X)}$ (cf. (2) and (3)), it follows that $\Pr\{\hat{X}^n \neq X^n, X^n \in T^n(Q_X)\} \lesssim e^{-n[D(Q_X \| P_X) + E(Q_X)]}$. The overall decoding error probability $\Pr\{\hat{X}^n \neq X^n\}$ of the resulting Slepian-Wolf code can be upper-bounded, on the exponential scale, by $e^{-n[D(Q_X^* \| P_X) + E(Q_X^*)]}$, where $Q_X^* = \arg\min_{Q_X} D(Q_X \| P_X) + E(Q_X)$. In contrast, one has the freedom to choose $Q_X$ in channel coding, which explains why maximization (instead of minimization) is used in (14)–(16).
  • Given a fixed-rate Slepian-Wolf code $\phi_n(\cdot)$ with $R(\phi_n) \approx R$ and $P_e(\phi_n, P_{XY}) \approx e^{-nE}$, one can, for each type $Q_X \in \mathcal{P}_n(\mathcal{X})$, lift out a constant composition code $\mathcal{C}_n(Q_X) \subseteq T^n(Q_X)$ with $R(\mathcal{C}_n(Q_X)) \gtrsim H(Q_X) - R$ and $P_e(\mathcal{C}_n(Q_X), P_{Y|X}) \lesssim e^{-n[E - D(Q_X \| P_X)]}$.
  • The correct decoding exponents for channel coding and fixed-rate Slepian-Wolf coding can be interpreted in a similar way. Note that in channel coding, to maximize the correct decoding probability one has to minimize the correct decoding exponent; this is why in (13) minimization (instead of maximization) is used.
Therefore, it should be clear that to characterize the reliability functions for channel coding and fixed-rate Slepian-Wolf coding, it suffices to focus on constant composition codes. It will be shown in the next section that a similar reduction holds for variable-rate Slepian-Wolf coding. Indeed, the reliability function for constant composition codes plays a predominant role in determining the fundamental rate-error tradeoff in variable-rate Slepian-Wolf coding.

3. Variable-Rate Slepian-Wolf Coding: Above the Slepian-Wolf Limit

A variable-rate Slepian-Wolf code $\varphi_n(\cdot)$ is a mapping from $\mathcal{X}^n$ to a binary prefix code $\mathcal{B}_n$. Let $l(\varphi_n(x^n))$ denote the length of the binary string $\varphi_n(x^n)$. The rate of variable-rate Slepian-Wolf code $\varphi_n(\cdot)$ is defined as
$$R(\varphi_n, P_{XY}) = \frac{1}{n \log_2 e}\, \mathbb{E}[l(\varphi_n(X^n))].$$
It is worth noting that $R(\varphi_n, P_{XY})$ depends on $P_{XY}$ only through $P_X$.
Given $\varphi_n(X^n)$ and $Y^n$, the output of the optimal maximum a posteriori (MAP) decoder is
$$\hat{X}^n = \arg\min_{x^n:\, \varphi_n(x^n) = \varphi_n(X^n)} \sum_{i=1}^{n} -\log P_{X|Y}(x_i | Y_i) = \arg\min_{x^n:\, \varphi_n(x^n) = \varphi_n(X^n)} \sum_{i=1}^{n} -\log P_{XY}(x_i, Y_i),$$
where the ties are broken in an arbitrary manner. The decoding error probability of variable-rate Slepian-Wolf code $\varphi_n(\cdot)$ is defined as
$$P_e(\varphi_n, P_{XY}) = \Pr\{\hat{X}^n \neq X^n\}.$$
The correct decoding probability of Slepian-Wolf code $\varphi_n(\cdot)$ is defined as
$$P_c(\varphi_n, P_{XY}) = 1 - P_e(\varphi_n, P_{XY}).$$
Definition 4.
Given a joint probability distribution $P_{XY}$, we say that an error exponent $E \geq 0$ is achievable with variable-rate Slepian-Wolf codes at rate $R$ if for any $\delta > 0$, there exists a sequence of variable-rate Slepian-Wolf codes $\{\varphi_n\}$ such that
$$\limsup_{n\to\infty} R(\varphi_n, P_{XY}) \leq R + \delta, \quad \limsup_{n\to\infty} \frac{1}{n} \log P_e(\varphi_n, P_{XY}) \leq -E + \delta.$$
The largest achievable error exponent at rate $R$ is denoted by $E_v(P_{XY}, R)$. The function $E_v(P_{XY}, \cdot)$ is referred to as the reliability function of variable-rate Slepian-Wolf coding.
The power of variable-rate Slepian-Wolf coding results from its flexibility in rate allocation. Since there is only a polynomial number of types for any given $n$ (cf. (1)), the encoder can convey the type information to the decoder using a negligible amount of rate when $n$ is large enough. Therefore, without much loss of generality, we can assume that the type of $X^n$ is known to the decoder. Under this assumption, an optimal fixed-rate Slepian-Wolf encoder of rate $R$ should partition $T^n(P)$ into $\min\{|T^n(P)|, e^{nR}\}$ disjoint subsets for each $P \in \mathcal{P}_n(\mathcal{X})$. It can be seen that the rate allocated to $T^n(P)$ is always $R$ if $|T^n(P)| \geq e^{nR}$. In general, the type $Q_X^*$ that dominates the error probability of fixed-rate Slepian-Wolf coding is different from $P_X$. In contrast, for variable-rate Slepian-Wolf coding, we can losslessly compress the sequences whose types are bounded away from $P_X$ by allocating enough rate to those type classes (their contribution to the overall rate is still negligible since the probability of those type classes is extremely small), and therefore effectively eliminate the dominant error event in fixed-rate Slepian-Wolf coding. As a consequence, the types that can cause decoding errors in variable-rate Slepian-Wolf coding must be very close to $P_X$. This is the main intuition underlying the proof of the following theorem. A similar argument has been used in the context of variable-rate Slepian-Wolf coding under mismatched decoding [16].
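This intuition is easy to check numerically: for a binary source, the total probability of the type classes whose types deviate noticeably from $P_X$ decays exponentially in $n$, so describing those sequences losslessly costs a vanishing amount of rate. A small sketch with exact binomial probabilities (the threshold and parameters are our own, not from the paper):

```python
import math

def binom_pmf(n, k, q):
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

q, eps = 0.3, 0.05   # P_X(1) = 0.3; a type k/n is "far" if |k/n - q| > eps
for n in (100, 500, 1000, 2000):
    far = sum(binom_pmf(n, k, q) for k in range(n + 1) if abs(k / n - q) > eps)
    # rate overhead if far type classes are sent losslessly at log|X| nats/symbol
    overhead = far * math.log(2)
    print(f"n={n:5d}  P(far types)={far:.2e}  lossless overhead={overhead:.2e} nats/symbol")
```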
Theorem 1.
$$E_v(P_{XY}, R) = E(P_X, P_{Y|X}, H(P_X) - R).$$
Proof. 
The proof is divided into two parts. Firstly, we shall show that $E_v(P_{XY}, R) \geq E(P_X, P_{Y|X}, H(P_X) - R)$. The main idea is that one can use a constant composition code $\mathcal{C}_n$ of type approximately $P_X$ and rate approximately $H(P_X) - R$ to construct a variable-rate Slepian-Wolf code $\varphi_{n'}(\cdot)$ with $n' \geq n$, $R(\varphi_{n'}, P_{XY}) \approx R$, and $P_e(\varphi_{n'}, P_{XY}) \lesssim P_{e,\max}(\mathcal{C}_n, P_{Y|X})$.
By Definition 2, for any $\delta > 0$, there exists a sequence of constant composition codes $\{\mathcal{C}_n\}$ with $\mathcal{C}_n \subseteq T^n(P_n)$ for some $P_n \in \mathcal{P}_n(\mathcal{X})$ such that
$$\lim_{n\to\infty} \|P_n - P_X\| = 0, \quad \liminf_{n\to\infty} R(\mathcal{C}_n) \geq H(P_X) - R - \delta, \quad \limsup_{n\to\infty} \frac{1}{n} \log P_{e,\max}(\mathcal{C}_n, P_{Y|X}) \leq -E(P_X, P_{Y|X}, H(P_X) - R) + \delta.$$
Since $P_X(x) > 0$ for all $x \in \mathcal{X}$, we have
$$\max_{P \in \mathcal{P}_n(\mathcal{X}) \cap \mathcal{E}(\delta)} \max_x \frac{P_n(x)}{P(x)} \leq (1 + \delta)^2$$
for all sufficiently large $n$, where
$$\mathcal{E}(\delta) = \Big\{P \in \mathcal{P}(\mathcal{X}): \max_x \frac{P_X(x)}{P(x)} \leq 1 + \delta,\ H(P) \leq H(P_X) + \delta,\ D(P \| P_X) \leq \delta\Big\}.$$
Let $k_n = \lceil (1 + \delta)^2 n \rceil$. When $n$ is large enough, we can, for each $P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)$, construct a constant composition code $\mathcal{C}_{k_n}(P)$ of length $k_n$ and type $P$ by appending a fixed sequence in $\mathcal{X}^{k_n - n}$ to each codeword in $\mathcal{C}_n$. It is easy to see that
$$|\mathcal{C}_{k_n}(P)| = |\mathcal{C}_n|, \tag{25}$$
$$P_{e,\max}(\mathcal{C}_{k_n}(P), P_{Y|X}) = P_{e,\max}(\mathcal{C}_n, P_{Y|X}) \tag{26}$$
for all $P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)$. One can readily show by invoking the covering lemma in [17] that for each $P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)$, there exist $L(k_n)$ permutations $\pi_1, \ldots, \pi_{L(k_n)}$ of the integers $1, \ldots, k_n$ such that
$$\bigcup_{i=1}^{L(k_n)} \pi_i(\mathcal{C}_{k_n}(P)) = T^{k_n}(P),$$
where
$$L(k_n) = \max_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \Big\lceil |\mathcal{C}_{k_n}(P)|^{-1}\, |T^{k_n}(P)| \log |T^{k_n}(P)| \Big\rceil + 1.$$
In view of (25), we can rewrite $L(k_n)$ as
$$L(k_n) = \max_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \Big\lceil |\mathcal{C}_n|^{-1}\, |T^{k_n}(P)| \log |T^{k_n}(P)| \Big\rceil + 1.$$
Note that
$$P_{e,\max}(\pi_i(\mathcal{C}_{k_n}(P)), P_{Y|X}) = P_{e,\max}(\mathcal{C}_{k_n}(P), P_{Y|X}), \quad i = 1, 2, \ldots, L(k_n). \tag{27}$$
Given $\pi_1(\mathcal{C}_{k_n}(P)), \ldots, \pi_{L(k_n)}(\mathcal{C}_{k_n}(P))$, we can partition $T^{k_n}(P)$ into $L(k_n)$ disjoint subsets:
$$T^{k_n}(P, 1) = \pi_1(\mathcal{C}_{k_n}(P)), \quad T^{k_n}(P, i) = \pi_i(\mathcal{C}_{k_n}(P)) \setminus \bigcup_{j=1}^{i-1} \pi_j(\mathcal{C}_{k_n}(P)), \quad i = 2, \ldots, L(k_n).$$
It is clear that
$$P_{e,\max}(T^{k_n}(P, i), P_{Y|X}) \leq P_{e,\max}(\pi_i(\mathcal{C}_{k_n}(P)), P_{Y|X}), \quad i = 1, 2, \ldots, L(k_n). \tag{28}$$
Now construct a sequence of variable-rate Slepian-Wolf codes $\{\varphi_{k_n}(\cdot)\}$ as follows.
  • The encoder sends the type of $x^{k_n}$ to the decoder, where each type is uniquely represented by a binary sequence of length $m_1(k_n)$.
  • If $x^{k_n} \in T^{k_n}(P)$ for some $P \notin \mathcal{E}(\delta)$, the encoder sends $x^{k_n}$ losslessly to the decoder, where each $x^{k_n} \in T^{k_n}(P)$ is uniquely represented by a binary sequence of length $m_2(k_n)$.
  • If $x^{k_n} \in T^{k_n}(P)$ for some $P \in \mathcal{E}(\delta)$, the encoder finds the set $\pi_{i^*}(\mathcal{C}_{k_n}(P))$ that contains $x^{k_n}$ and sends the index $i^*$ to the decoder, where each index in $\{1, 2, \ldots, L(k_n)\}$ is uniquely represented by a binary sequence of length $m_3(k_n)$.
Specifically, we choose
$$m_1(k_n) = \lceil \log_2 |\mathcal{P}_{k_n}(\mathcal{X})| \rceil, \quad m_2(k_n) = \max_{P \in \mathcal{P}_{k_n}(\mathcal{X})} \lceil \log_2 |T^{k_n}(P)| \rceil, \quad m_3(k_n) = \lceil \log_2 L(k_n) \rceil.$$
Note that
$$R(\varphi_{k_n}, P_{XY}) = \frac{m_1(k_n) + (1 - \theta)\, m_2(k_n) + \theta\, m_3(k_n)}{k_n \log_2 e},$$
where
$$\theta = \sum_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \Pr\{X^{k_n} \in T^{k_n}(P)\}.$$
It is easy to verify (cf. (1)–(3)) that
$$m_1(k_n) \leq |\mathcal{X}| \log_2(k_n + 1) + 1, \quad m_2(k_n) \leq k_n \log_2 |\mathcal{X}| + 1, \quad 1 - \theta \leq (k_n + 1)^{|\mathcal{X}|} e^{-k_n \delta}.$$
Therefore, we have
$$\limsup_{n\to\infty} R(\varphi_{k_n}, P_{XY}) = \limsup_{n\to\infty} \frac{m_3(k_n)}{k_n \log_2 e} \leq \max_{P \in \mathcal{E}(\delta)} H(P) - \frac{1}{(1+\delta)^2} \liminf_{n\to\infty} R(\mathcal{C}_n) \leq H(P_X) + \delta - \frac{H(P_X) - R - \delta}{(1+\delta)^2}. \tag{29}$$
By (26)–(28) and the construction of $\varphi_{k_n}(\cdot)$, it is clear that
$$\begin{aligned} P_e(\varphi_{k_n}, P_{XY}) &= \sum_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \sum_{i=1}^{L(k_n)} \Pr\{X^{k_n} \in T^{k_n}(P, i)\} \Pr\{\hat{X}^{k_n} \neq X^{k_n} | X^{k_n} \in T^{k_n}(P, i)\} \\ &\leq \sum_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \sum_{i=1}^{L(k_n)} \Pr\{X^{k_n} \in T^{k_n}(P, i)\}\, P_{e,\max}(T^{k_n}(P, i), P_{Y|X}) \\ &\leq \sum_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \sum_{i=1}^{L(k_n)} \Pr\{X^{k_n} \in T^{k_n}(P, i)\}\, P_{e,\max}(\pi_i(\mathcal{C}_{k_n}(P)), P_{Y|X}) \\ &= \sum_{P \in \mathcal{P}_{k_n}(\mathcal{X}) \cap \mathcal{E}(\delta)} \sum_{i=1}^{L(k_n)} \Pr\{X^{k_n} \in T^{k_n}(P, i)\}\, P_{e,\max}(\mathcal{C}_n, P_{Y|X}) \\ &\leq P_{e,\max}(\mathcal{C}_n, P_{Y|X}), \end{aligned}$$
which implies
$$\limsup_{n\to\infty} \frac{1}{k_n} \log P_e(\varphi_{k_n}, P_{XY}) \leq \limsup_{n\to\infty} \frac{1}{k_n} \log P_{e,\max}(\mathcal{C}_n, P_{Y|X}) \leq \frac{-E(P_X, P_{Y|X}, H(P_X) - R) + \delta}{(1+\delta)^2}. \tag{30}$$
In view of (29), (30), and the fact that $\delta > 0$ is arbitrary, we must have $E_v(P_{XY}, R) \geq E(P_X, P_{Y|X}, H(P_X) - R)$ (cf. Definition 4).
Now we proceed to show that $E_v(P_{XY}, R) \leq E(P_X, P_{Y|X}, H(P_X) - R)$. The main idea is that one can extract a constant composition code of type approximately $P_X$ and rate approximately $H(P_X) - R$ or greater from a given variable-rate Slepian-Wolf code $\varphi_n(\cdot)$ of rate approximately $R$ such that the average decoding error probability of this constant composition code over channel $P_{Y|X}$ is bounded from above by $\gamma P_e(\varphi_n, P_{XY})$, where $\gamma$ is a constant that does not depend on $n$.
By Definition 4, for any $\delta > 0$, there exists a sequence of variable-rate Slepian-Wolf codes $\{\varphi_n\}$ such that
$$\limsup_{n\to\infty} R(\varphi_n, P_{XY}) \leq R + \delta, \tag{31}$$
$$\limsup_{n\to\infty} \frac{1}{n} \log P_e(\varphi_n, P_{XY}) \leq -E_v(P_{XY}, R) + \delta. \tag{32}$$
Suppose $\varphi_n(\cdot)$ induces a partition of $T^n(P)$, $P \in \mathcal{P}_n(\mathcal{X})$, into $N_n(P)$ disjoint subsets $T^n(P, 1), \ldots, T^n(P, N_n(P))$. Here the partition is defined as follows: $\varphi_n(x^n) = \varphi_n(\tilde{x}^n)$ if $x^n, \tilde{x}^n \in T^n(P, i)$ for some $i$, and $\varphi_n(x^n) \neq \varphi_n(\tilde{x}^n)$ if $x^n \in T^n(P, i)$, $\tilde{x}^n \in T^n(P, j)$ for $i \neq j$. Let
$$r(T^n(P)) = \frac{1}{n \log_2 e}\, \mathbb{E}[l(\varphi_n(X^n)) | X^n \in T^n(P)], \quad P \in \mathcal{P}_n(\mathcal{X}).$$
It follows from the source coding theorem that
$$r(T^n(P)) \geq \frac{1}{n} \sum_{i=1}^{N_n(P)} \frac{|T^n(P, i)|}{|T^n(P)|} \log \frac{|T^n(P)|}{|T^n(P, i)|}. \tag{33}$$
Define
$$\begin{aligned} \mathcal{F}_n(\delta) &= \Big\{(P, i): \frac{1}{n} \log \frac{|T^n(P)|}{|T^n(P, i)|} \leq R + 2\delta,\ P \in \mathcal{P}_n(\mathcal{X}),\ i = 1, 2, \ldots, N_n(P)\Big\}, \\ \mathcal{F}_n^c(\delta) &= \big\{(P, i) \notin \mathcal{F}_n(\delta): P \in \mathcal{P}_n(\mathcal{X}),\ i = 1, 2, \ldots, N_n(P)\big\}, \\ \mathcal{G}_n(\gamma) &= \big\{(P, i): \Pr\{\hat{X}^n \neq X^n | X^n \in T^n(P, i)\} \leq \gamma P_e(\varphi_n, P_{XY}),\ P \in \mathcal{P}_n(\mathcal{X}),\ i = 1, 2, \ldots, N_n(P)\big\}, \\ \mathcal{G}_n^c(\gamma) &= \big\{(P, i) \notin \mathcal{G}_n(\gamma): P \in \mathcal{P}_n(\mathcal{X}),\ i = 1, 2, \ldots, N_n(P)\big\}, \end{aligned}$$
where
$$\gamma > \frac{R + 2\delta}{\delta}. \tag{34}$$
Note that
$$\begin{aligned} R(\varphi_n, P_{XY}) &= \sum_{P \in \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\}\, r(T^n(P)) \\ &\geq \frac{1}{n} \sum_{P \in \mathcal{P}_n(\mathcal{X})} \sum_{i=1}^{N_n(P)} \Pr\{X^n \in T^n(P, i)\} \log \frac{|T^n(P)|}{|T^n(P, i)|} \quad (35) \\ &\geq \frac{1}{n} \sum_{(P, i) \in \mathcal{F}_n^c(\delta)} \Pr\{X^n \in T^n(P, i)\} \log \frac{|T^n(P)|}{|T^n(P, i)|} \\ &\geq (R + 2\delta) \sum_{(P, i) \in \mathcal{F}_n^c(\delta)} \Pr\{X^n \in T^n(P, i)\}, \quad (36) \end{aligned}$$
where (35) is due to (33). Combining (31) and (36) yields
$$\limsup_{n\to\infty} \sum_{(P, i) \in \mathcal{F}_n^c(\delta)} \Pr\{X^n \in T^n(P, i)\} \leq \frac{R + \delta}{R + 2\delta}. \tag{37}$$
Moreover, we have
$$\sum_{(P, i) \in \mathcal{G}_n^c(\gamma)} \Pr\{X^n \in T^n(P, i)\} \leq \frac{1}{\gamma} \tag{38}$$
since otherwise
$$\begin{aligned} P_e(\varphi_n, P_{XY}) &= \sum_{P \in \mathcal{P}_n(\mathcal{X})} \sum_{i=1}^{N_n(P)} \Pr\{X^n \in T^n(P, i)\} \Pr\{\hat{X}^n \neq X^n | X^n \in T^n(P, i)\} \\ &\geq \sum_{(P, i) \in \mathcal{G}_n^c(\gamma)} \Pr\{X^n \in T^n(P, i)\} \Pr\{\hat{X}^n \neq X^n | X^n \in T^n(P, i)\} \\ &> \gamma P_e(\varphi_n, P_{XY}) \sum_{(P, i) \in \mathcal{G}_n^c(\gamma)} \Pr\{X^n \in T^n(P, i)\} \\ &\geq P_e(\varphi_n, P_{XY}), \end{aligned}$$
which is absurd.
Define
$$\begin{aligned} \mathcal{S}_n(\delta) &= \Big\{P \in \mathcal{P}_n(\mathcal{X}): H(P) \geq H(P_X) - \delta,\ \max_x \frac{P(x)}{P_X(x)} \leq 1 + \delta\Big\}, \\ \mathcal{S}_n^c(\delta) &= \mathcal{P}_n(\mathcal{X}) \setminus \mathcal{S}_n(\delta), \\ \mathcal{D}_n(\delta, \gamma) &= \big\{(P, i): (P, i) \in \mathcal{F}_n(\delta) \cap \mathcal{G}_n(\gamma),\ P \in \mathcal{S}_n(\delta)\big\}, \\ \mathcal{D}_n^c(\delta, \gamma) &= \big\{(P, i) \notin \mathcal{D}_n(\delta, \gamma): P \in \mathcal{P}_n(\mathcal{X}),\ i = 1, 2, \ldots, N_n(P)\big\}. \end{aligned}$$
It follows from the weak law of large numbers that
$$\lim_{n\to\infty} \sum_{P \in \mathcal{S}_n^c(\delta)} \Pr\{X^n \in T^n(P)\} = 0. \tag{39}$$
We have
$$\begin{aligned} \liminf_{n\to\infty} \sum_{(P, i) \in \mathcal{D}_n(\delta, \gamma)} \Pr\{X^n \in T^n(P, i)\} &= \liminf_{n\to\infty} \Big[1 - \sum_{(P, i) \in \mathcal{D}_n^c(\delta, \gamma)} \Pr\{X^n \in T^n(P, i)\}\Big] \\ &\geq \liminf_{n\to\infty} \Big[1 - \sum_{(P, i) \in \mathcal{F}_n^c(\delta)} \Pr\{X^n \in T^n(P, i)\} - \sum_{(P, i) \in \mathcal{G}_n^c(\gamma)} \Pr\{X^n \in T^n(P, i)\} - \sum_{P \in \mathcal{S}_n^c(\delta)} \Pr\{X^n \in T^n(P)\}\Big] \\ &\geq 1 - \frac{R + \delta}{R + 2\delta} - \frac{1}{\gamma} \quad (40) \\ &> 0, \quad (41) \end{aligned}$$
where (40) is due to (37)–(39), and (41) is due to (34). Therefore, $\mathcal{D}_n(\delta, \gamma)$ is non-empty for all sufficiently large $n$. Pick an arbitrary $(P_n^*, i^*)$ from $\mathcal{D}_n(\delta, \gamma)$ for each sufficiently large $n$. We can construct a constant composition code $\mathcal{C}_{m_n}$ of length $m_n = \lceil (1 + \delta) n \rceil$ and type $P_{m_n}$ for some $P_{m_n} \in \mathcal{P}_{m_n}(\mathcal{X})$ by concatenating a fixed sequence in $\mathcal{X}^{m_n - n}$ to each sequence in $T^n(P_n^*, i^*)$ such that
$$\lim_{n\to\infty} \|P_{m_n} - P_X\| = 0. \tag{42}$$
Note that
$$\liminf_{n\to\infty} R(\mathcal{C}_{m_n}) = \liminf_{n\to\infty} \frac{1}{m_n} \log |T^n(P_n^*, i^*)| \geq \liminf_{n\to\infty} \frac{n}{m_n} \Big[\frac{1}{n} \log |T^n(P_n^*)| - R - 2\delta\Big] \geq \frac{H(P_X) - R - 3\delta}{1 + \delta}. \tag{43}$$
Moreover, since
$$P_e(\mathcal{C}_{m_n}, P_{Y|X}) = \Pr\{\hat{X}^n \neq X^n | X^n \in T^n(P_n^*, i^*)\} \leq \gamma P_e(\varphi_n, P_{XY}),$$
it follows from (32) that
$$\limsup_{n\to\infty} \frac{1}{m_n} \log P_e(\mathcal{C}_{m_n}, P_{Y|X}) \leq \frac{-E_v(P_{XY}, R) + \delta}{1 + \delta}. \tag{44}$$
In view of (42)–(44) and the fact that $\delta > 0$ is arbitrary, we must have $E_v(P_{XY}, R) \leq E(P_X, P_{Y|X}, H(P_X) - R)$ (cf. Definition 2). The proof is complete. ☐
The following result is an immediate consequence of Theorem 1 and Proposition 4.
Corollary 1.
Define
$$E_{v,ex}(P_{XY}, R) = E_{ex}(P_X, P_{Y|X}, H(P_X) - R), \quad E_{v,rc}(P_{XY}, R) = E_{rc}(P_X, P_{Y|X}, H(P_X) - R), \quad E_{v,sp}(P_{XY}, R) = E_{sp}(P_X, P_{Y|X}, H(P_X) - R).$$
We have
  • $E_v(P_{XY}, R) \geq \max\{E_{v,ex}(P_{XY}, R), E_{v,rc}(P_{XY}, R)\}$;
  • $E_v(P_{XY}, R) \leq E_{v,sp}(P_{XY}, R)$ with the possible exception of $R = H(P_X) - R_{sp}(P_X, P_{Y|X})$, at which point the inequality does not necessarily hold.
Remark 5.
  • We have $E_v(P_{XY}, R) = \infty$ for $R > H(P_X) - R_{ex}(P_X, P_{Y|X})$, and $E_v(P_{XY}, R) < \infty$ for $R < H(P_X) - R_{sp}(P_X, P_{Y|X})$. Therefore, $H(P_X) - R_{ex}(P_X, P_{Y|X})$ and $H(P_X) - R_{sp}(P_X, P_{Y|X})$ are respectively an upper bound and a lower bound on the zero-error rate of variable-rate Slepian-Wolf coding.
  • In view of (11), we have
    $$E_v(P_{XY}, R) = E_{v,sp}(P_{XY}, R) = E_{sp}(P_X, P_{Y|X}, H(P_X) - R)$$
    for $R \in [H(X|Y), H(P_X) - R_{cr}(P_X, P_{Y|X})]$. Note that
    $$E_{v,sp}(P_{XY}, R) \geq E_{f,sp}(P_{XY}, R) \geq E_f(P_{XY}, R),$$
    where the first inequality is strict unless the minimum in (20) is achieved at $Q_X = P_X$ (i.e., $P_{X^{(\rho)}} = P_X$, where $P_{X^{(\rho)}}$ is the marginal distribution of $X^{(\rho)}$ induced by $P_{Y^{(\rho)}}$ and $P_{X^{(\rho)}|Y^{(\rho)}}$ in (21) and (22)). Therefore, variable-rate Slepian-Wolf coding can outperform fixed-rate Slepian-Wolf coding in terms of rate-error tradeoff.
For $R > H(P_X) - R_{cr}(P_X, P_{Y|X})$, it is possible to obtain upper bounds on $E_v(P_{XY}, R)$ that are tighter than $E_{v,sp}(P_{XY}, R)$. Let $E_{ex}(P_{Y|X}, R)$ and $E_{sp}(P_{Y|X}, R)$ be respectively the expurgated exponent and the sphere packing exponent of channel $P_{Y|X}$. The straight-line exponent $E_{sl}(P_{Y|X}, R)$ of channel $P_{Y|X}$ [10] is the smallest linear function of $R$ that touches the curve $E_{sp}(P_{Y|X}, R)$ and also satisfies
$$E_{sl}(P_{Y|X}, 0) = E_{ex}(P_{Y|X}, 0),$$
where $E_{ex}(P_{Y|X}, 0)$ is assumed to be finite. Let $R_{sl}(P_{Y|X})$ be the point at which $E_{sl}(P_{Y|X}, R)$ and $E_{sp}(P_{Y|X}, R)$ coincide. It is well known [10] that $E(P_{Y|X}, R) \leq E_{sl}(P_{Y|X}, R)$ for $R \in (0, R_{sl}(P_{Y|X})]$. Since $E(P_X, P_{Y|X}, R) \leq E(P_{Y|X}, R)$, it follows from Theorem 1 that
$$E_v(P_{XY}, R) \leq E_{sl}(P_{Y|X}, H(P_X) - R)$$
for $R \in [\max\{H(P_X) - R_{sl}(P_{Y|X}), 0\}, H(P_X))$.
Note that the straight-line exponent holds for arbitrary block codes; one can obtain further improvement at high rates by leveraging bounds tailored to constant composition codes. Let $E_{ex}^*(Q_X, P_{Y|X}, 0)$ be the concave upper envelope of $E_{ex}(Q_X, P_{Y|X}, 0)$ considered as a function of $Q_X$. In view of ([7], Exercise 5.21), we have
$$E(Q_X, P_{Y|X}, R) \leq E_{ex}^*(Q_X, P_{Y|X}, 0)$$
for any $Q_X \in \mathcal{P}(\mathcal{X})$ and $R > 0$. Now it follows from Theorem 1 that
$$E_v(P_{XY}, R) \leq E_{ex}^*(P_X, P_{Y|X}, 0)$$
for $R < H(P_X)$.
The following theorem provides the second order expansion of E v ( P X Y , R ) at the Slepian-Wolf limit.
Theorem 2.
Assuming $R_{cr}(P_X, P_{Y|X}) < I(P_X, P_{Y|X})$ (see Proposition 1 for the necessary and sufficient condition), we have
$$\lim_{r \downarrow 0} \frac{E_v(P_{XY}, H(X|Y) + r)}{r^2} = \frac{1}{2} \Big[\sum_{x,y} P_{XY}(x,y)\, \tau^2(x,y) - \sum_x P_X(x) \Big(\sum_y \tau(x,y) P_{Y|X}(y|x)\Big)^2\Big]^{-1},$$
where $\tau(x,y) = \log P_Y(y) - \log P_{Y|X}(y|x)$.
Remark 6.
If $R_{cr}(P_X, P_{Y|X}) = I(P_X, P_{Y|X})$, then we have $E_{v,rc}(P_{XY}, R) = R - H(X|Y)$ for $R \geq H(X|Y)$, which implies
$$\lim_{r \downarrow 0} \frac{E_v(P_{XY}, H(X|Y) + r)}{r^2} = \infty.$$
It is also worth noting that the second-order expansion of $E_v(P_{XY}, R)$ at the Slepian-Wolf limit yields the redundancy-error tradeoff constant of variable-rate Slepian-Wolf coding derived in [18].
Proof. 
Since $R_{cr}(P_X, P_{Y|X}) < I(P_X, P_{Y|X})$, it follows that $H(X|Y) + r \in (H(X|Y), H(P_X) - R_{cr}(P_X, P_{Y|X}))$ when $r > 0$ is sufficiently close to zero. In this case, we have
$$\frac{E_v(P_{XY}, H(X|Y) + r)}{r^2} = \frac{E_{sp}(P_X, P_{Y|X}, I(P_X, P_{Y|X}) - r)}{r^2} = \min_{Q_{Y|X}:\, I(P_X, Q_{Y|X}) \leq I(P_X, P_{Y|X}) - r} \frac{D(Q_{Y|X} \| P_{Y|X} | P_X)}{r^2} = \min_{Q_{Y|X}:\, I(P_X, Q_{Y|X}) = I(P_X, P_{Y|X}) - r} \frac{D(Q_{Y|X} \| P_{Y|X} | P_X)}{r^2},$$
where the last equality follows from the fact that $E_{sp}(P_X, P_{Y|X}, R)$ is a strictly decreasing convex function of $R$ for $R \in (R_{sp}(P_X, P_{Y|X}), I(P_X, P_{Y|X})]$.
Let $\Delta(x, y) = Q_{Y|X}(y|x) - P_{Y|X}(y|x)$ for $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Let $\Delta(y) = \sum_x P_X(x) \Delta(x, y)$ for $y \in \mathcal{Y}$. By the Taylor expansion,
$$\begin{aligned} I(P_X, Q_{Y|X}) &= \sum_{x,y} P_X(x)\big(P_{Y|X}(y|x) + \Delta(x,y)\big) \log\big(P_{Y|X}(y|x) + \Delta(x,y)\big) - \sum_y \big(P_Y(y) + \Delta(y)\big) \log\big(P_Y(y) + \Delta(y)\big) \\ &= \sum_{x,y} P_X(x)\big(P_{Y|X}(y|x) + \Delta(x,y)\big)\Big[\log P_{Y|X}(y|x) + \frac{\Delta(x,y)}{P_{Y|X}(y|x)} + o(\Delta(x,y))\Big] - \sum_y \big(P_Y(y) + \Delta(y)\big)\Big[\log P_Y(y) + \frac{\Delta(y)}{P_Y(y)} + o(\Delta(y))\Big] \\ &= I(P_X, P_{Y|X}) - \sum_y \big(\Delta(y) + \Delta(y) \log P_Y(y) + o(\Delta(y))\big) + \sum_{x,y} P_X(x)\big(\Delta(x,y) + \Delta(x,y) \log P_{Y|X}(y|x) + o(\Delta(x,y))\big) \end{aligned}$$
and
$$\begin{aligned} D(Q_{Y|X} \| P_{Y|X} | P_X) &= \sum_{x,y} P_X(x) Q_{Y|X}(y|x) \log \frac{Q_{Y|X}(y|x)}{P_{Y|X}(y|x)} = \sum_{x,y} P_X(x)\big(P_{Y|X}(y|x) + \Delta(x,y)\big) \log\Big(1 + \frac{\Delta(x,y)}{P_{Y|X}(y|x)}\Big) \\ &= \sum_{x,y} P_X(x)\big(P_{Y|X}(y|x) + \Delta(x,y)\big)\Big[\frac{\Delta(x,y)}{P_{Y|X}(y|x)} - \frac{\Delta^2(x,y)}{2 P_{Y|X}^2(y|x)} + o(\Delta^2(x,y))\Big] \\ &= \sum_{x,y} P_X(x) \frac{\Delta^2(x,y)}{2 P_{Y|X}(y|x)} + o(\Delta^2(x,y)). \end{aligned}$$
Here $f(z) = o(z)$ means $\lim_{z \to 0} \frac{f(z)}{z} = 0$.
As $r \to 0$, we have $\Delta(y) \to 0$, $\Delta(x,y) \to 0$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Therefore, by ignoring the higher-order terms, which do not affect the limit, we get
$$\lim_{r \downarrow 0} \frac{E_v(P_{XY}, H(X|Y) + r)}{r^2} = \lim_{r \downarrow 0} \frac{\min \sum_{x,y} P_X(x) \frac{\Delta^2(x,y)}{2 P_{Y|X}(y|x)}}{r^2}, \tag{45}$$
where the minimization is over $\Delta(x,y)$ ($x \in \mathcal{X}$, $y \in \mathcal{Y}$) subject to the constraints
  • $\sum_y \Delta(x, y) = 0$ for all $x \in \mathcal{X}$;
  • $\sum_{x,y} P_X(x) \tau(x, y) \Delta(x, y) = r$.
Introduce the Lagrange multipliers $\alpha(x)$ ($x \in \mathcal{X}$) and $\beta$ for these constraints, and define
$$G = \sum_{x,y} P_X(x) \frac{\Delta^2(x,y)}{2 P_{Y|X}(y|x)} - \sum_{x,y} \alpha(x) \Delta(x,y) - \beta \sum_{x,y} P_X(x) \tau(x,y) \Delta(x,y).$$
The Karush-Kuhn-Tucker conditions yield
$$\frac{\partial G}{\partial \Delta(x,y)} = -\alpha(x) - \beta P_X(x) \tau(x,y) + \frac{P_X(x) \Delta(x,y)}{P_{Y|X}(y|x)} = 0, \quad x \in \mathcal{X},\ y \in \mathcal{Y}.$$
Therefore, we have
$$\Delta(x,y) = \beta \tau(x,y) P_{Y|X}(y|x) + \frac{P_{Y|X}(y|x)}{P_X(x)}\, \alpha(x). \tag{46}$$
Substituting (46) into constraint 1, we obtain
$$\alpha(x) = -\beta P_X(x) \sum_y \tau(x,y) P_{Y|X}(y|x),$$
which, together with (46), yields
$$\Delta(x,y) = \beta \tau(x,y) P_{Y|X}(y|x) - \beta P_{Y|X}(y|x) \sum_{y'} \tau(x,y') P_{Y|X}(y'|x). \tag{47}$$
Therefore, we have
$$\begin{aligned} \sum_{x,y} P_X(x) \frac{\Delta^2(x,y)}{2 P_{Y|X}(y|x)} &= \frac{\beta^2}{2} \sum_{x,y} P_{XY}(x,y) \Big[\tau(x,y) - \sum_{y'} \tau(x,y') P_{Y|X}(y'|x)\Big]^2 \\ &= \frac{\beta^2}{2} \sum_{x,y} P_{XY}(x,y) \Big[\tau^2(x,y) - 2 \tau(x,y) \sum_{y'} \tau(x,y') P_{Y|X}(y'|x) + \Big(\sum_{y'} \tau(x,y') P_{Y|X}(y'|x)\Big)^2\Big] \\ &= \frac{\beta^2}{2} \Big[\sum_{x,y} P_{XY}(x,y)\, \tau^2(x,y) - \sum_x P_X(x) \Big(\sum_y \tau(x,y) P_{Y|X}(y|x)\Big)^2\Big]. \end{aligned} \tag{48}$$
Constraint 2 and (47) together yield
$$\frac{r^2}{\beta^2} = \Big[\frac{1}{\beta} \sum_{x,y} P_X(x) \tau(x,y) \Delta(x,y)\Big]^2 = \Big[\sum_{x,y} P_X(x) \tau(x,y) \Big(\tau(x,y) P_{Y|X}(y|x) - P_{Y|X}(y|x) \sum_{y'} \tau(x,y') P_{Y|X}(y'|x)\Big)\Big]^2 = \Big[\sum_{x,y} P_{XY}(x,y)\, \tau^2(x,y) - \sum_x P_X(x) \Big(\sum_y \tau(x,y) P_{Y|X}(y|x)\Big)^2\Big]^2. \tag{49}$$
The proof is complete by substituting (48) and (49) back into (45). ☐
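As a sanity check, the limiting constant in Theorem 2 can be evaluated directly from its closed form. The sketch below does so for a binary source in the spirit of Section 5; the parameter values are our own, and we implicitly assume $R_{cr}(P_X, P_{Y|X}) < I(P_X, P_{Y|X})$ holds for them:

```python
import numpy as np

p, tau = 0.1, 0.4                         # our example parameters
Pxy = np.array([[tau * (1 - p), (1 - tau) * p],
                [tau * p, (1 - tau) * (1 - p)]])   # Pxy[x, y]
Px = Pxy.sum(axis=1)
Py = Pxy.sum(axis=0)
Pygx = Pxy / Px[:, None]                  # P_{Y|X}(y|x)

tau_xy = np.log(Py)[None, :] - np.log(Pygx)        # tau(x, y)
m = np.sum(tau_xy * Pygx, axis=1)                  # E[tau | X = x]
V = np.sum(Pxy * tau_xy**2) - np.sum(Px * m**2)    # E[tau^2] - E[(E[tau|X])^2]
print("lim E_v(H(X|Y)+r)/r^2 =", 1.0 / (2.0 * V))
```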

4. Variable-Rate Slepian-Wolf Coding: Below the Slepian-Wolf Limit

Definition 5.
Given a joint probability distribution $P_{XY}$, we say that a correct decoding exponent $E_c \geq 0$ is achievable with variable-rate Slepian-Wolf codes at rate $R$ if for any $\delta > 0$, there exists a sequence of variable-rate Slepian-Wolf codes $\{\varphi_n\}$ such that
$$\limsup_{n\to\infty} R(\varphi_n, P_{XY}) \leq R + \delta, \quad \liminf_{n\to\infty} \frac{1}{n} \log P_c(\varphi_n, P_{XY}) \geq -(E_c + \delta).$$
The smallest achievable correct decoding exponent at rate $R$ is denoted by $E_v^c(P_{XY}, R)$.
In view of Theorem 1, it is tempting to conjecture that $E_v^c(P_{XY}, R) = E_c(P_X, P_{Y|X}, H(P_X) - R)$. It turns out this is not true. We shall show that $E_v^c(P_{XY}, R) = 0$ for all $R$. Actually, we have a stronger result: the correct decoding probability of variable-rate Slepian-Wolf coding can be bounded away from zero even when $R < H(X|Y)$. This is in sharp contrast with fixed-rate Slepian-Wolf coding, for which the correct decoding probability decays to zero exponentially fast if the rate is below the Slepian-Wolf limit. To make the statement precise, we need the following definition.
Definition 6.
Given a joint probability distribution $P_{XY}$, we say that a correct decoding probability $P_{c,v}(P_{XY}, R)$ is achievable with variable-rate Slepian-Wolf codes at rate $R$ if for any $\delta > 0$, there exists a sequence of variable-rate Slepian-Wolf codes $\{\varphi_n\}$ such that
$$\limsup_{n\to\infty} R(\varphi_n, P_{XY}) \leq R + \delta, \quad \limsup_{n\to\infty} P_c(\varphi_n, P_{XY}) \geq P_{c,v}(P_{XY}, R) - \delta.$$
The largest achievable correct decoding probability at rate $R$ is denoted by $P_{c,v}^{\max}(P_{XY}, R)$.
Theorem 3.
$$P_{c,v}^{\max}(P_{XY}, R) = \frac{R}{H(X|Y)} \quad \text{for } R \in (0, H(X|Y)].$$
Remark 7.
It is obvious that $P_{c,v}^{\max}(P_{XY}, R) = 1$ for $R > H(X|Y)$. Moreover, since $P_{c,v}^{\max}(P_{XY}, R)$ is a monotonically increasing function of $R$, it follows that $P_{c,v}^{\max}(P_{XY}, 0) = 0$.
Proof. 
The intuition underlying the proof is as follows. Assume the rate is below the Slepian-Wolf limit, i.e., $R < H(X|Y)$. For each type $P$ in the neighborhood of $P_X$, the rate allocated to the type class $T^n(P)$ should be no less than $H(X|Y)$ in order to correctly decode the sequences in $T^n(P)$. However, since almost all the probability is captured by the type classes whose types are in the neighborhood of $P_X$, there is not enough rate to protect all of them. Note that if the rate is evenly allocated among these type classes, none of them gets enough rate; consequently, the correct decoding probability goes to zero. A better strategy is to protect only a portion of them so as to accumulate enough rate. Specifically, we can protect a fraction $\frac{R}{H(X|Y)}$ of these type classes, so that the rate allocated to each of them is about $H(X|Y)$, and leave the remaining type classes unprotected. It turns out this strategy achieves the maximum correct decoding probability as the block length $n$ goes to infinity. Somewhat interestingly, although $E_v^c(P_{XY}, R) \neq E_c(P_X, P_{Y|X}, H(P_X) - R)$, the function $E_c(P_X, P_{Y|X}, \cdot)$ does play a fundamental role in establishing the correct result.
The proof is divided into two parts. Firstly, we shall show that $P_{c,v}^{\max}(P_{XY}, R) \geq \frac{R}{H(X|Y)}$. For any $\epsilon > 0$, define
$$\mathcal{U}(\epsilon) = \{P \in \mathcal{P}(\mathcal{X}): \|P - P_X\| \leq \epsilon\}.$$
Since $P_X(x) > 0$ for all $x \in \mathcal{X}$, we can choose $\epsilon$ small enough so that
$$q_{\min}(\epsilon) \triangleq \min_{P \in \mathcal{U}(\epsilon),\, x \in \mathcal{X}} P(x) > 0.$$
Using Stirling's approximation
$$\sqrt{2\pi m}\, \Big(\frac{m}{e}\Big)^m e^{\frac{1}{12m+1}} < m! < \sqrt{2\pi m}\, \Big(\frac{m}{e}\Big)^m e^{\frac{1}{12m}},$$
we have, for any $P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})$,
$$\Pr(X^n \in T^n(P)) = \frac{n!}{\prod_x (nP(x))!} \prod_x P_X(x)^{nP(x)} \leq \frac{\sqrt{2\pi n}\, e^{\frac{1}{12n}}}{\prod_x \sqrt{2\pi n P(x)}}\, e^{-n D(P \| P_X)} \leq \frac{\sqrt{2\pi}\, e^{\frac{1}{12n}}}{\prod_x \sqrt{2\pi P(x)}}\, n^{-\frac{|\mathcal{X}|-1}{2}} \leq \frac{\sqrt{2\pi}\, e^{\frac{1}{12n}}}{\prod_x \sqrt{2\pi q_{\min}(\epsilon)}}\, n^{-\frac{|\mathcal{X}|-1}{2}},$$
which implies that $\Pr(X^n \in T^n(P))$ converges uniformly to zero as $n \to \infty$ for all $P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})$.
which implies that Pr ( X n T n ( P ) ) converges uniformly to zero as n for all P U ( ϵ ) P n ( X ) . Moreover, it follows from the weak law of large numbers that
lim n P U ( ϵ ) P n ( X ) Pr ( X n T n ( P ) ) = 1 .
Therefore, for any δ > 0 , R ( 0 , H ( X | Y ) ] , and sufficiently large n, we can find a set S n U ( ϵ ) P n ( X ) such that
R H ( X | Y ) δ P S n Pr ( X n T n ( P ) ) R H ( X | Y ) .
Now consider a sequence of variable-rate Slepian-Wolf codes $\{\varphi_n(\cdot)\}$ specified as follows.
  • The encoder sends the type of $X^n$ to the decoder, where each type is uniquely represented by a binary sequence of length $\lceil \log_2 |\mathcal{P}_n(\mathcal{X})| \rceil$.
  • For each $P \in \mathcal{S}_n$, the encoder partitions the type class $T^n(P)$ into $L_n$ subsets $T^n(P, 1), T^n(P, 2), \ldots, T^n(P, L_n)$. If $X^n \in T^n(P)$ for some $P \in \mathcal{S}_n$, the encoder finds the subset $T^n(P, i^*)$ that contains $X^n$ and sends the index $i^*$ to the decoder, where each index in $\{1, 2, \ldots, L_n\}$ is uniquely represented by a binary sequence of length $\lceil \log_2 L_n \rceil$.
  • The remaining type classes are left uncoded.
Specifically, we let
$$L_n = \Big\lceil 2 (n+1)^{|\mathcal{X}|^2} e^{n(H(X|Y) + \delta)} \Big\rceil.$$
It follows from ([8], Theorem 2) that for each $P \in \mathcal{S}_n$, it is possible to partition the type class $T^n(P)$ into $L_n$ disjoint subsets $T^n(P, 1), T^n(P, 2), \ldots, T^n(P, L_n)$ so that
$$\frac{1}{n} \log \Pr(\hat{X}^n \neq X^n | X^n \in T^n(P)) \leq -\min_{Q_X \in \mathcal{U}(\epsilon)} E_{rc}(Q_X, P_{Y|X}, H(Q_X) - H(X|Y) - \delta) + \epsilon$$
uniformly for all $P \in \mathcal{S}_n$ when $n$ is sufficiently large. In view of the fact that $E_{rc}(P_X, P_{Y|X}, I(P_X, P_{Y|X}) - \delta) > 0$ and that $E_{rc}(Q_X, P_{Y|X}, R)$ as a function of the pair $(Q_X, R)$ is uniformly continuous, we have
$$\min_{Q_X \in \mathcal{U}(\epsilon)} E_{rc}(Q_X, P_{Y|X}, H(Q_X) - H(X|Y) - \delta) - \epsilon \geq \kappa_1 > 0$$
for sufficiently small ϵ .
For this sequence of constructed variable-rate Slepian-Wolf codes $\{\varphi_n(\cdot)\}$, it can be readily verified that
$$\limsup_{n\to\infty} R(\varphi_n, P_{XY}) = \limsup_{n\to\infty} \frac{1}{n \log_2 e} \Big[\lceil \log_2 |\mathcal{P}_n(\mathcal{X})| \rceil + \sum_{P \in \mathcal{S}_n} \Pr\{X^n \in T^n(P)\}\, \lceil \log_2 L_n \rceil\Big] \leq \frac{R}{H(X|Y)}\, (H(X|Y) + \delta)$$
and
$$\limsup_{n\to\infty} P_c(\varphi_n, P_{XY}) \geq \limsup_{n\to\infty} \sum_{P \in \mathcal{S}_n} \Pr\{X^n \in T^n(P)\} \big(1 - \Pr\{\hat{X}^n \neq X^n | X^n \in T^n(P)\}\big) \geq \limsup_{n\to\infty} \sum_{P \in \mathcal{S}_n} \Pr\{X^n \in T^n(P)\} (1 - e^{-n\kappa_1}) \geq \frac{R}{H(X|Y)} - \delta.$$
Since $\delta > 0$ is arbitrary, it follows from Definition 6 that $P_{c,v}^{\max}(P_{XY}, R) \geq \frac{R}{H(X|Y)}$.
Now we proceed to prove the other direction. It follows from Definition 6 that for any $\delta > 0$, there exists a sequence of variable-rate Slepian-Wolf codes $\{\varphi_n(\cdot)\}$ with
$$\limsup_{n\to\infty} R(\varphi_n, P_{XY}) \leq R + \delta, \quad \limsup_{n\to\infty} P_c(\varphi_n, P_{XY}) \geq P_{c,v}^{\max}(P_{XY}, R) - \delta.$$
Recall the definition of $T^n(P, 1), \ldots, T^n(P, N_n(P))$ as well as $r(T^n(P))$ in the proof of Theorem 1. For $P \in \mathcal{P}_n(\mathcal{X})$, define
$$\mathcal{I}_n(P, \delta) = \Big\{i: \frac{1}{n} \log \frac{|T^n(P)|}{|T^n(P, i)|} \leq H(X|Y) - \delta,\ i = 1, 2, \ldots, N_n(P)\Big\}, \quad \mathcal{I}_n^c(P, \delta) = \Big\{i: \frac{1}{n} \log \frac{|T^n(P)|}{|T^n(P, i)|} > H(X|Y) - \delta,\ i = 1, 2, \ldots, N_n(P)\Big\}.$$
Note that
$$\sum_{i \in \mathcal{I}_n(P, \delta)} \frac{|T^n(P, i)|}{|T^n(P)|} \geq 1 - \frac{r(T^n(P))}{H(X|Y) - \delta}$$
since
$$r(T^n(P)) \geq \frac{1}{n} \sum_{i=1}^{N_n(P)} \frac{|T^n(P, i)|}{|T^n(P)|} \log \frac{|T^n(P)|}{|T^n(P, i)|} \quad (50) \geq \frac{1}{n} \sum_{i \in \mathcal{I}_n^c(P, \delta)} \frac{|T^n(P, i)|}{|T^n(P)|} \log \frac{|T^n(P)|}{|T^n(P, i)|} \geq (H(X|Y) - \delta) \sum_{i \in \mathcal{I}_n^c(P, \delta)} \frac{|T^n(P, i)|}{|T^n(P)|},$$
where (50) is due to (33).
Each $T^n(P, i)$ can be viewed as a constant composition code of type $P$, and we have
$$\Pr\{\hat{X}^n = X^n | X^n \in T^n(P, i)\} = P_c(T^n(P, i), P_{Y|X}).$$
Note that for $P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})$ and $i \in \mathcal{I}_n(P, \delta)$,
$$\frac{1}{n} \log |T^n(P, i)| \geq \frac{1}{n} \log |T^n(P)| - H(X|Y) + \delta \geq H(P) - H(X|Y) + \delta - |\mathcal{X}| \frac{\log(n+1)}{n}.$$
Therefore, it follows from ([9], Lemma 5) that
$$\frac{1}{n} \log P_c(T^n(P, i), P_{Y|X}) \leq -\min_{Q_X \in \mathcal{U}(\epsilon)} E_c(Q_X, P_{Y|X}, H(Q_X) - H(X|Y) + \delta - \epsilon) + \epsilon$$
uniformly for all $P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})$ and $i \in \mathcal{I}_n(P, \delta)$ when $n$ is sufficiently large. In view of the fact that $E_c(P_X, P_{Y|X}, I(P_X, P_{Y|X}) + \delta) > 0$ and that $E_c(Q_X, P_{Y|X}, R)$ as a function of the pair $(Q_X, R)$ is uniformly continuous, we have
$$\min_{Q_X \in \mathcal{U}(\epsilon)} E_c(Q_X, P_{Y|X}, H(Q_X) - H(X|Y) + \delta - \epsilon) - \epsilon \geq \kappa_2 > 0$$
for sufficiently small ϵ .
Now it is easy to see that
$$\begin{aligned} \liminf_{n\to\infty} P_e(\varphi_n, P_{XY}) &\geq \liminf_{n\to\infty} \sum_{P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\} \sum_{i \in \mathcal{I}_n(P, \delta)} \frac{|T^n(P, i)|}{|T^n(P)|} \big(1 - \Pr\{\hat{X}^n = X^n | X^n \in T^n(P, i)\}\big) \\ &\geq \liminf_{n\to\infty} \sum_{P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\} \sum_{i \in \mathcal{I}_n(P, \delta)} \frac{|T^n(P, i)|}{|T^n(P)|} (1 - e^{-n\kappa_2}) \\ &\geq \liminf_{n\to\infty} \sum_{P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\} \Big(1 - \frac{r(T^n(P))}{H(X|Y) - \delta}\Big) (1 - e^{-n\kappa_2}) \\ &\geq \liminf_{n\to\infty} \sum_{P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\} (1 - e^{-n\kappa_2}) - \limsup_{n\to\infty} \sum_{P \in \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\}\, \frac{r(T^n(P))}{H(X|Y) - \delta}\, (1 - e^{-n\kappa_2}) \\ &= \liminf_{n\to\infty} \sum_{P \in \mathcal{U}(\epsilon) \cap \mathcal{P}_n(\mathcal{X})} \Pr\{X^n \in T^n(P)\} (1 - e^{-n\kappa_2}) - \limsup_{n\to\infty} \frac{R(\varphi_n, P_{XY})}{H(X|Y) - \delta}\, (1 - e^{-n\kappa_2}) \\ &\geq 1 - \frac{R + \delta}{H(X|Y) - \delta}, \end{aligned}$$
which implies
$$\limsup_{n\to\infty} P_c(\varphi_n, P_{XY}) \leq \frac{R + \delta}{H(X|Y) - \delta}.$$
Therefore, we have
$$P_{c,v}^{\max}(P_{XY}, R) - \delta \leq \frac{R + \delta}{H(X|Y) - \delta}.$$
Since δ > 0 is arbitrary, this completes the proof. ☐

5. Example

Consider the joint distribution $P_{XY}$ over $\mathbb{Z}_2 \times \mathbb{Z}_2$ with $P_{X|Y}(1|0) = P_{X|Y}(0|1) = p$ and $P_Y(0) = \tau$. We assume $p \in (0, \frac{1}{2})$, $\tau \in (0, \frac{1}{2}]$. It is easy to compute that
$$P_X(0) = 1 - P_X(1) = \tau(1-p) + (1-\tau)p, \quad P_{Y|X}(1|0) = 1 - P_{Y|X}(0|0) = \frac{(1-\tau)p}{\tau(1-p) + (1-\tau)p}, \quad P_{Y|X}(0|1) = 1 - P_{Y|X}(1|1) = \frac{\tau p}{\tau p + (1-\tau)(1-p)}.$$
For this joint distribution, we have $H(X|Y) = H_b(p)$, where $H_b(\cdot)$ is the binary entropy function (i.e., $H_b(p) = -p \log p - (1-p)\log(1-p)$). Given $R \in [0, \log 2]$, let $q$ be the unique number satisfying $H_b(q) = R$ and $q \leq \frac{1}{2}$. It can be verified that
$$E_{f,sp}(P_{XY}, R) = D(q \| p), \quad R \in [H_b(p), \log 2], \qquad E_f^c(P_{XY}, R) = D(q \| p), \quad R \in [0, H_b(p)].$$
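Both closed forms reduce to the binary divergence $D(q \| p)$ with $H_b(q) = R$; the sketch below (our own helper names) inverts the binary entropy function by bisection and evaluates the exponent:

```python
import math

def Hb(q):
    return 0.0 if q in (0.0, 1.0) else -q * math.log(q) - (1 - q) * math.log(1 - q)

def Hb_inv(R):
    """Unique q <= 1/2 with Hb(q) = R, found by bisection (Hb increases on [0, 1/2])."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Hb(mid) < R else (lo, mid)
    return (lo + hi) / 2

def D(q, p):
    """Binary divergence D(q || p), natural log."""
    out = 0.0
    if q > 0: out += q * math.log(q / p)
    if q < 1: out += (1 - q) * math.log((1 - q) / (1 - p))
    return out

p = 0.1   # our example value; D(q||p) is E_f^c below H_b(p) and E_f,sp above it
for R in (0.1, 0.3, Hb(p), 0.6):
    q = Hb_inv(R)
    print(f"R={R:.3f}  q={q:.4f}  D(q||p)={D(q, p):.4f}")
```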
Note that
$$E_{ex}(Q_X, P_{Y|X}, 0) = -\sum_{x, x'} Q_X(x) Q_X(x') \log \sum_y \sqrt{P_{Y|X}(y|x) P_{Y|X}(y|x')} = -2 Q_X(0) Q_X(1) \log \sum_y \sqrt{P_{Y|X}(y|0) P_{Y|X}(y|1)},$$
which is a concave function of $Q_X$. Therefore,
$$E_{ex}^*(P_X, P_{Y|X}, 0) = E_{ex}(P_X, P_{Y|X}, 0).$$
Moreover, we have
$$E_{ex}(P_{Y|X}, 0) = \max_{Q_X} E_{ex}(Q_X, P_{Y|X}, 0) = -\frac{1}{2} \log \sum_y \sqrt{P_{Y|X}(y|0) P_{Y|X}(y|1)}.$$
It is easy to show that
$$E_{v,sp}(P_{XY}, H(P_X)) = E_{sp}(P_X, P_{Y|X}, 0) = \min_{Q_Y} \sum_x P_X(x) \sum_y Q_Y(y) \log \frac{Q_Y(y)}{P_{Y|X}(y|x)},$$
where the minimizer $Q_Y^*$ is given by
$$Q_Y^*(y) = \frac{\prod_x P_{Y|X}(y|x)^{P_X(x)}}{\sum_{y'} \prod_x P_{Y|X}(y'|x)^{P_X(x)}}, \quad y \in \mathcal{Y}.$$
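The closed-form minimizer can be verified numerically by comparing the objective at $Q_Y^*$ against a brute-force grid search over $Q_Y$ (a sketch with our own example parameters):

```python
import numpy as np

p, tau = 0.1, 0.4
Px = np.array([tau*(1-p) + (1-tau)*p, tau*p + (1-tau)*(1-p)])
Pygx = np.array([[1 - (1-tau)*p / Px[0], (1-tau)*p / Px[0]],
                 [tau*p / Px[1], 1 - tau*p / Px[1]]])   # P_{Y|X}(y|x)

def objective(Qy):
    # sum_x P_X(x) sum_y Q(y) log(Q(y) / P_{Y|X}(y|x))
    return np.sum(Px[:, None] * Qy[None, :] * (np.log(Qy)[None, :] - np.log(Pygx)))

# closed-form minimizer: normalized geometric mean of the channel rows
g = np.exp(Px @ np.log(Pygx))          # prod_x P(y|x)^{P_X(x)} for each y
Qstar = g / g.sum()

grid = np.linspace(1e-6, 1 - 1e-6, 10001)
best = min(objective(np.array([a, 1 - a])) for a in grid)
print(objective(Qstar), best)          # the two values should agree closely
```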
Define
$$E_{f,er}(P_{XY}, R) = \max\{E_{f,ex}(P_{XY}, R), E_{f,rc}(P_{XY}, R)\}, \quad E_{v,er}(P_{XY}, R) = \max\{E_{v,ex}(P_{XY}, R), E_{v,rc}(P_{XY}, R)\}.$$
We have
$$E_f(P_{XY}, R) \geq E_{f,er}(P_{XY}, R), \quad E_v(P_{XY}, R) \geq E_{v,er}(P_{XY}, R).$$
It can be seen from Figure 2 that the achievable error exponent $E_{v,er}(P_{XY}, R)$ of variable-rate Slepian-Wolf coding can completely dominate the sphere packing exponent $E_{f,sp}(P_{XY}, R)$ of fixed-rate Slepian-Wolf coding. The gain of variable-rate coding gradually diminishes as $\tau \to \frac{1}{2}$ (see Figure 3 and Figure 4).

6. Concluding Remarks

We have studied the reliability function of variable-rate Slepian-Wolf coding. An intimate connection between variable-rate Slepian-Wolf codes and constant composition codes has been revealed. It is shown that variable-rate Slepian-Wolf coding can outperform fixed-rate Slepian-Wolf coding in terms of rate-error tradeoff. Finally, we would like to mention that Theorem 1 has been generalized by Weinberger and Merhav in their recent paper on the optimal tradeoff between the error exponent and the excess-rate exponent of variable-rate Slepian-Wolf coding [19].

Acknowledgments

Jun Chen was supported in part by the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant.

Author Contributions

All the authors contributed to the problem formulation and the proof of Theorem 2; Jun Chen established the remaining results and wrote the paper. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Proposition 1

In view of (7) and (11), we have $R_{cr}(Q_X, W_{Y|X}) = I(Q_X, W_{Y|X})$ if and only if the minimum of the convex optimization problem
$$\min_{V_{Y|X}} \; D(V_{Y|X} \| W_{Y|X} | Q_X) + I(Q_X, V_{Y|X}) \tag{A1}$$
is achieved at $V_{Y|X} = W_{Y|X}$. Let $V_{Y|X}^*$ be a minimizer of the above optimization problem. Note that for $x, y$ such that $Q_X(x) W_{Y|X}(y|x) = 0$, there is no loss of generality in setting $V_{Y|X}^*(y|x) = W_{Y|X}(y|x)$. Let $A = \{x \in \mathcal{X} : Q_X(x) > 0\}$ and $B_x = \{y \in \mathcal{Y} : W_{Y|X}(y|x) > 0\}$ for $x \in A$. We can rewrite (A1) in the following equivalent form:
$$\min_{V_{Y|X}(y|x):\, x \in A,\, y \in B_x} \sum_{x \in A,\, y \in B_x} Q_X(x) V_{Y|X}(y|x) \log \frac{V_{Y|X}^2(y|x)}{W_{Y|X}(y|x) \sum_{x' \in A} Q_X(x') V_{Y|X}(y|x')}$$
subject to
$$V_{Y|X}(y|x) \geq 0 \ \text{ for all } x \in A,\, y \in B_x, \qquad \sum_{y \in B_x} V_{Y|X}(y|x) = 1 \ \text{ for all } x \in A.$$
Define
$$G = \sum_{x \in A,\, y \in B_x} Q_X(x) V_{Y|X}(y|x) \log \frac{V_{Y|X}^2(y|x)}{W_{Y|X}(y|x) \sum_{x' \in A} Q_X(x') V_{Y|X}(y|x')} - \sum_{x \in A,\, y \in B_x} \alpha(x, y) V_{Y|X}(y|x) - \sum_{x \in A,\, y \in B_x} \beta(x) V_{Y|X}(y|x),$$
where $\alpha(x, y) \in \mathbb{R}_+$ ($x \in A$, $y \in B_x$) and $\beta(x) \in \mathbb{R}$ ($x \in A$). The Karush-Kuhn-Tucker conditions yield
$$
\begin{aligned}
\frac{\partial G}{\partial V_{Y|X}(y^*|x^*)}\bigg|_{V_{Y|X} = V_{Y|X}^*} = \;& 2 Q_X(x^*) \log V_{Y|X}^*(y^*|x^*) + Q_X(x^*) - Q_X(x^*) \log W_{Y|X}(y^*|x^*) \\
& - Q_X(x^*) \log \sum_{x \in A} Q_X(x) V_{Y|X}^*(y^*|x) - \alpha(x^*, y^*) - \beta(x^*) = 0 \quad \text{for all } x^* \in A,\ y^* \in B_{x^*}, \\
& V_{Y|X}^*(y^*|x^*) \geq 0 \quad \text{for all } x^* \in A,\ y^* \in B_{x^*}, \\
& \sum_{y^* \in B_{x^*}} V_{Y|X}^*(y^*|x^*) = 1 \quad \text{for all } x^* \in A, \\
& \alpha(x^*, y^*) V_{Y|X}^*(y^*|x^*) = 0 \quad \text{for all } x^* \in A,\ y^* \in B_{x^*}.
\end{aligned}
$$
By the complementary slackness conditions (i.e., $V_{Y|X}^*(y^*|x^*) > 0 \Rightarrow \alpha(x^*, y^*) = 0$), we have $V_{Y|X}^* = W_{Y|X}$ if and only if for all $x^* \in A$, $y^* \in B_{x^*}$,
$$Q_X(x^*) \log W_{Y|X}(y^*|x^*) + Q_X(x^*) - Q_X(x^*) \log \sum_{x \in A} Q_X(x) W_{Y|X}(y^*|x) - \beta(x^*) = 0,$$
i.e., the value of
$$\frac{W_{Y|X}(y|x)}{\sum_{x'} Q_X(x') W_{Y|X}(y|x')}$$
does not depend on y for all $x, y$ such that $Q_X(x) W_{Y|X}(y|x) > 0$.
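To see this condition in action, one can solve the convex program (A1) numerically and check whether the minimizer coincides with $W_{Y|X}$. The following Python sketch (our illustration; the channel and input distribution are arbitrary assumed values) does this by grid search for binary alphabets:

```python
import math

# Our illustration: check whether the minimizer of D(V||W|Q) + I(Q,V) is W
# itself (equivalently, whether R_cr(Q, W) = I(Q, W)) for an assumed pair.
Q = [0.4, 0.6]
W = [[0.8, 0.2], [0.3, 0.7]]  # W[x][y] = W(y|x)

def objective(a, b):
    """D(V||W|Q) + I(Q,V) with V(1|0) = a, V(1|1) = b."""
    V = [[1 - a, a], [1 - b, b]]
    out = [sum(Q[x] * V[x][y] for x in (0, 1)) for y in (0, 1)]  # (QV)(y)
    return sum(Q[x] * V[x][y] * math.log(V[x][y] ** 2 / (W[x][y] * out[y]))
               for x in (0, 1) for y in (0, 1))

n = 500
best_val, best_ab = float("inf"), None
for i in range(1, n):
    for j in range(1, n):
        val = objective(i / n, j / n)
        if val < best_val:
            best_val, best_ab = val, (i / n, j / n)
print(f"grid minimizer V(1|0), V(1|1) = {best_ab}, value = {best_val:.5f}")
print(f"value at V = W: {objective(W[0][1], W[1][1]):.5f}")
```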

Appendix B. Proof of Proposition 4

• It is known ([7], Exercise 5.17) that for every $R > 0$, $\delta > 0$, and $P \in \mathcal{P}_n(\mathcal{X})$ there exists a constant composition code $C_n \subseteq T^n(P)$ such that
    $$R(C_n) \geq R - \delta, \qquad -\frac{1}{n} \log P_{e,\max}(C_n, W_{Y|X}) \geq E_{ex}(P, W_{Y|X}, R) - \delta$$
    whenever $n \geq n_0(|\mathcal{X}|, |\mathcal{Y}|, \delta)$. Let $\{P_n\}$ be a sequence of types with $P_n \in \mathcal{P}_n(\mathcal{X})$ and
    $$\lim_{n \to \infty} \|P_n - Q_X\| = 0.$$
    Define
    $$V_n^* = \arg\min_{V_n} \left[ \sum_{x, \tilde{x}} P_n(x) V_n(\tilde{x}|x) d_{W_{Y|X}}(x, \tilde{x}) + I(P_n, V_n) - R \right],$$
    where the minimization is over $V_n : \mathcal{X} \to \mathcal{X}$ subject to the constraints
    $$\sum_x P_n(x) V_n(\tilde{x}|x) = P_n(\tilde{x}) \ \text{ for all } \tilde{x} \in \mathcal{X}, \qquad I(P_n, V_n) \leq R.$$
    Note that $\{V_n^*\}$ must contain a convergent subsequence $\{V_{n_k}^*\}$. Define
    $$V^* = \lim_{k \to \infty} V_{n_k}^*.$$
    It is easy to verify that
    $$\sum_{x \in \mathcal{X}} Q_X(x) V^*(\tilde{x}|x) = \lim_{k \to \infty} \sum_{x \in \mathcal{X}} P_{n_k}(x) V_{n_k}^*(\tilde{x}|x) = \lim_{k \to \infty} P_{n_k}(\tilde{x}) = Q_X(\tilde{x}) \ \text{ for all } \tilde{x} \in \mathcal{X}, \qquad I(Q_X, V^*) = \lim_{k \to \infty} I(P_{n_k}, V_{n_k}^*) \leq R.$$
    Therefore, we have
    $$\limsup_{n \to \infty} E_{ex}(P_n, W_{Y|X}, R) \geq \limsup_{k \to \infty} E_{ex}(P_{n_k}, W_{Y|X}, R) = \limsup_{k \to \infty} \left[ \sum_{x, \tilde{x} \in \mathcal{X}} P_{n_k}(x) V_{n_k}^*(\tilde{x}|x) d_{W_{Y|X}}(x, \tilde{x}) + I(P_{n_k}, V_{n_k}^*) - R \right] = \sum_{x, \tilde{x} \in \mathcal{X}} Q_X(x) V^*(\tilde{x}|x) d_{W_{Y|X}}(x, \tilde{x}) + I(Q_X, V^*) - R \geq E_{ex}(Q_X, W_{Y|X}, R).$$
    It is also known ([7], Theorem 5.2) that for every $R > 0$, $\delta > 0$, and $P \in \mathcal{P}_n(\mathcal{X})$ there exists a constant composition code $C_n \subseteq T^n(P)$ such that
    $$R(C_n) \geq R - \delta, \qquad -\frac{1}{n} \log P_{e,\max}(C_n, W_{Y|X}) \geq E_{rc}(P, W_{Y|X}, R) - \delta$$
    whenever $n \geq n_0(|\mathcal{X}|, |\mathcal{Y}|, \delta)$. So it can be readily shown that
    $$E(Q_X, W_{Y|X}, R) \geq E_{rc}(Q_X, W_{Y|X}, R)$$
    by invoking the fact that $E_{rc}(P, W_{Y|X}, R)$ as a function of the pair $(P, R)$ is uniformly equicontinuous ([7], Lemma 5.5). The proof is complete.
• By Definition 2, for every $R > 0$, $\delta > 0$ there exists a sequence of block channel codes $\{C_n\}$ with $C_n \subseteq T^n(P_n)$ for some $P_n \in \mathcal{P}_n(\mathcal{X})$ such that
    $$\lim_{n \to \infty} \|P_n - Q_X\| = 0, \qquad \liminf_{n \to \infty} R(C_n) \geq R - \delta, \qquad \liminf_{n \to \infty} -\frac{1}{n} \log P_{e,\max}(C_n, W_{Y|X}) \geq E(Q_X, W_{Y|X}, R) - \delta. \tag{A2}$$
    For simplicity, we assume $R(C_n) \geq R - \delta$ for all n. Now it follows from ([7], Theorem 5.3) that
    $$\frac{1}{n} \log \frac{2}{P_{e,\max}(C_n, W_{Y|X})} \leq E_{sp}(P_n, W_{Y|X}, R - 2\delta)(1 + \delta) \tag{A3}$$
    whenever n n 0 ( | X | , | Y | , δ ) . Let
    $$V_{Y|X}^* = \arg\min_{V_{Y|X}:\, I(Q_X, V_{Y|X}) \leq R - 3\delta} D(V_{Y|X} \| W_{Y|X} | Q_X).$$
    Without loss of generality, we can set $V_{Y|X}^*(\cdot|x) = W_{Y|X}(\cdot|x)$ for all $x \in \{x \in \mathcal{X} : Q_X(x) = 0\}$. It is easy to see that there exists an $\epsilon > 0$ such that
    $$I(P, V_{Y|X}^*) \leq R - 2\delta, \qquad D(V_{Y|X}^* \| W_{Y|X} | P) \leq D(V_{Y|X}^* \| W_{Y|X} | Q_X) + \delta$$
    for all $P \in \mathcal{P}(\mathcal{X})$ with $\|P - Q_X\| \leq \epsilon$. Therefore, for all sufficiently large n,
    $$E_{sp}(P_n, W_{Y|X}, R - 2\delta) = \min_{V_{Y|X}:\, I(P_n, V_{Y|X}) \leq R - 2\delta} D(V_{Y|X} \| W_{Y|X} | P_n) \leq D(V_{Y|X}^* \| W_{Y|X} | P_n) \leq D(V_{Y|X}^* \| W_{Y|X} | Q_X) + \delta = E_{sp}(Q_X, W_{Y|X}, R - 3\delta) + \delta. \tag{A4}$$
    Combining (A2)–(A4), we get
    $$E(Q_X, W_{Y|X}, R) - \delta \leq [E_{sp}(Q_X, W_{Y|X}, R - 3\delta) + \delta](1 + \delta).$$
    In view of the fact that $\delta > 0$ is arbitrary and that for fixed P and $W_{Y|X}$, $E_{sp}(P, W_{Y|X}, R)$ is a decreasing continuous convex function of R in the interval where it is finite ([7], Lemma 5.4), the proof is complete.
• It is known ([9], Lemma 5) that every constant composition code $C_n$ of common type P for some $P \in \mathcal{P}_n(\mathcal{X})$ and rate $R(C_n) \geq R + \delta$ (with $R > 0$ and $\delta > 0$) satisfies
    $$-\frac{1}{n} \log P_c(C_n, W_{Y|X}) \geq \min_{V_{Y|X}} \left[ D(V_{Y|X} \| W_{Y|X} | P) + |R - I(P, V_{Y|X})|^+ \right] - \delta$$
    whenever $n \geq n_0(|\mathcal{X}|, |\mathcal{Y}|, \delta)$. Moreover, it is also known ([9], Lemma 2; [7], Exercise 5.16) that for every $R > 0$, $\delta > 0$, and $P \in \mathcal{P}_n(\mathcal{X})$ there exists a constant composition code $C_n \subseteq T^n(P)$ such that
    $$R(C_n) \geq R - \delta, \qquad -\frac{1}{n} \log P_c(C_n, W_{Y|X}) \leq \min_{V_{Y|X}} \left[ D(V_{Y|X} \| W_{Y|X} | P) + |R - I(P, V_{Y|X})|^+ \right] + \delta$$
    whenever $n \geq n_0(|\mathcal{X}|, |\mathcal{Y}|, \delta)$. In view of the fact that $\min_{V_{Y|X}} [D(V_{Y|X} \| W_{Y|X} | P) + |R - I(P, V_{Y|X})|^+]$ as a function of the pair $(P, R)$ is uniformly equicontinuous, it can be readily shown that
    $$E_c(Q_X, W_{Y|X}, R) = \min_{V_{Y|X}} \left[ D(V_{Y|X} \| W_{Y|X} | Q_X) + |R - I(Q_X, V_{Y|X})|^+ \right].$$
    The proof is complete.
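To make the correct-decoding exponent concrete, the following sketch (our illustration; the channel, input type, and rate are assumed values) evaluates $\min_{V_{Y|X}}[D(V_{Y|X}\|W_{Y|X}|P) + |R - I(P, V_{Y|X})|^+]$ for a binary symmetric channel by grid search:

```python
import math

# Our illustration: E_c(P, W, R) = min_V [ D(V||W|P) + |R - I(P,V)|^+ ]
# evaluated by grid search over V(1|0) = a, V(1|1) = b.
P = [0.5, 0.5]
W = [[0.95, 0.05], [0.05, 0.95]]  # an assumed BSC(0.05); W[x][y]
R = 0.6                           # nats; above this channel's capacity

def D_cond(V):
    """Conditional divergence D(V||W|P)."""
    return sum(P[x] * V[x][y] * math.log(V[x][y] / W[x][y])
               for x in (0, 1) for y in (0, 1))

def I_PV(V):
    """Mutual information I(P, V)."""
    out = [sum(P[x] * V[x][y] for x in (0, 1)) for y in (0, 1)]
    return sum(P[x] * V[x][y] * math.log(V[x][y] / out[y])
               for x in (0, 1) for y in (0, 1))

n, best = 400, float("inf")
for i in range(1, n):
    for j in range(1, n):
        V = [[1 - i / n, i / n], [1 - j / n, j / n]]
        best = min(best, D_cond(V) + max(R - I_PV(V), 0.0))
print(f"E_c(P, W, R) ~ {best:.5f} nats")
```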

References

  1. Slepian, D.; Wolf, J.K. Noiseless coding of correlated information sources. IEEE Trans. Inf. Theory 1973, 19, 471–480. [Google Scholar] [CrossRef]
  2. Csiszár, I. Linear codes for sources and source networks: Error exponents, universal coding. IEEE Trans. Inf. Theory 1982, 28, 585–592. [Google Scholar] [CrossRef]
  3. Pradhan, S.S.; Ramchandran, K. Distributed source coding using syndromes (DISCUS): Design and construction. IEEE Trans. Inf. Theory 2003, 49, 626–643. [Google Scholar] [CrossRef]
  4. Chen, J.; He, D.-K.; Jagmohan, A. The equivalence between Slepian-Wolf coding and channel coding under density evolution. IEEE Trans. Commun. 2009, 57, 2534–2540. [Google Scholar] [CrossRef]
  5. Sun, Z.; Tian, C.; Chen, J.; Wong, K.M. LDPC code design for asynchronous Slepian-Wolf coding. IEEE Trans. Commun. 2010, 58, 511–520. [Google Scholar] [CrossRef]
  6. Oohama, Y.; Han, T.S. Universal coding for the Slepian-Wolf data compression system and the strong converse theorem. IEEE Trans. Inf. Theory 1994, 40, 1908–1919. [Google Scholar] [CrossRef]
  7. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Academic: New York, NY, USA, 1981. [Google Scholar]
  8. Csiszár, I.; Körner, J. Graph decomposition: A new key to coding theorems. IEEE Trans. Inf. Theory 1981, 27, 5–12. [Google Scholar] [CrossRef]
  9. Dueck, G.; Körner, J. Reliability function of a discrete memoryless channel at rates above capacity. IEEE Trans. Inf. Theory 1979, 25, 82–85. [Google Scholar] [CrossRef]
  10. Gallager, R.G. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968. [Google Scholar]
  11. Gallager, R.G. Source Coding with Side Information and Universal Coding, MIT LIDS Technical Report (LIDS-P-937). Unpublished work. 1976.
  12. Chen, J.; He, D.-K.; Jagmohan, A.; Lastras-Montaño, L. On the redundancy-error tradeoff in Slepian-Wolf coding and channel coding. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 1326–1330. [Google Scholar]
  13. Csiszár, I.; Körner, J. Towards a general theory of source networks. IEEE Trans. Inf. Theory 1980, 26, 155–165. [Google Scholar] [CrossRef]
  14. Chen, J.; He, D.-K.; Jagmohan, A.; Lastras-Montaño, L.A.; Yang, E.-H. On the linear codebook-level duality between Slepian-Wolf coding and channel coding. IEEE Trans. Inf. Theory 2009, 55, 5575–5590. [Google Scholar] [CrossRef]
  15. Ahlswede, R.; Dueck, G. Good codes can be produced by a few permutations. IEEE Trans. Inf. Theory 1982, 28, 430–443. [Google Scholar] [CrossRef]
  16. Chen, J.; He, D.-K.; Jagmohan, A. On the duality between Slepian-Wolf coding and channel coding under mismatched decoding. IEEE Trans. Inf. Theory 2009, 55, 4006–4018. [Google Scholar] [CrossRef]
  17. Ahlswede, R. Coloring hypergraphs: A new approach to multi-user source coding—II. J. Comb. Inf. Syst. Sci. 1980, 5, 220–268. [Google Scholar]
  18. He, D.-K.; Lastras-Montaño, L.; Yang, E.-H.; Jagmohan, A.; Chen, J. On the redundancy of Slepian-Wolf coding. IEEE Trans. Inf. Theory 2009, 55, 5607–5627. [Google Scholar] [CrossRef]
  19. Weinberger, N.; Merhav, N. Optimum tradeoffs between the error exponent and the excess-rate exponent of variable-rate Slepian-Wolf coding. IEEE Trans. Inf. Theory 2015, 61, 2165–2190. [Google Scholar] [CrossRef]
Figure 1. Slepian-Wolf coding.
Figure 2. p = 0.05, τ = 0.12.
Figure 3. p = 0.05, τ = 0.35.
Figure 4. p = 0.05, τ = 0.50.
