Article

Encoding Individual Source Sequences for the Wiretap Channel

Neri Merhav

The Viterbi Faculty of Electrical and Computer Engineering, Technion-Israel Institute of Technology, Technion City, Haifa 3200003, Israel
Entropy 2021, 23(12), 1694; https://doi.org/10.3390/e23121694
Submission received: 9 November 2021 / Revised: 5 December 2021 / Accepted: 15 December 2021 / Published: 17 December 2021
(This article belongs to the Special Issue Wireless Networks: Information Theoretic Perspectives Ⅱ)

Abstract

We consider the problem of encoding a deterministic source sequence (i.e., individual sequence) for the degraded wiretap channel by means of an encoder and decoder that can both be implemented as finite-state machines. Our first main result is a necessary condition for both reliable and secure transmission in terms of the given source sequence, the bandwidth expansion factor, the secrecy capacity, the number of states of the encoder and the number of states of the decoder. Equivalently, this necessary condition can be presented as a converse bound (i.e., a lower bound) on the smallest achievable bandwidth expansion factor. The bound is asymptotically achievable by Lempel–Ziv compression followed by good channel coding for the wiretap channel. Given that the lower bound is saturated, we also derive a lower bound on the minimum necessary rate of purely random bits needed for local randomness at the encoder in order to meet the security constraint. This bound too is achieved by the same achievability scheme. Finally, we extend the main results to the case where the legitimate decoder has access to a side information sequence, which is another individual sequence that may be related to the source sequence, and a noisy version of the side information sequence leaks to the wiretapper.

1. Introduction

In his seminal paper, Wyner [1] introduced the wiretap channel as a model of secure communication over a degraded broadcast channel, without using a secret key, where the legitimate receiver has access to the output of the good channel and the wiretapper receives the output of the bad channel. The main idea is that the excess noise at the output of the wiretapper channel is utilized to secure the message intended for the legitimate receiver. Wyner fully characterized the best achievable trade-off between reliable communication to the legitimate receiver and the equivocation rate at the wiretapper, which was quantified in terms of the conditional entropy of the source given the output of the wiretapper channel. One of the most important concepts introduced by Wyner was the secrecy capacity, that is, the supremum of all coding rates that allow both reliable decoding at the legitimate receiver and full secrecy, where the equivocation rate saturates at the (unconditional) entropy rate of the source, or equivalently, the normalized mutual information between the source and the wiretap channel output is vanishingly small for large blocklength. The idea behind the construction of a good code for the wiretap channel is basically the same as the idea of binning: one designs a big code, which can be reliably decoded at the legitimate receiver, and subdivides it into smaller codes that are fed by purely random bits unrelated to the secret message. Each such sub-code can individually be reliably decoded by the wiretapper, exhausting the wiretapper's capacity and thus leaving no further decoding capability for the remaining bits, which carry the actual secret message.
During the nearly five decades that have passed since [1] was published, the wiretap channel model was extended and further developed in many aspects. We mention here just a few. Three years after Wyner, Csiszár and Körner [2] extended the wiretap channel to a general broadcast channel that is not necessarily degraded, allowing also a common message intended for both receivers. In the same year, Leung-Yan-Cheong and Hellman [3] studied the Gaussian wiretap channel and proved, among other things, that its secrecy capacity is equal to the difference between the capacity of the legitimate channel and that of the wiretap channel. In [4], Ozarow and Wyner considered a somewhat different model, known as the type II wiretap channel, where the channel to the legitimate receiver is clean (noiseless), and the wiretapper can access a subset of the coded bits. In [5], Yamamoto extended the wiretap channel to include two parallel broadcast channels that connect one encoder and one legitimate decoder, and both channels are wiretapped by wiretappers that do not cooperate with each other. A few years later, the same author [6] further developed the scope of [1] in two ways: first, by allowing a private secret key to be shared between the encoder and the legitimate receiver, and secondly, by allowing a given distortion in reproducing the source at the legitimate receiver. The main coding theorem of [6] suggests a three-fold separation principle, which asserts that no asymptotic optimality is lost if the encoder first applies a good lossy source code, then encrypts the compressed bits, and finally, applies a good channel code for the wiretap channel. In [7], this model in turn was generalized to allow source side information at the decoder and at the wiretapper in a degraded structure, with application to systematic coding for the wiretap channel. The Gaussian wiretap channel model of [3] was also extended in two ways: the first is the Gaussian multiple access wiretap channel of [8], and the second is the Gaussian interference wiretap channel of [9,10], where the encoder has access to the interference signal as side information. Wiretap channels with feedback were considered in [11], where it was shown that feedback is best used for the purpose of sharing a secret key as in [6,7]. More recent research efforts were dedicated to strengthening the secrecy metric from weak secrecy to strong secrecy, where the mutual information between the source and the wiretap channel output vanishes even without normalization by the blocklength, as well as to semantic security, which is similar but refers even to the worst-case message source distribution; see, for example, [12,13,14,15,16] and (Section 3.3 in [14]).
In this work, we look at Wyner’s wiretap channel model from a different perspective. Following the individual sequence approach pioneered by Ziv in [17,18,19], and continued in later works, such as [20,21], we consider the problem of encoding a deterministic source sequence (i.e., an individual sequence) for the degraded wiretap channel using finite-state encoders and finite-state decoders. One of the non-trivial issues associated with individual sequences, in the context of the wiretap channel, is how to define the security metric, as there is no probability distribution assigned to the source, and therefore, the equivocation, or the mutual information between the source and the wiretap channel output, cannot be well defined. In [20], a similar dilemma was encountered in the context of private key encryption of individual sequences, and in the converse theorem therein, it was assumed that the system is perfectly secure in the sense that the probability distribution of the cryptogram does not depend on the source sequence. In principle, it is possible to apply the same approach here, where the word ‘cryptogram’ is replaced by the ‘wiretap channel output’. However, in order to handle residual dependencies, which will always exist, it would be better to use a security metric that quantifies those small dependencies. To this end, it makes sense to adopt the above-mentioned maximum mutual information security metric (or, equivalently the semantic security metric), where the maximum is over all input assignments. After this maximization, this quantity depends only on the ‘channel’ between the source and the wiretap channel output.
Our first main result is a necessary condition (i.e., a converse to a coding theorem) for both reliable and secure transmission, which depends on: (i) the given individual source sequence, (ii) the bandwidth expansion factor, (iii) the secrecy capacity, (iv) the number of states of the encoder, (v) the number of states of the decoder, (vi) the allowed bit error probability at the legitimate decoder and (vii) the allowed maximum mutual information secrecy. Equivalently, this necessary condition can be presented as a converse bound (i.e., a lower bound) to the smallest achievable bandwidth expansion factor. The bound is asymptotically achievable by Lempel–Ziv (LZ) compression followed by a good channel coding scheme for the wiretap channel. Given that this lower bound is saturated, we then derive also a lower bound on the minimum necessary rate of purely random bits needed for adequate local randomness at the encoder, in order to meet the security constraint. This bound too is achieved by the same achievability scheme, a fact which may be of independent interest regardless of individual sequences and finite-state encoders and decoders (i.e., also for ordinary block codes in the traditional probabilistic setting). Finally, we extend the main results to the case where the legitimate decoder has access to a side information sequence, which is another individual sequence that may be related to the source sequence, and where a noisy version of the side information sequence leaks to the wiretapper. It turns out that in this case, the best strategy is the same as if one assumes that the wiretapper sees the clean side information sequence. While this may not be surprising as far as sufficiency is concerned (i.e., as an achievability result), it is less obvious in the context of necessity (i.e., a converse theorem).
The remaining part of this article is organized as follows. In Section 2, we establish the notation, provide some definitions and formalize the problem setting. In Section 3, we provide the main results of this article and discuss them in detail. In Section 4, the extension that incorporates side information is presented. Finally, in Section 5, the proofs of the main theorems are given.

2. Notation, Definitions, and Problem Setting

2.1. Notation

Throughout this paper, random variables are denoted by capital letters; specific values they may take are denoted by the corresponding lower case letters; and their alphabets are denoted by calligraphic letters. Random vectors, their realizations, and their alphabets are denoted, respectively, by capital letters, the corresponding lower case letters and calligraphic letters, all superscripted by their dimensions. For example, the random vector $X^n = (X_1,\ldots,X_n)$ ($n$ a positive integer) may take a specific vector value $x^n = (x_1,\ldots,x_n)$ in $\mathcal{X}^n$, the $n$-th order Cartesian power of $\mathcal{X}$, which is the alphabet of each component of this vector. Infinite sequences are denoted using the bold face font, e.g., $\boldsymbol{x} = (x_1, x_2, \ldots)$. Segments of vectors are denoted by subscripts and superscripts that correspond to the start and the end locations; for example, $x_i^j$, for $i < j$ integers, denotes $(x_i, x_{i+1},\ldots,x_j)$. When $i = 1$, the subscript is omitted.
Sources and channels are denoted by the letter P or Q, subscripted by the names of the relevant random variables/vectors and their conditionings, if applicable, following the standard notation conventions, e.g., Q X , P Y | X , and so on, or by abbreviated names that describe their functionality. When there is no room for ambiguity, these subscripts are omitted. The probability of an event E will be denoted by Pr { E } , and the expectation operator with respect to (w.r.t.) a probability distribution P is denoted by E P { · } . Again, the subscript is omitted if the underlying probability distribution is clear from the context or explicitly explained in the following text. The indicator function of an event E is denoted by 1 { E } , that is, 1 { E } = 1 if E occurs; otherwise, 1 { E } = 0 .
Throughout considerably large parts of the paper, the analysis is carried out w.r.t. joint distributions that involve several random variables. Some of these random variables are induced from empirical distributions of deterministic sequences, while others are ordinary random variables. Random variables of the former kind are denoted with 'hats'. As a simple example, consider a deterministic sequence, $x^n$, that is fed as an input to a memoryless channel defined by a single-letter transition matrix, $\{P_{Y|X}(y|x),\ x\in\mathcal{X},\ y\in\mathcal{Y}\}$, and let $y^n$ denote a realization of the corresponding channel output. Let $P_{\hat{X}\hat{Y}}(x,y) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{x_i = x,\ y_i = y\}$ denote the joint empirical distribution induced from $(x^n, y^n)$. In addition to $P_{\hat{X}\hat{Y}}(x,y)$, we also define $P_{\hat{X}Y}(x,y) = \mathbb{E}\{P_{\hat{X}\hat{Y}}(x,y)\}$, where now $Y$ is an ordinary random variable. Clearly, the relation between the two distributions is given by $P_{\hat{X}Y}(x,y) = P_{\hat{X}}(x)\cdot P_{Y|X}(y|x)$, where $P_{\hat{X}}(x) = \sum_{y} P_{\hat{X}\hat{Y}}(x,y)$ is the empirical marginal of $\hat{X}$. Such mixed joint distributions underlie certain information-theoretic quantities; for example, $I(\hat{X};Y)$ and $H(Y|\hat{X})$ denote, respectively, the mutual information between $\hat{X}$ and $Y$ and the conditional entropy of $Y$ given $\hat{X}$, both induced from $P_{\hat{X}Y}$. The same notation rules apply in more involved situations too.

2.2. Definitions and Problem Setting

We consider the system configuration of the degraded wiretap channel, depicted in Figure 1. Let $\boldsymbol{u} = (u_1, u_2, \ldots)$ be a deterministic source sequence (i.e., individual sequence), whose symbols take values in a finite alphabet, $\mathcal{U}$, of size $\alpha$. This source sequence is to be conveyed reliably to a legitimate decoder while keeping it secret from a wiretapper, as described below. The encoding mechanism is as follows. The source sequence, $\boldsymbol{u}$, is first divided into chunks of length $k$, $\tilde{u}_i = u_{ik+1}^{ik+k}\in\mathcal{U}^k$, $i = 0,1,2,\ldots$, which are fed into a stochastic finite-state encoder, defined by the following equations:
$$\Pr\{\tilde{X}_i = \tilde{x}\,|\,\tilde{u}_i = \tilde{u},\ s_i^e = s\} = P(\tilde{x}\,|\,\tilde{u},s),\qquad i = 0,1,2,\ldots$$
$$s_{i+1}^e = h(\tilde{u}_i, s_i^e),\qquad i = 0,1,2,\ldots.$$
We allow a stochastic encoder in view of the fact that, even in the traditional probabilistic setting, optimal coding for the wiretap channel (see [1] and later works) must be randomized in order to meet the security requirements. Here, $\tilde{X}_i$ is a random vector taking values in $\mathcal{X}^m$, $\mathcal{X}$ being the $\beta$-ary input alphabet of the channel and $m$ being a positive integer; $\tilde{x}\in\mathcal{X}^m$ is a realization of $\tilde{X}_i$; and $s_i^e$ is the state of the encoder at time $i$, which designates the memory of the encoder with regard to the past of the source sequence. In other words, at time instant $i$, whatever the encoder 'remembers' from $(u_1, u_2,\ldots,u_{i-1})$ is stored in the variable $s_i^e$ (for example, in the case of trellis coding, it can be a shift register of finite length, say $p$, that stores the $p$ most recent source symbols, $(u_{i-p}, u_{i-p+1},\ldots,u_{i-1})$, or, in block coding, it can be the contents of the current block from its beginning up to the present). The state, $s_i^e$, takes values in a finite set of states, $\mathcal{S}^e$, of size $q_e$. In the above equation, the variable $\tilde{u}$ is any member of $\mathcal{U}^k$. The function $h:\mathcal{U}^k\times\mathcal{S}^e\to\mathcal{S}^e$ is called the next-state function of the encoder. (More generally, we could define both $s_{i+1}^e$ and $\tilde{x}_i$ to be random functions of $(\tilde{u}_i, s_i^e)$ via a conditional joint distribution, $\Pr\{\tilde{X}_i = \tilde{x},\ s_{i+1}^e = s'\,|\,\tilde{u}_i = \tilde{u},\ s_i^e = s\}$. However, it makes sense to let the encoder state sequence evolve deterministically in response to the input $\boldsymbol{u}$, since the state designates the memory of the encoder of past inputs.) Finally, $P(\tilde{x}|\tilde{u},s)$, $\tilde{u}\in\mathcal{U}^k$, $s\in\mathcal{S}^e$, $\tilde{x}\in\mathcal{X}^m$, is a conditional probability distribution function, i.e., the numbers $\{P(\tilde{x}|\tilde{u},s)\}$ are all non-negative and $\sum_{\tilde{x}} P(\tilde{x}|\tilde{u},s) = 1$ for all $(\tilde{u},s)\in\mathcal{U}^k\times\mathcal{S}^e$. The vector $\tilde{x}_i$ designates the current output vector from the encoder in response to the current input source vector, $\tilde{u}_i$, and its current state, $s_i^e$. Without loss of generality, we assume that the initial state of the encoder, $s_0^e$, is some fixed member of $\mathcal{S}^e$. The ratio
$$\lambda = \frac{m}{k}$$
is referred to as the bandwidth expansion factor. It should be pointed out that the parameters $k$ and $m$ are fixed integers, which are not necessarily large (e.g., $k = 2$ and $m = 3$ are valid values of $k$ and $m$). The concatenation of the output vectors from the encoder, $\tilde{x}_0, \tilde{x}_1, \ldots$, is viewed as a sequence of channel input symbols, $x_1, x_2, \ldots$, partitioned into chunks $\tilde{x}_i = x_{im+1}^{im+m}$, similarly to the above-defined partition of the source sequence.
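To make the encoder model above concrete, the following minimal Python sketch mimics Equations (1)-(2): the channel input chunk is drawn from a conditional distribution of the current source chunk and the state, and the state evolves deterministically. All concrete choices below (the values of k and m, the state set, the dithered output rule and the next-state rule) are illustrative assumptions only, not a construction from the paper.

import random

class FiniteStateEncoder:
    def __init__(self, k, m, emit, next_state, initial_state=0, seed=0):
        self.k, self.m = k, m            # source chunk length and channel chunk length
        self.emit = emit                 # emit(u_chunk, state, rng) ~ P(x_chunk | u_chunk, state)
        self.next_state = next_state     # deterministic next-state function h
        self.state = initial_state
        self.rng = random.Random(seed)

    def encode(self, source):
        x = []
        for i in range(0, len(source) - self.k + 1, self.k):
            u_chunk = tuple(source[i:i + self.k])
            x.extend(self.emit(u_chunk, self.state, self.rng))   # m channel symbols per chunk
            self.state = self.next_state(u_chunk, self.state)
        return x

# toy example: k = 2, m = 3 (bandwidth expansion factor lambda = 3/2), state = last source symbol
enc = FiniteStateEncoder(
    k=2, m=3,
    emit=lambda u, s, rng: [u[0], u[1], rng.randint(0, 1)],      # one dither bit per chunk
    next_state=lambda u, s: u[-1],
)
print(enc.encode([0, 1, 1, 0, 0, 0]))    # 9 channel symbols produced from 6 source symbols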
The sequence of encoder outputs, x 1 , x 2 , , is fed into a discrete memoryless channel (DMC), henceforth referred to as the main channel, whose corresponding outputs, y 1 , y 2 , , are generated according to
$$\Pr\{Y^N = y^N\,|\,X^N = x^N\} = Q_M(y^N|x^N) = \prod_{i=1}^{N} Q_M(y_i|x_i),$$
for every positive integer $N$ and every $x^N\in\mathcal{X}^N$ and $y^N\in\mathcal{Y}^N$. The channel output symbols, $\{y_i\}$, take values in a finite alphabet, $\mathcal{Y}$, of size $\gamma$.
The sequence of channel outputs, $y_1, y_2, \ldots$, is divided into chunks of length $m$, $\tilde{y}_i = y_{im+1}^{im+m}$, $i = 0,1,2,\ldots$, which are fed into a deterministic finite-state decoder, defined according to the following recursive equations:
$$\tilde{v}_i = f(\tilde{y}_i, s_i^d)$$
$$s_{i+1}^d = g(\tilde{y}_i, s_i^d),$$
where the variables in the equations are defined as follows: $\{s_i^d\}$ is the sequence of states of the decoder (which, again, designates the finite memory, this time at the decoder). Each $s_i^d$ takes values in a finite set, $\mathcal{S}^d$, of size $q_d$. The variable $\tilde{v}_i\in\mathcal{U}^k$ is the $i$-th chunk of $k$ source reconstruction symbols, i.e., $\tilde{v}_i = v_{ik+1}^{ik+k}$, $i = 0,1,\ldots$, which form the decoder output. The function $f:\mathcal{Y}^m\times\mathcal{S}^d\to\mathcal{U}^k$ is called the output function of the decoder and the function $g:\mathcal{Y}^m\times\mathcal{S}^d\to\mathcal{S}^d$ is the next-state function of the decoder. The concatenation of the decoder output vectors, $\tilde{v}_0, \tilde{v}_1, \ldots$, forms the entire stream of reconstruction symbols, $v_1, v_2, \ldots$.
The output of the main channel, y 1 , y 2 , , is fed into another DMC, henceforth referred to as the wiretap channel, which generates, in response, a corresponding sequence, z 1 , z 2 , , according to
$$\Pr\{Z^N = z^N\,|\,Y^N = y^N\} = Q_W(z^N|y^N) = \prod_{i=1}^{N} Q_W(z_i|y_i),$$
where { Z i } and { z i } take values in a finite alphabet Z . We denote the cascade of channels Q M and Q W by Q M W , that is
$$Q_{MW}(z|x) = \sum_{y\in\mathcal{Y}} Q_M(y|x)\,Q_W(z|y).$$
We seek a communication system ( P , h , f , g ) which satisfies two requirements:
  • For a given ϵ r > 0 , the system satisfies the following reliability requirement: The bit error probability is guaranteed to be less than ϵ r , i.e.,
    $$P_b = \frac{1}{k}\sum_{i=1}^{k}\Pr\{V_i \neq u_i\} \le \epsilon_r$$
    for every $(u_1,\ldots,u_k)$ and every combination of initial states of the encoder and the decoder, where $\Pr\{\cdot\}$ is defined w.r.t. the randomness of the encoder and the main channel.
  • For a given ϵ s > 0 , the system satisfies the following security requirement: For every sufficiently large positive integer n,
    $$\max_{\mu} I_\mu(U^n;Z^N) \le n\epsilon_s,$$
    where $N = n\lambda$ and $I_\mu(U^n;Z^N)$ is the mutual information between $U^n$ and $Z^N$, induced by an input distribution $\mu = \{\mu(u^n),\ u^n\in\mathcal{U}^n\}$ and the system, $\{P(z^N|u^n),\ u^n\in\mathcal{U}^n,\ z^N\in\mathcal{Z}^N\}$.
As for the reliability requirement, note that the larger k is, the less stringent the requirement becomes. Concerning the security requirement, ideally, we would like to have perfect secrecy, which means that P ( z N | u n ) would be independent of u n (see also [20]), but it is more realistic to allow a small deviation from this idealization. This security metric is actually the maximum mutual information metric, or equivalently (see [15]) the semantic security, as mentioned in the Introduction.

2.3. Preliminaries and Background

We need two more definitions along with some background associated with them. The first is the secrecy capacity [1,14], which is the supremum of all coding rates for which there exist block codes that maintain both an arbitrarily small error probability at the legitimate decoder and an equivocation arbitrarily close to the unconditional entropy of the source. The secrecy capacity is given by
$$C_s = \max_{P_X} I(X;Y|Z) = \max_{P_X}\big[I(X;Y) - I(X;Z)\big],$$
with $P_{XYZ}(x,y,z) = P_X(x)\,Q_M(y|x)\,Q_W(z|y)$ for all $(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$.
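As a purely illustrative aside (not part of the paper), the secrecy capacity can be estimated numerically from this definition. In the sketch below, the main channel BSC(0.05), the wiretap channel BSC(0.15), and the grid search over the input distribution are all assumptions made only for the sake of the example.

import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and row-stochastic matrix channel[x][y]."""
    p_xy = p_x[:, None] * channel                  # joint P(x, y)
    p_y = p_xy.sum(axis=0)                         # marginal P(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_xy > 0, p_xy / (p_x[:, None] * p_y[None, :]), 1.0)
    return float(np.sum(p_xy * np.log2(ratio)))

def bsc(p):
    return np.array([[1 - p, p], [p, 1 - p]])

Q_M = bsc(0.05)          # main channel (assumption)
Q_W = bsc(0.15)          # wiretap channel acting on Y (assumption)
Q_MW = Q_M @ Q_W         # cascade channel from X to Z, as in the equation above

best = 0.0
for a in np.linspace(0.0, 1.0, 1001):
    p_x = np.array([a, 1 - a])
    best = max(best, mutual_information(p_x, Q_M) - mutual_information(p_x, Q_MW))
print(f"estimated C_s is about {best:.4f} bits per channel use")   # maximized at the uniform input here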
The second quantity we need to define is the LZ complexity [22]. In their famous paper [22], Ziv and Lempel actually developed a deterministic counterpart of source coding theory, where instead of imposing assumptions on probabilistic mechanisms that generate the data (i.e., memoryless sources, Markov sources, and general stationary sources), whose relevance to real-world data compression may be subject to dispute, they considered arbitrary, deterministic source sequences (i.e., individual sequences, in their terminology), but they imposed instead a limitation on the resources of the encoder (or the data compression algorithm): they assumed that it has limited storage capability (i.e., limited memory) of past data when encoding the current source symbol. This limited storage was modeled in terms of a finite-state machine, where the state variable of the encoder evolves recursively in time in response to the input and designates the information that the encoder ‘remembers’ from the past input (just like in the model description in Section 2.2 above). As mentioned earlier, a simple example of such a state variable can be the contents of a finite shift register, fed sequentially by the source sequence in which case the state contains a finite number of the most recent source symbols. This individual-sequence approach is appealing, because it is much more realistic to assume practical limitations on the encoder (which is under the control of the system designer) than to make assumptions on the statistics of the data to be compressed.
Ziv and Lempel developed an asymptotically optimal, practical compression algorithm (which is used in almost every computer), well known as the Lempel–Ziv (LZ) algorithm. This algorithm has several variants. One of them, called the LZ78 algorithm (where '78' designates the year 1978), is based on the notion of incremental parsing: given the source vector, $u^n$, the incremental parsing procedure sequentially parses this sequence into distinct phrases, such that each new parsed phrase is the shortest string that has not been obtained before as a phrase, with a possible exception of the last phrase, which might be incomplete. Let $c(u^n)$ denote the number of resulting phrases. For example, if $n = 10$ and $u^{10} = (0000110110)$, then incremental parsing (from left to right) yields $(0, 00, 01, 1, 011, 0)$, and so, $c(u^{10}) = 6$. We define the LZ complexity of the individual sequence, $u^n$, as
$$\rho_{\mathrm{LZ}}(u^n) = \frac{c(u^n)\log c(u^n)}{n}.$$
As was shown by Ziv and Lempel in their seminal paper [22], for large n, the LZ complexity, ρ LZ ( u n ) , is essentially the best compression ratio that can be achieved by any information lossless, finite-state encoder (up to some negligibly small terms, for large n), and it can be viewed as the individual-sequence analogue of the entropy rate.
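For concreteness, here is a small sketch of the incremental parsing procedure and of $\rho_{\mathrm{LZ}}(u^n)$ (assumptions: base-2 logarithms and counting the last, possibly repeated, phrase). Note that for a sequence as short as the example above the value exceeds $\log\alpha$; only for large $n$ does $\rho_{\mathrm{LZ}}$ behave like a compression ratio.

from math import log2

def lz78_phrases(u):
    """Parse the sequence u into distinct phrases; the last phrase may be incomplete."""
    seen, phrases, current = set(), [], ""
    for symbol in u:
        current += symbol
        if current not in seen:        # shortest string not obtained before as a phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                        # possibly incomplete (repeated) last phrase
        phrases.append(current)
    return phrases

def lz_complexity(u):
    c = len(lz78_phrases(u))
    return c * log2(c) / len(u)

u = "0000110110"
print(lz78_phrases(u))                 # ['0', '00', '01', '1', '011', '0'] -> c = 6
print(f"rho_LZ = {lz_complexity(u):.3f} bits/symbol")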

3. Results

Before moving on to present our first main result, a simple comment is in order. Even in the traditional probabilistic setting, given a source with entropy $H$ and a channel with capacity $C$, reliable communication cannot be accomplished unless $H \le \lambda C$, where $\lambda$ is the bandwidth expansion factor. Since both $H$ and $C$ are given and only $\lambda$ is under the control of the system designer, it is natural to state this condition as a lower bound on the bandwidth expansion factor, i.e., $\lambda \ge H/C$. By the same token, in the presence of a secrecy constraint, $\lambda$ must not fall below $H/C_s$. Our converse theorems for individual sequences are presented in the same spirit, where the entropy $H$ in the numerator is replaced by an expression whose main term is the Lempel–Ziv compressibility.
We assume, without essential loss of generality, that $k$ divides $n$ (otherwise, omit the last $(n \bmod k)$ symbols of $u^n$ and replace $n$ by $k\cdot\lfloor n/k\rfloor$ without affecting the asymptotic behavior as $n\to\infty$). Our first main result is the following.
Theorem 1.
Consider the problem setting defined in Section 2. If there exists a stochastic encoder with q e states and a decoder with q d states that together satisfy the reliability constraint (9) and the security constraint (10), then the bandwidth expansion factor λ must be lower bounded as follows.
$$\lambda \ge \frac{\rho_{\mathrm{LZ}}(u^n) - \Delta(\epsilon_r) - \epsilon_s - \zeta_n(q_d,k)}{C_s},$$
where
$$\Delta(\epsilon_r) = h_2(\epsilon_r) + \epsilon_r\cdot\log(\alpha-1),$$
with $h_2(\epsilon_r) = -\epsilon_r\log\epsilon_r - (1-\epsilon_r)\log(1-\epsilon_r)$ being the binary entropy function, and
$$\zeta_n(q_d,k) = \min_{\{\ell:\ \ell\ \mathrm{divides}\ n/k\}}\left\{\frac{\log q_d + 1}{k\ell} + \frac{2k\ell(\log\alpha+1)^2}{(1-\epsilon_n)\log n} + \frac{2k\ell\,\alpha^{2k\ell}\log\alpha}{n}\right\},$$
with $\epsilon_n\to 0$ as $n\to\infty$.
The proof of Theorem 1, like all other proofs in this article, is deferred to Section 5.
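To get a feel for the bound, here is a hypothetical numerical illustration; all plug-in values below (the LZ complexity, the error and secrecy levels, the redundancy term and the secrecy capacity) are assumptions chosen only for the example, not quantities taken from the paper.

from math import log2

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def delta_term(eps_r, alpha):
    # Delta(eps_r) = h2(eps_r) + eps_r * log(alpha - 1)
    return binary_entropy(eps_r) + eps_r * log2(alpha - 1)

# hypothetical plug-in values: quaternary source with LZ compressibility 1.2 bits/symbol,
# bit error probability 1e-3, secrecy level 1e-3, redundancy term 0.01, secrecy capacity 0.4
rho_lz, eps_r, eps_s, zeta_n, C_s, alpha = 1.2, 1e-3, 1e-3, 0.01, 0.4, 4
lambda_lower = (rho_lz - delta_term(eps_r, alpha) - eps_s - zeta_n) / C_s
print(f"lambda must be at least about {lambda_lower:.2f} channel uses per source symbol")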
Discussion. 
A few comments are in order with regard to Theorem 1.
1.
Irrelevance of $q_e$. It is interesting to note that as far as the encoding and decoding resources are concerned, the lower bound depends on $k$ and $q_d$, but not on the number of states of the encoder, $q_e$. This means that the same lower bound continues to hold even if the encoder has an unlimited number of states. Pushing this to the extreme, even if the encoder has room to store the entire past, the lower bound of Theorem 1 would remain unaltered. The crucial bottleneck is, therefore, in the finite memory resources associated with the decoder, where the memory may help to reconstruct the source by exploiting empirical dependencies with the past. The dependence on $q_e$, however, appears later, when we discuss local randomness resources, as well as in the extension to the case of decoder side information.
2.
The redundancy term $\zeta_n(q_d,k)$. A technical comment is in order concerning the term $\zeta_n(q_d,k)$, which involves minimization over all divisors $\ell$ of $n/k$, where we have already assumed that $n/k$ is an integer. Strictly speaking, if $n/k$ happens to be a prime, this minimization is not very meaningful, as $\zeta_n(q_d,k)$ would be relatively large. If this is the case, a better bound is obtained if one omits some of the last symbols of $u^n$, thereby reducing $n$ to, say, $n'$, so that $n'/k$ has a richer set of factors. Consider, for example, the choice $\ell = n'' = \lceil\log n\rceil$ (instead of minimizing over $\ell$) and replace $n/k$ by $n'/k = n/k - (n/k \bmod n'')$, without essential loss of tightness. This way, $\zeta_n(q_d,k)$ would tend to zero as $n\to\infty$, for fixed $k$ and $q_d$.
3.
Achievability. Having established that $\zeta_n(q_d,k)\to 0$, and given that $\epsilon_r$ and $\epsilon_s$ are small, it is clear that the main term in the numerator of the lower bound of Theorem 1 is $\rho_{\mathrm{LZ}}(u^n)$, which is, as mentioned earlier, the individual-sequence analogue of the entropy of the source [22]. In other words, $\lambda$ cannot be much smaller than $\lambda_L(u^n) = \rho_{\mathrm{LZ}}(u^n)/C_s$. A matching achievability scheme would most naturally be based on separation: first apply variable-rate compression of $u^n$ to about $n\rho_{\mathrm{LZ}}(u^n)$ bits using the LZ algorithm [22], and then feed the resulting compressed bit-stream into a good code for the wiretap channel [1] with codewords of length about
$$N \approx n\lambda_L(u^n) \approx \frac{n\,\rho_{\mathrm{LZ}}(u^n)}{C_s(1-\delta)},$$
where δ is an arbitrarily small (but positive) margin to keep the coding rate strictly smaller than C s . However, to this end, the decoder must know N. One possible solution is that before the actual encoding of each u n , one would use a separate, auxiliary fixed code that encodes the value of the number of compressed bits, n ρ LZ ( u n ) , using log ( n log α ) bits (as n log α is about the number of possible values that n ρ LZ ( u n ) can take) and protect it using a channel code of rate less than C s ( 1 δ ) . Since the length of this auxiliary code grows only logarithmically with n (as opposed to the ‘linear’ growth of n ρ LZ ( u n ) ), the overhead in using the auxiliary code is asymptotically negligible. The auxiliary code and the main code are used alternately: first the auxiliary code, and then the main code for each n-tuple of the source. The main channel code is actually an array of codes, one for each possible value of n ρ LZ ( u n ) . Once the auxiliary decoder has decoded this number, the corresponding main decoder is used. Overall, the resulting bandwidth expansion factor is about
$$\lambda \approx \frac{n\,\rho_{\mathrm{LZ}}(u^n) + \log(n\log\alpha)}{n\,C_s(1-\delta)} = \frac{\rho_{\mathrm{LZ}}(u^n)}{C_s(1-\delta)} + O\!\left(\frac{\log n}{n}\right).$$
Another, perhaps simpler and better, approach is to use the LZ algorithm in the mode of a variable-to-fixed length code: let the length of the channel codeword, $N$, be fixed, and compress $\boldsymbol{u} = (u_1, u_2, \ldots)$ until $n\rho_{\mathrm{LZ}}(u^n) = N\cdot C_s(1-\delta)$ compressed bits are obtained. Then,
$$\lambda = \frac{N}{n} = \frac{\rho_{\mathrm{LZ}}(u^n)}{C_s(1-\delta)}.$$
(A rough code sketch of this variable-to-fixed mode is given right after this discussion.)
Of course, these coding schemes require decoder memory that grows exponentially in $n$, and not just a fixed number, $q_d$, of states, and therefore, strictly speaking, there is a gap between this achievability scheme and the converse result of Theorem 1. However, this gap is closed asymptotically once we take the limit $q_d\to\infty$ after the limit $n\to\infty$, and we consider successive application of these codes over many blocks. The same approach appears also in [17,18,19,22], as well as in later related work.
This concludes the discussion on Theorem 1. □
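The variable-to-fixed mode described in the discussion above can be sketched as follows; the LZ78 parser is the same as in the earlier snippet, and the values of N, C_s, delta and the toy source sequence are assumptions chosen only for the example.

from math import log2

def lz78_phrases(u):
    seen, phrases, current = set(), [], ""
    for symbol in u:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)
    return phrases

def compressed_bits(u):
    # approximate LZ78 code length: n * rho_LZ(u^n) = c(u^n) * log c(u^n)
    c = len(lz78_phrases(u))
    return c * log2(c)

def variable_to_fixed_blocklength(source, N, C_s, delta):
    # number n of source symbols absorbed by one channel codeword of length N
    budget = N * C_s * (1 - delta)       # compressed bits that fit into one wiretap codeword
    n = 0
    while n < len(source) and compressed_bits(source[:n + 1]) <= budget:
        n += 1
    return n

source = "0110010110001101" * 50          # a toy binary source sequence (assumption)
n = variable_to_fixed_blocklength(source, N=200, C_s=0.4, delta=0.05)
print(f"one codeword of N=200 channel uses carries n={n} source symbols (lambda = {200 / n:.2f})")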
We next focus on the local randomness resources that are necessary when the full secrecy capacity is exploited. Specifically, suppose that the stochastic encoder $\{P(\tilde{x}|\tilde{u},s),\ \tilde{x}\in\mathcal{X}^m,\ \tilde{u}\in\mathcal{U}^k,\ s\in\mathcal{S}^e\}$ is implemented as a deterministic encoder with an additional input of purely random bits, i.e.,
$$\tilde{x}_i = a(\tilde{u}_i, s_i^e, \tilde{b}_i),$$
where $\tilde{b}_i = b_{ij+1}^{ij+j}$ is a string of $j$ purely random bits. The question is the following: how large must $j$ be in order to achieve full secrecy? Equivalently, what is the minimum necessary rate of random bits for local randomness at the encoder for secure coding at the maximum reliable rate? In fact, this question may be interesting in its own right, regardless of the individual-sequence setting and finite-state encoders and decoders, i.e., also for ordinary block coding (which is the special case of $q_e = q_d = 1$) in the traditional probabilistic setting. The following theorem answers this question.
Theorem 2.
Consider the problem setting defined in Section 2 and let λ meet the lower bound of Theorem 1. If there exists an encoder (19) with q e states and a decoder with q d states that jointly satisfy the reliability constraint (9) and the security constraint (10), then
$$j \ge m\,I(X;Z) - k\epsilon_s - \frac{\log q_e}{\ell},$$
where X is the random variable that achieves C s and ℓ is the achiever of ζ n ( q d , k ) .
Note that the lower bound of Theorem 2 depends on $q_e$, as opposed to Theorem 1, where the bound depends only on $q_d$. Since $\epsilon_s$ is assumed small and $\ell$ is large (so that $(\log q_e)/\ell$ is negligible), it is clear that the main term is $m\,I(X;Z)$, i.e., the rate of random bits must be essentially at least as large as $I(X;Z)$ bits per channel use, or equivalently, $\lambda\,I(X;Z)$ bits per source symbol. It is interesting to note that Wyner's code [1] asymptotically achieves this bound when the coding rate saturates the secrecy capacity, because the subcode that can be decoded by the wiretapper (within each given bin) has rate of about $I(X;Z)$, and it encodes just the bits of the local randomness. So, when working at the full secrecy capacity, Wyner's code is optimal not only in terms of the trade-off between reliability and security, but also in terms of the minimum consumption of local, purely random bits.
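As a purely illustrative follow-up (all numbers are assumptions, not taken from the paper), the minimum local-randomness rate promised by Theorem 2 can be evaluated directly for a hypothetical degraded binary example with a BSC(0.05) main channel and a BSC(0.15) wiretap channel, at the uniform (secrecy-capacity-achieving) input:

from math import log2

def h2(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

p_main, p_wire = 0.05, 0.15                  # hypothetical BSC crossover probabilities
p_cascade = p_main * (1 - p_wire) + (1 - p_main) * p_wire
I_XZ = 1 - h2(p_cascade)                     # I(X;Z) of the cascade channel at the uniform input
m, k, eps_s, q_e, ell = 3, 2, 1e-3, 4, 100   # hypothetical system parameters
j_min = m * I_XZ - k * eps_s - log2(q_e) / ell
print(f"Theorem 2: at least about {j_min:.2f} purely random bits per source chunk of k={k}")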

4. Side Information at the Decoder with Partial Leakage to the Wiretapper

Consider next an extension of our model to the case where there are side information sequences, $w^n = (w_1,\ldots,w_n)$ and $\dot{w}^n = (\dot{w}_1,\ldots,\dot{w}_n)$, available to the decoder and the wiretapper, respectively; see Figure 2. For the purpose of the converse theorem, we assume that $w^n$ is available to the encoder too, whereas in the achievability part, we comment also on the case where it is not. We assume that $w^n$ is a deterministic sequence, but $\dot{w}^n$ is a realization of a random vector $\dot{W}^n = (\dot{W}_1,\ldots,\dot{W}_n)$, which is a noisy version of $w^n$. In other words, it is generated from $w^n$ by another memoryless channel, $Q_{\dot{W}^n|W^n}(\dot{w}^n|w^n) = \prod_{i=1}^{n} Q_{\dot{W}|W}(\dot{w}_i|w_i)$. The symbols of $\{w_i\}$ and $\{\dot{w}_i\}$ take values in finite alphabets, $\mathcal{W}$ and $\dot{\mathcal{W}}$, respectively. There are two important extreme special cases: (i) $\dot{W}^n = w^n$ almost surely, which is the case of totally insecure side information that fully leaks to the wiretapper, and (ii) $\dot{W}^n$ is degenerate (or independent of $w^n$), which is the case of secure side information with no leakage to the wiretapper. Every intermediate situation between these two extremes is a situation of partial leakage. The finite-state encoder model is now re-defined according to
$$\Pr\{\tilde{X}_i = \tilde{x}\,|\,\tilde{u}_i = \tilde{u},\ \tilde{w}_i = \tilde{w},\ s_i^e = s\} = P(\tilde{x}\,|\,\tilde{u},\tilde{w},s),\qquad i = 0,1,2,\ldots$$
$$s_{i+1}^e = h(\tilde{u}_i, \tilde{w}_i, s_i^e),\qquad i = 0,1,2,\ldots,$$
where $\tilde{w}_i = w_{ik+1}^{ik+k}$, $i = 0,1,\ldots,n/k-1$. Likewise, the decoder is given by
$$\tilde{v}_i = f(\tilde{y}_i, \tilde{w}_i, s_i^d)$$
$$s_{i+1}^d = g(\tilde{y}_i, \tilde{w}_i, s_i^d),$$
and the wiretapper has access to Z N and W ˙ n . Accordingly, the security constraint is modified as follows: for a given ϵ s > 0 and for every sufficiently large n,
$$\max_{\mu} I_\mu(U^n;Z^N\,|\,\dot{W}^n) \le n\epsilon_s,$$
where $I_\mu(U^n;Z^N|\dot{W}^n)$ is the conditional mutual information between $U^n$ and $Z^N$ given $\dot{W}^n$, induced by $\mu = \{\mu(u^n,\dot{w}^n),\ u^n\in\mathcal{U}^n,\ \dot{w}^n\in\dot{\mathcal{W}}^n\}$ and the system, $\{P(z^N|u^n),\ u^n\in\mathcal{U}^n,\ z^N\in\mathcal{Z}^N\}$, where $\mu(u^n,\dot{w}^n) = \sum_{w^n}\mu(u^n,w^n)\,Q_{\dot{W}^n|W^n}(\dot{w}^n|w^n)$.
In order to present the extension of Theorem 1 to incorporate side information, we first need to define the extension of the LZ complexity to include side information, namely, to define the conditional LZ complexity (see also [23]). Given u n and w n , let us apply the incremental parsing procedure of the LZ algorithm to the sequence of pairs ( ( u 1 , w 1 ) , ( u 2 , w 2 ) , , ( u n , w n ) ) . According to this procedure, all phrases are distinct with a possible exception of the last phrase, which might be incomplete. Let c ( u n , w n ) denote the number of distinct phrases. As an example (which appears also in [23]), if
$$u^6 = 0\,|\,1\,|\,00\,|\,01\,|\qquad\qquad w^6 = 0\,|\,1\,|\,01\,|\,01\,|$$
then $c(u^6,w^6) = 4$. Let $c(w^n)$ denote the resulting number of distinct phrases of $w^n$, and let $w(l)$ denote the $l$-th distinct $w$-phrase, $l = 1,2,\ldots,c(w^n)$. In the above example, $c(w^6) = 3$. Denote by $c_l(u^n|w^n)$ the number of occurrences of $w(l)$ in the parsing of $w^n$, or equivalently, the number of distinct $u$-phrases that jointly appear with $w(l)$. Clearly, $\sum_{l=1}^{c(w^n)} c_l(u^n|w^n) = c(u^n,w^n)$. In the above example, $w(1) = 0$, $w(2) = 1$, $w(3) = 01$, $c_1(u^6|w^6) = c_2(u^6|w^6) = 1$, and $c_3(u^6|w^6) = 2$. Now, the conditional LZ complexity of $u^n$ given $w^n$ is defined as
$$\rho_{\mathrm{LZ}}(u^n|w^n) = \frac{1}{n}\sum_{l=1}^{c(w^n)} c_l(u^n|w^n)\log c_l(u^n|w^n).$$
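A small sketch of the joint parsing and of $\rho_{\mathrm{LZ}}(u^n|w^n)$, reproducing the example above, follows (assumptions: base-2 logarithms and counting a possibly incomplete last phrase).

from math import log2

def conditional_lz_complexity(u, w):
    # joint incremental parsing of the pair sequence; counts of u-phrases per distinct w-phrase
    assert len(u) == len(w)
    seen, counts, current = set(), {}, []
    for pair in zip(u, w):
        current.append(pair)
        if tuple(current) not in seen:           # a new joint (u, w) phrase
            seen.add(tuple(current))
            w_phrase = "".join(p[1] for p in current)
            counts[w_phrase] = counts.get(w_phrase, 0) + 1
            current = []
    if current:                                  # last phrase, possibly incomplete
        w_phrase = "".join(p[1] for p in current)
        counts[w_phrase] = counts.get(w_phrase, 0) + 1
    return sum(c * log2(c) for c in counts.values()) / len(u)

# the example from the text: u^6 = 0|1|00|01 and w^6 = 0|1|01|01
print(conditional_lz_complexity("010001", "010101"))    # 2*log2(2)/6 = 1/3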
We are now ready to present the main result of this section.
Theorem 3.
Consider the problem setting defined in Section 2 along with the above–mentioned modifications to incorporate side information. If there exists a stochastic encoder with q e states and a decoder with q d states that together satisfy the reliability constraint (9) and the security constraint (25), then its bandwidth expansion factor λ must be lower bounded as follows.
$$\lambda \ge \frac{\rho_{\mathrm{LZ}}(u^n|w^n) - \Delta(\epsilon_r) - \epsilon_s - \eta_n(q_e\cdot q_d, k)}{C_s},$$
where
$$\eta_n(q_e\cdot q_d,k) = \min_{\{\ell:\ \ell\ \mathrm{divides}\ n/k\}}\left\{\frac{\log(q_d q_e) + 1}{k\ell} + \frac{\log(4A^2)}{(1-\epsilon_n)\log n} + \frac{A^2\log(4A^2)}{n}\right\},$$
with $\epsilon_n\to 0$ as $n\to\infty$ and $A = [(\alpha\omega)^{k\ell+1}-1]/[\alpha\omega-1]$, $\omega$ being the size of $\mathcal{W}$.
Note that the lower bound of Theorem 3 does not depend on the noisy side information at the wiretapper or on the channel Q W ˙ | W that generates it from w n . It depends only on u n and w n in terms of the data available in the system. Clearly, as it is a converse theorem, if it allows the side information to be available also at the encoder, then it definitely applies also to the case where the encoder does not have access to w n . Interestingly, the encoder and the legitimate decoder act as if the wiretapper has the clean side information, w n . While it is quite obvious that protection against availability of w n at the wiretapper is sufficient for protection against availability of W ˙ n (as W ˙ n is a degraded version of w n ), it is not quite trivial that this should be also necessary, as the above converse theorem asserts. It is also interesting to note that here, the bound depends also on q e , and not only q d , as in Theorem 1. However, this dependence on q e disappears in the special case where W ˙ n = w n with probability one.
We next discuss the achievability of the lower bound of Theorem 3. If the encoder has access to w n , then the first step would be to apply the conditional LZ algorithm (see ([23], proof of Lemma 2) [24]), thus compressing u n to about n ρ LZ ( u n | w n ) bits. The second step would be good channel coding for the wiretap channel, using the same methods as described in the previous section. If, however, the encoder does not have access to w n , the channel coding part is still as before, but the situation with the source coding part is somewhat more involved since neither the encoder nor the decoder can calculate the target bit rate, ρ LZ ( u n | w n ) , as neither party has access to both u n and w n . However, this source coding rate can essentially be achieved, provided that there is a low-rate noiseless feedback channel from the legitimate decoder to the encoder. The following scheme is in the spirit of the one proposed by Draper [25], but with a few modifications.
The encoder implements random binning for all source sequences in $\mathcal{U}^n$; that is, for each member of $\mathcal{U}^n$, an index is drawn independently, under the uniform distribution over $\{0,1,2,\ldots,\alpha^n-1\}$, and is represented by its binary expansion, $b(u^n)$, of length $n\log\alpha$ bits. We select a large positive integer $r$, but keep $r\ll n$ (say, $r = \sqrt{n}$ or $r = \log^2 n$). The encoder transmits the bits of $b(u^n)$ incrementally, $r$ bits at a time, until it receives an ACK from the decoder. Each chunk of $r$ bits is fed into a good channel code for the wiretap channel, at a rate slightly less than $C_s$. At the decoder side, this channel code is decoded (correctly, with high probability, for large $r$). Then, for each $i$ ($i = 1,2,\ldots$), after having decoded the $i$-th chunk of $r$ bits of $b(u^n)$, the decoder creates the list $\mathcal{A}_i(u^n) = \{\dot{u}^n:\ [b(\dot{u}^n)]_{ir} = [b(u^n)]_{ir}\}$, where $[b(\dot{u}^n)]_l$ denotes the string formed by the first $l$ bits of $b(\dot{u}^n)$. For each $\dot{u}^n\in\mathcal{A}_i(u^n)$, the decoder calculates $\rho_{\mathrm{LZ}}(\dot{u}^n|w^n)$. We fix an arbitrarily small $\delta > 0$, which controls the trade-off between error probability and compression rate. If $n\rho_{\mathrm{LZ}}(\dot{u}^n|w^n)\le i\cdot r - n\delta$ for some $\dot{u}^n\in\mathcal{A}_i(u^n)$, the decoder sends an ACK on the feedback channel and outputs the reconstruction, $\dot{u}^n$, with the smallest $\rho_{\mathrm{LZ}}(\dot{u}^n|w^n)$ among all members of $\mathcal{A}_i(u^n)$. If no member of $\mathcal{A}_i(u^n)$ satisfies $n\rho_{\mathrm{LZ}}(\dot{u}^n|w^n)\le i\cdot r - n\delta$, the receiver waits for the next chunk of $r$ compressed bits, and it does not send an ACK. The probability of source-coding error after the $i$-th chunk is upper bounded by
$$P_e^{(i)} \overset{(a)}{\le} \left|\left\{\dot{u}^n\neq u^n:\ n\rho_{\mathrm{LZ}}(\dot{u}^n|w^n)\le i\cdot r - n\delta\right\}\right|\cdot 2^{-i\cdot r} \overset{(b)}{\le} \exp_2\left\{i\cdot r - n\delta + O\!\left(\frac{n\log(\log n)}{\log n}\right)\right\}\cdot 2^{-i\cdot r} = \exp_2\left\{-n\delta + O\!\left(\frac{n\log(\log n)}{\log n}\right)\right\}\to 0\quad\mathrm{as}\ n\to\infty,$$
where in (a), the factor $2^{-i\cdot r}$ is the probability that $[b(\dot{u}^n)]_{ir} = [b(u^n)]_{ir}$ for each member of the set $\{\dot{u}^n\neq u^n:\ n\rho_{\mathrm{LZ}}(\dot{u}^n|w^n)\le i\cdot r - n\delta\}$, and (b) is based on ([23], Equation (A.13)). Clearly, it is guaranteed that an ACK is received at the encoder (and hence the transmission stops) no later than after the transmission of chunk no. $i^*$, where $i^*$ is the smallest integer $i$ such that $i\cdot r \ge n\rho_{\mathrm{LZ}}(u^n|w^n) + n\delta$, namely, $i^* = \lceil[n\rho_{\mathrm{LZ}}(u^n|w^n) + n\delta]/r\rceil$, which is the stage at which at least the correct source sequence satisfies the condition $n\rho_{\mathrm{LZ}}(u^n|w^n)\le i\cdot r - n\delta$. Therefore, the compression ratio is no worse than $i^*\cdot r/n = \lceil n[\rho_{\mathrm{LZ}}(u^n|w^n)+\delta]/r\rceil\cdot r/n \le \rho_{\mathrm{LZ}}(u^n|w^n) + \delta + r/n$. The overall probability of source-coding error is then upper bounded by
$$P_e = \Pr\bigcup_{i=1}^{i^*}\{\mathrm{error\ at\ stage\ }i\} \le \sum_{i=1}^{i^*} P_e^{(i)} \le \left(\frac{n\log\alpha}{r}+1\right)\cdot\exp_2\left\{-n\delta + O\!\left(\frac{n\log(\log n)}{\log n}\right)\right\},$$
which still tends to zero as $n\to\infty$. As for channel-coding errors, the probability that at least one chunk is decoded incorrectly is upper bounded by $\left(\frac{n\log\alpha}{r}+1\right)\cdot e^{-rE}$, where $E$ is an achievable error exponent of channel coding at the given rate. Thus, if $r$ grows at any rate faster than logarithmic, but sub-linearly in $n$, then the overall channel-coding error probability tends to zero and, at the same time, the compression redundancy, $r/n$, tends to zero too.
To show that the security constraint (25) is satisfied too, consider an arbitrary assignment $\mu$ of the random vectors $(U^n, W^n)$, and let us denote by $B$ the string of $I(X^N;Z^N) - N\epsilon$ bits of local randomness in Wyner's code [1]. Then,
$$I(X^N;Z^N) = H(Z^N) - H(Z^N|X^N) \overset{(a)}{\ge} H(Z^N) - H(Z^N|U^n,B) \overset{(b)}{\ge} H(Z^N|\dot{W}^n) - H(Z^N|U^n,B) \overset{(c)}{=} H(Z^N|\dot{W}^n) - H(Z^N|U^n,B,\dot{W}^n) = I(U^n,B;Z^N|\dot{W}^n) = H(U^n,B|\dot{W}^n) - H(U^n,B|Z^N,\dot{W}^n) = H(U^n|\dot{W}^n) + H(B|U^n,\dot{W}^n) - H(U^n|Z^N,\dot{W}^n) - H(B|Z^N,\dot{W}^n,U^n) \overset{(d)}{=} H(U^n|\dot{W}^n) + H(B) - H(U^n|Z^N,\dot{W}^n) - H(B|Z^N,\dot{W}^n,U^n) \overset{(e)}{\ge} H(U^n|\dot{W}^n) + H(B) - H(U^n|Z^N,\dot{W}^n) - H(B|Z^N,U^n) \overset{(f)}{\ge} H(U^n|\dot{W}^n) + \big[I(X^N;Z^N) - N\epsilon\big] - H(U^n|Z^N,\dot{W}^n) - n\delta_n = I(X^N;Z^N) + I_\mu(U^n;Z^N|\dot{W}^n) - n(\lambda\epsilon + \delta_n),$$
where (a) is due to $(U^n,B)\to X^N\to Z^N$ being a Markov chain, (b) is due to conditioning reducing entropy, (c) is due to $\dot{W}^n\to(U^n,B)\to Z^N$ being a Markov chain, (d) is due to $B$ being independent of $(U^n,\dot{W}^n)$, (e) is due to conditioning reducing entropy, and (f) is due to the fact that, in Wyner's code, $B$ can be reliably decoded given $(Z^N,U^n)$ ($\delta_n$ is understood to be small, and recall that $W^n$ is not needed in the channel decoding phase, but only in the Slepian–Wolf decoding phase), and that the length of $B$ is chosen to be $I(X^N;Z^N) - N\epsilon$. Comparing the right-most side to the left-most side, we readily obtain
$$I_\mu(U^n;Z^N|\dot{W}^n) \le n(\lambda\epsilon + \delta_n),$$
which can be made arbitrarily small.

5. Proofs

We begin this section by establishing more notation conventions to be used throughout all proofs.
Let $n\ge k$ be a positive integer and let $\ell$ be such that $K = \ell\cdot k$ divides $n$. Consider the partition of $u^n$ into $n/K$ non-overlapping blocks of length $K$,
$$(\tilde{u}_0,\tilde{u}_1,\ldots,\tilde{u}_{\ell-1}),(\tilde{u}_\ell,\tilde{u}_{\ell+1},\ldots,\tilde{u}_{2\ell-1}),\ldots,(\tilde{u}_{n/k-\ell},\tilde{u}_{n/k-\ell+1},\ldots,\tilde{u}_{n/k-1}) = (u_1^K, u_{K+1}^{2K},\ldots,u_{n-K+1}^{n}),$$
and apply the same partition to $v^n$. The corresponding channel input and output sequences are of length $N = n\lambda$. Let $M = \ell\cdot m = K\lambda$ and consider the parallel partition of the channel input and output sequences according to
$$(\tilde{x}_0,\tilde{x}_1,\ldots,\tilde{x}_{\ell-1}),(\tilde{x}_\ell,\tilde{x}_{\ell+1},\ldots,\tilde{x}_{2\ell-1}),\ldots,(\tilde{x}_{N/m-\ell},\tilde{x}_{N/m-\ell+1},\ldots,\tilde{x}_{N/m-1})$$
$$(\tilde{y}_0,\tilde{y}_1,\ldots,\tilde{y}_{\ell-1}),(\tilde{y}_\ell,\tilde{y}_{\ell+1},\ldots,\tilde{y}_{2\ell-1}),\ldots,(\tilde{y}_{N/m-\ell},\tilde{y}_{N/m-\ell+1},\ldots,\tilde{y}_{N/m-1})$$
$$(\tilde{z}_0,\tilde{z}_1,\ldots,\tilde{z}_{\ell-1}),(\tilde{z}_\ell,\tilde{z}_{\ell+1},\ldots,\tilde{z}_{2\ell-1}),\ldots,(\tilde{z}_{N/m-\ell},\tilde{z}_{N/m-\ell+1},\ldots,\tilde{z}_{N/m-1}).$$
For the sake of brevity, we henceforth denote $(\tilde{u}_{i\ell},\ldots,\tilde{u}_{(i+1)\ell-1})$ by $\tilde{u}_{i\ell}^{(i+1)\ell-1}$ and use the same notation rule for all other sequences. Next, define the joint empirical distribution
$$P_{\hat{U}^K\hat{X}^M\hat{Y}^M\hat{Z}^M\hat{S}^e\hat{S}^d}(u^K,x^M,y^M,z^M,s^e,s^d) = \frac{K}{n}\sum_{i=0}^{n/K-1}\delta\big\{\tilde{u}_{i\ell}^{(i+1)\ell-1}=u^K,\ \tilde{x}_{i\ell}^{(i+1)\ell-1}=x^M,\ \tilde{y}_{i\ell}^{(i+1)\ell-1}=y^M,\ \tilde{z}_{i\ell}^{(i+1)\ell-1}=z^M,\ s_{i\ell+1}^{e}=s^e,\ s_{i\ell+1}^{d}=s^d\big\},$$
and
$$P_{\hat{U}^K X^M Y^M Z^M \hat{S}^e S^d}(u^K,x^M,y^M,z^M,s^e,s^d) = \mathbb{E}\big\{P_{\hat{U}^K\hat{X}^M\hat{Y}^M\hat{Z}^M\hat{S}^e\hat{S}^d}(u^K,x^M,y^M,z^M,s^e,s^d)\big\},$$
where the expectation is w.r.t. both the randomness of the encoder and the randomness of both channels. Note that
$$P_{\hat{U}^K X^M Y^M Z^M \hat{S}^e}(u^K,x^M,y^M,z^M,s^e) = P_{\hat{U}^K\hat{S}^e}(u^K,s^e)\,P(x^M|u^K,s^e)\,Q_M(y^M|x^M)\,Q_W(z^M|y^M),$$
where
$$P(x^M|u^K,s^e) = \prod_{j=0}^{\ell-1} P(\tilde{x}_j|\tilde{u}_j,s_j^e),\qquad s_0^e = s^e,$$
$$Q_M(y^M|x^M) = \prod_{i=1}^{M} Q_M(y_i|x_i),$$
$$Q_W(z^M|y^M) = \prod_{i=1}^{M} Q_W(z_i|y_i).$$
Note also that the bit error probability (in the absence of side information) under this distribution is
$$\frac{1}{K}\,\mathbb{E}\{d_H(\hat{U}^K,f(Y^M,S^d))\} = \frac{1}{K}\sum_{u^K,y^M,s^e,s^d} P_{\hat{U}^K Y^M S^e S^d}(u^K,y^M,s^e,s^d)\,d_H(u^K,f(y^M,s^d)) = \frac{1}{K}\sum_{u^K,y^M,s^d}\frac{K}{n}\sum_{i=0}^{n/K-1}\mathbb{E}\,\delta\big\{\tilde{u}_{i\ell}^{(i+1)\ell-1}=u^K,\ s_{i\ell+1}^e=s^e,\ \tilde{y}_{i\ell}^{(i+1)\ell-1}=y^M,\ s_{i\ell+1}^d=s^d\big\}\cdot d_H(u^K,f(y^M,s^d)) = \frac{1}{n}\sum_{i=0}^{n/K-1}\sum_{y^M,s^d} F(y^M,s^d\,|\,u_{iK+1}^{iK+K},s_{i\ell+1}^e)\,d_H(u_{iK+1}^{iK+K},f(y^M,s^d)) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\{d_H(u_i,V_i)\},$$
where $f(Y^M,S^d)$ is induced by successive applications of the decoder output function with inputs $Y^m, Y_{m+1}^{2m},\ldots,Y_{M-m+1}^{M}$ and the initial state $S^d$, and where
$$F(y^M,s^d\,|\,u^K,s^e) = \sum_{x^M} P(x^M|u^K,s^e)\,Q_M(y^M|x^M)\,P_{S^d|Y^M}(s^d|y^M).$$

5.1. Proof of Theorem 1

Beginning with the reliability constraint, we have
$$I(\hat{U}^K;Y^M,S^d) = H(\hat{U}^K) - H(\hat{U}^K|Y^M,S^d) = H(\hat{U}^K) - H(\hat{U}^K|Y^M) + I(S^d;\hat{U}^K|Y^M) \le I(\hat{U}^K;Y^M) + H(S^d|Y^M) \le I(X^M;Y^M) + \log q_d.$$
On the other hand,
$$I(\hat{U}^K;Y^M,S^d) = H(\hat{U}^K) - H(\hat{U}^K|Y^M,S^d) \ge H(\hat{U}^K) - K\Delta(\epsilon_r),$$
and so,
$$I(X^M;Y^M) \ge H(\hat{U}^K) - K\Delta(\epsilon_r) - \log q_d \triangleq K\cdot R(u^n,q_d,\epsilon_r) = \frac{M\cdot R(u^n,q_d,\epsilon_r)}{\lambda}.$$
Following [1], we define the function
$$\Gamma[R] = \max_{\{P_X:\ I(X;Y)\ge R\}} I(X;Y|Z) = \max_{\{P_X:\ I(X;Y)\ge R\}}\big[I(X;Y) - I(X;Z)\big],$$
which is monotonically non–increasing and concave ([1], Lemma 1). Regarding the security constraint,
$$H(\hat{U}^K) - K\epsilon_s \overset{(a)}{\le} H(\hat{U}^K) - \max_{\mu} I_\mu(U^K;Z^M) \le H(\hat{U}^K) - I(\hat{U}^K;Z^M)$$
$$= H(\hat{U}^K|Z^M) - H(\hat{U}^K|Y^M,Z^M,S^d) + H(\hat{U}^K|Y^M,Z^M,S^d)$$
$$= H(\hat{U}^K|Z^M) - H(\hat{U}^K|Y^M,Z^M) + I(S^d;\hat{U}^K|Y^M,Z^M) + H(\hat{U}^K|Y^M,Z^M,S^d)$$
$$\overset{(b)}{\le} I(\hat{U}^K;Y^M|Z^M) + \log q_d + K\Delta(\epsilon_r)$$
$$\overset{(c)}{\le} I(X^M;Y^M|Z^M) + \log q_d + K\Delta(\epsilon_r)$$
$$\overset{(d)}{\le} \sum_{i=1}^{M} I(X_i;Y_i|Z_i,Y^{i-1}) + \log q_d + K\Delta(\epsilon_r)$$
$$= \sum_{i=1}^{M}\sum_{y^{i-1}} P_{Y^{i-1}}(y^{i-1})\,I(X_i;Y_i|Z_i,Y^{i-1}=y^{i-1}) + \log q_d + K\Delta(\epsilon_r)$$
$$\overset{(e)}{\le} M\cdot\frac{1}{M}\sum_{i=1}^{M}\sum_{y^{i-1}} P_{Y^{i-1}}(y^{i-1})\,\Gamma\big[I(X_i;Y_i|Y^{i-1}=y^{i-1})\big] + \log q_d + K\Delta(\epsilon_r)$$
$$\overset{(f)}{\le} M\cdot\Gamma\bigg[\frac{1}{M}\sum_{i=1}^{M}\sum_{y^{i-1}} P_{Y^{i-1}}(y^{i-1})\,I(X_i;Y_i|Y^{i-1}=y^{i-1})\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$= M\cdot\Gamma\bigg[\frac{1}{M}\sum_{i=1}^{M} I(X_i;Y_i|Y^{i-1})\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$= M\cdot\Gamma\bigg[\frac{1}{M}\sum_{i=1}^{M}\big(H(Y_i|Y^{i-1}) - H(Y_i|X_i,Y^{i-1})\big)\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$= M\cdot\Gamma\bigg[\frac{1}{M}\sum_{i=1}^{M}\big(H(Y_i|Y^{i-1}) - H(Y_i|X_i)\big)\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$= M\cdot\Gamma\bigg[\frac{1}{M}\big(H(Y^M) - H(Y^M|X^M)\big)\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$= M\cdot\Gamma\bigg[\frac{I(X^M;Y^M)}{M}\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$\overset{(g)}{\le} M\cdot\Gamma\bigg[\frac{R(u^n,q_d,\epsilon_r)}{\lambda}\bigg] + \log q_d + K\Delta(\epsilon_r)$$
$$\le M\cdot\Gamma\bigg[\frac{R(u^n,q_d,\epsilon_r) - \epsilon_s}{\lambda}\bigg] + \log q_d + K\Delta(\epsilon_r),$$
where $P_{Y^{i-1}}(y^{i-1}) = \sum_{y_i^M} P_{Y^M}(y^M)$, (a) is due to the security constraint, (b) follows from Fano's inequality and the fact that $I(S^d;\hat{U}^K|Y^M,Z^M)\le H(S^d)\le\log q_d$, (c) is due to the data processing inequality and the fact that $\hat{U}^K\to X^M\to Y^M$ is a Markov chain given $Z^M$, (d) is as in ([1], Equation (37)), (e) is by the definition of Wyner's function $\Gamma(\cdot)$, (f) is by the concavity of this function, and (g) is by (45) and the decreasing monotonicity of the function $\Gamma(\cdot)$. Thus,
$$\frac{R(u^n,q_d,\epsilon_r) - \epsilon_s}{\lambda} \le \Gamma\bigg[\frac{R(u^n,q_d,\epsilon_r) - \epsilon_s}{\lambda}\bigg],$$
or
$$\frac{R(u^n,q_d,\epsilon_r) - \epsilon_s}{\lambda} \le C_s,$$
which is
$$R(u^n,q_d,\epsilon_r) \le \lambda C_s + \epsilon_s,$$
or, equivalently,
$$\frac{H(\hat{U}^K)}{K} \le \lambda C_s + \epsilon_s + \Delta(\epsilon_r) + \frac{\log q_d}{K}.$$
Finally, we apply the inequality ([20], Equation (18))
$$\frac{H(\hat{U}^K)}{K} \ge \rho_{\mathrm{LZ}}(u^n) - \frac{2K(\log\alpha+1)^2}{(1-\epsilon_n)\log n} - \frac{2K\,\alpha^{2K}\log\alpha}{n} - \frac{1}{K},$$
to obtain
$$\rho_{\mathrm{LZ}}(u^n) \le \lambda C_s + \epsilon_s + \Delta(\epsilon_r) + \zeta_n(q_d,k),$$
which completes the proof of Theorem 1.

5.2. Proof of Theorem 2

Consider the following extension of the joint distribution to include a random variable that represents { b i } , as follows:
$$P_{\hat{U}^K B^J X^M Y^M Z^M \hat{S}^e S^d}(u^K,b^J,x^M,y^M,z^M,s^e,s^d) = \frac{K}{n}\sum_{i=0}^{n/K-1}\mathbb{E}\big[\delta\big\{\tilde{u}_{i\ell}^{(i+1)\ell-1}=u^K,\ \tilde{b}_{i\ell}^{(i+1)\ell-1}=b^J,\ \tilde{x}_{i\ell}^{(i+1)\ell-1}=x^M,\ \tilde{y}_{i\ell}^{(i+1)\ell-1}=y^M,\ \tilde{z}_{i\ell}^{(i+1)\ell-1}=z^M,\ s_{i\ell+1}^e=s^e,\ s_{i\ell+1}^d=s^d\big\}\big],$$
where $J = j\ell$ and $\tilde{b}_{i\ell}^{(i+1)\ell-1} = (\tilde{b}_{i\ell},\tilde{b}_{i\ell+1},\ldots,\tilde{b}_{(i+1)\ell-1})$. Next, consider the following chain of inequalities:
$$K\epsilon_s \ge \max_{\mu} I_\mu(U^K;Z^M) \ge I(\hat{U}^K;Z^M) = I(\hat{U}^K,B^J,S^e;Z^M) - I(B^J,S^e;Z^M|\hat{U}^K) \overset{(a)}{=} I(X^M;Z^M) - I(B^J,S^e;Z^M|\hat{U}^K) \ge I(X^M;Z^M) - H(B^J,S^e|\hat{U}^K) \ge I(X^M;Z^M) - H(B^J,S^e) \ge I(X^M;Z^M) - H(B^J) - H(S^e) \ge I(X^M;Z^M) - J - \log q_e,$$
where (a) is due to the fact that, on the one hand, X M is a deterministic function of ( U ^ K , B J , S e ) , which implies that I ( U ^ K , B J , S e ; Z M ) I ( X M ; Z M ) , but on the other hand, ( U ^ K , B J , S e ) X M Z M is a Markov chain and so, I ( U ^ K , B J , S e ; Z M ) I ( X M ; Z M ) , hence the equality. Thus,
$$J \ge I(X^M;Z^M) - K\epsilon_s - \log q_e,$$
or
$$j \ge \frac{I(X^M;Z^M)}{\ell} - k\epsilon_s - \frac{\log q_e}{\ell} = \frac{m\,I(X^M;Z^M)}{M} - k\epsilon_s - \frac{\log q_e}{\ell}.$$
The meaning of this result is the following: once one finds a communication system that complies with both the security constraint and the reliability constraint, then the amount of local randomization is lower bounded in terms of the induced mutual information, I ( X M ; Z M ) , as above. By the hypothesis of Theorem 2, the secrecy capacity is saturated, and hence P X M must coincide with the product distribution, [ P X ] M , yielding I ( X M ; Z M ) / M = I ( X ; Z ) . Thus,
$$j \ge m\,I(X;Z) - k\epsilon_s - \frac{\log q_e}{\ell}.$$
This completes the proof of Theorem 2.

5.3. Outline of the Proof of Theorem 3

The proof follows essentially the same steps as those of the proof of Theorem 1, except that everything should be conditioned on the side information, but there are also some small twists. We, therefore, only provide a proof outline and highlight the differences.
The auxiliary joint distribution is now extended to read
$$P_{\hat{U}^K\hat{W}^K\dot{W}^K X^M Y^M Z^M\hat{S}^e S^d}(u^K,w^K,\dot{w}^K,x^M,y^M,z^M,s^e,s^d) = \frac{K}{n}\sum_{i=0}^{n/K-1}\mathbb{E}\big[\delta\big\{\tilde{u}_{i\ell}^{(i+1)\ell-1}=u^K,\ \tilde{w}_{i\ell}^{(i+1)\ell-1}=w^K,\ \tilde{\dot{w}}_{i\ell}^{(i+1)\ell-1}=\dot{w}^K,\ \tilde{x}_{i\ell}^{(i+1)\ell-1}=x^M,\ \tilde{y}_{i\ell}^{(i+1)\ell-1}=y^M,\ \tilde{z}_{i\ell}^{(i+1)\ell-1}=z^M,\ s_{i\ell+1}^e=s^e,\ s_{i\ell+1}^d=s^d\big\}\big].$$
Note that
$$P_{\hat{U}^K\hat{W}^K\dot{W}^K Z^M\hat{S}^e}(u^K,w^K,\dot{w}^K,z^M,s^e) = \frac{K}{n}\sum_{i=0}^{n/K-1}\mathbb{E}\,\delta\big\{\tilde{u}_{i\ell}^{(i+1)\ell-1}=u^K,\ \tilde{w}_{i\ell}^{(i+1)\ell-1}=w^K,\ \tilde{\dot{w}}_{i\ell}^{(i+1)\ell-1}=\dot{w}^K,\ \tilde{z}_{i\ell}^{(i+1)\ell-1}=z^M,\ s_{i\ell+1}^e=s^e\big\} = \frac{K}{n}\sum_{i=0}^{n/K-1}\delta\big\{\tilde{u}_{i\ell}^{(i+1)\ell-1}=u^K,\ \tilde{w}_{i\ell}^{(i+1)\ell-1}=w^K,\ s_{i\ell+1}^e=s^e\big\}\cdot Q_{\dot{W}^K|W^K}(\dot{w}^K|w^K)\cdot G(z^M|u^K,s^e) = P_{\hat{U}^K\hat{W}^K\hat{S}^e}(u^K,w^K,s^e)\cdot Q_{\dot{W}^K|W^K}(\dot{w}^K|w^K)\cdot G(z^M|u^K,s^e),$$
where
$$G(z^M|u^K,s^e) = \sum_{x^M} P(x^M|u^K,s^e)\,Q_{MW}(z^M|x^M).$$
It follows that $\dot{W}^K\to\hat{W}^K\to(\hat{U}^K,\hat{S}^e)\to Z^M$ is a Markov chain under $P_{\hat{U}^K\hat{W}^K\dot{W}^K Z^M\hat{S}^e}$. In other words, the legitimate decoder has side information of better quality than that of the wiretapper. First, observe that
$$I_\mu(U^n;Z^N|W^n) = H_\mu(Z^N|W^n) - H_\mu(Z^N|W^n,U^n) \le H_\mu(Z^N|W^n) - H_\mu(Z^N|W^n,U^n,S^e) = H_\mu(Z^N|W^n) - H_\mu(Z^N|U^n,S^e) \le H_\mu(Z^N|\dot{W}^n) - H_\mu(Z^N|U^n,S^e) = H_\mu(Z^N|\dot{W}^n) - H_\mu(Z^N|\dot{W}^n,U^n,S^e) \le H_\mu(Z^N|\dot{W}^n) - H_\mu(Z^N|\dot{W}^n,U^n) + \log q_e = I_\mu(U^n;Z^N|\dot{W}^n) + \log q_e.$$
The reliability constraint is handled exactly as in the proof of Theorem 1, except that everything should be conditioned on W ^ K . The result of this is
$$I(X^M;Y^M|\hat{W}^K) \ge H(\hat{U}^K|\hat{W}^K) - K\Delta(\epsilon_r) - \log q_d \triangleq K\cdot R(u^n,w^n,q_d,\epsilon_r) = \frac{M\cdot R(u^n,w^n,q_d,\epsilon_r)}{\lambda}.$$
Regarding the security constraint, we begin with the following manipulation.
$$H(\hat{U}^K|Z^M,\dot{W}^K) = H(\hat{U}^K|\dot{W}^K) - I(\hat{U}^K;Z^M|\dot{W}^K)$$
$$= H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + H(\hat{U}^K|\hat{W}^K) - I(\hat{U}^K;Z^M|\dot{W}^K)$$
$$\overset{(a)}{\le} H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + H(\hat{U}^K|\hat{W}^K) - I(\hat{U}^K;Z^M|\hat{W}^K) + \log q_e$$
$$= H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + H(\hat{U}^K|Z^M,\hat{W}^K) + \log q_e$$
$$= H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + H(\hat{U}^K|Z^M,\hat{W}^K) - H(\hat{U}^K|Y^M,Z^M,S^d,\hat{W}^K) + H(\hat{U}^K|Y^M,Z^M,S^d,\hat{W}^K) + \log q_e$$
$$= H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + H(\hat{U}^K|Z^M,\hat{W}^K) - H(\hat{U}^K|Y^M,Z^M,\hat{W}^K) + I(S^d;\hat{U}^K|Y^M,Z^M,\hat{W}^K) + H(\hat{U}^K|Y^M,Z^M,S^d,\hat{W}^K) + \log q_e$$
$$\overset{(b)}{\le} H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + I(\hat{U}^K;Y^M|Z^M,\hat{W}^K) + \log q_d + K\Delta(\epsilon_r) + \log q_e$$
$$\le H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + I(\hat{U}^K,\hat{S}^e;Y^M|Z^M,\hat{W}^K) + \log q_d + K\Delta(\epsilon_r) + \log q_e$$
$$\overset{(c)}{\le} H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + I(X^M;Y^M|Z^M,\hat{W}^K) + \log(q_e q_d) + K\Delta(\epsilon_r),$$
where in (a) we used Equation (62), in (b) we used Fano's inequality, and in (c) we used the data processing inequality, as $(\hat{U}^K,\hat{S}^e)\to X^M\to Y^M$ is a Markov chain (also conditioned on $(\hat{W}^K,Z^M)$). The next step is to further upper bound the term $I(X^M;Y^M|Z^M,\hat{W}^K)$. This is carried out very similarly as in the proof of Theorem 1, except that everything is conditioned also on $\hat{W}^K$. We then obtain
$$H(\hat{U}^K|Z^M,\dot{W}^K) \le H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|\hat{W}^K) + M\cdot\Gamma\bigg[\frac{R(u^n,w^n,q_d,\epsilon_r)}{\lambda}\bigg] + \log(q_e q_d) + K\Delta(\epsilon_r),$$
or, equivalently,
$$H(\hat{U}^K|\hat{W}^K) - M\cdot\Gamma\bigg[\frac{R(u^n,w^n,q_d,\epsilon_r)}{\lambda}\bigg] \le H(\hat{U}^K|\dot{W}^K) - H(\hat{U}^K|Z^M,\dot{W}^K) + \log(q_e q_d) + K\Delta(\epsilon_r) = I(\hat{U}^K;Z^M|\dot{W}^K) + \log(q_e q_d) + K\Delta(\epsilon_r) \le K\epsilon_s + \log(q_e q_d) + K\Delta(\epsilon_r),$$
or
$$R(u^n,w^n,q_e\cdot q_d,\epsilon_r) \le \lambda\cdot\Gamma\bigg[\frac{R(u^n,w^n,q_d,\epsilon_r)}{\lambda}\bigg] \le \lambda\cdot\Gamma\bigg[\frac{R(u^n,w^n,q_e\cdot q_d,\epsilon_r)}{\lambda}\bigg],$$
which is the same as
$$R(u^n,w^n,q_e\cdot q_d,\epsilon_r) \le \lambda\cdot C_s,$$
or
$$\frac{H(\hat{U}^K|\hat{W}^K)}{K} \le \lambda\cdot C_s + \epsilon_s + \Delta(\epsilon_r) + \frac{\log(q_e\cdot q_d)}{K}.$$
The proof is completed by combining the last inequality with the following inequality ([26] Equations (17)–(19), [27] Equations (55)–(57)):
$$\frac{H(\hat{U}^K|\hat{W}^K)}{K} \ge \rho_{\mathrm{LZ}}(u^n|w^n) - \frac{\log(4A^2)}{(1-\epsilon_n)\log n} - \frac{A^2\log(4A^2)}{n} - \frac{1}{K},$$
where $A = [(\alpha\omega)^{K+1}-1]/[\alpha\omega-1]$, $\omega$ being the alphabet size of $\mathcal{W}$.

Funding

This research received no external funding.

Data Availability Statement

Data sharing not applicable.

Acknowledgments

Interesting discussions with Alejandro Cohen are acknowledged with thanks.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Wyner, A.D. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
  2. Csiszár, I.; Körner, J. Broadcast channels with confidential messages. IEEE Trans. Inform. Theory 1978, 24, 339–348.
  3. Leung-Yan-Cheong, S.K.; Hellman, M.E. The Gaussian wire-tap channel. IEEE Trans. Inform. Theory 1978, 24, 451–456.
  4. Ozarow, L.H.; Wyner, A.D. Wire-tap channel II. In Proceedings of the Eurocrypt 84, Workshop on Advances in Cryptology: Theory and Applications of Cryptographic Techniques, Paris, France, 9–11 April 1985; pp. 33–51.
  5. Yamamoto, H. Coding theorems for secret sharing communication systems with two noisy channels. IEEE Trans. Inform. Theory 1989, 35, 572–578.
  6. Yamamoto, H. Rate-distortion theory for the Shannon cipher system. IEEE Trans. Inform. Theory 1997, 43, 827–835.
  7. Merhav, N. Shannon's secrecy system with informed receivers and its application to systematic coding for wiretapped channels. IEEE Trans. Inform. Theory 2008, 54, 2723–2734.
  8. Tekin, E.; Yener, A. The Gaussian multiple access wire-tap channel. IEEE Trans. Inform. Theory 2008, 54, 5747–5755.
  9. Mitrpant, C. Information Hiding—An Application of Wiretap Channels with Side Information. Ph.D. Thesis, Universität Duisburg-Essen, Essen, Germany, 2003.
  10. Mitrpant, C.; Vinck, A.J.H.; Luo, Y. An achievable region for the Gaussian wiretap channel with side information. IEEE Trans. Inform. Theory 2006, 52, 2181–2190.
  11. Ardestanizadeh, E.; Franceschetti, M.; Javidi, T.; Kim, Y.-H. Wiretap channel with secure rate-limited feedback. IEEE Trans. Inform. Theory 2009, 55, 5353–5361.
  12. Hayashi, M. Upper bounds of eavesdropper's performances in finite-length code with the decoy method. Phys. Rev. A 2007, 76, 012329; Erratum in Phys. Rev. A 2009, 79, 019901.
  13. Hayashi, M.; Matsumoto, R. Secure multiplex coding with dependent and non-uniform multiple messages. IEEE Trans. Inform. Theory 2016, 62, 2355–2409.
  14. Bloch, M.; Barros, J. Physical-Layer Security: From Information Theory to Security Engineering; Cambridge University Press: New York, NY, USA, 2011.
  15. Bellare, M.; Tessaro, S.; Vardy, A. Semantic security for the wiretap channel. In Advances in Cryptology—CRYPTO 2012; Safavi-Naini, R., Canetti, R., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7417.
  16. Goldfeld, Z.; Cuff, P.; Permuter, H.H. Semantic security capacity for wiretap channels of type II. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT 2016), Barcelona, Spain, 10–15 July 2016; pp. 2799–2803.
  17. Ziv, J. Coding theorems for individual sequences. IEEE Trans. Inform. Theory 1978, 24, 405–412.
  18. Ziv, J. Distortion-rate theory for individual sequences. IEEE Trans. Inform. Theory 1980, 26, 137–143.
  19. Ziv, J. Fixed-rate encoding of individual sequences with side information. IEEE Trans. Inform. Theory 1984, 30, 348–352.
  20. Merhav, N. Perfectly secure encryption of individual sequences. IEEE Trans. Inform. Theory 2013, 58, 1302–1310.
  21. Merhav, N. On the data processing theorem in the semi-deterministic setting. IEEE Trans. Inform. Theory 2014, 60, 6032–6040.
  22. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 1978, 24, 530–536.
  23. Ziv, J. Universal decoding for finite-state channels. IEEE Trans. Inform. Theory 1985, 31, 453–460.
  24. Uyematsu, T.; Kuzuoka, S. Conditional Lempel–Ziv complexity and its application to source coding theorem with side information. IEICE Trans. Fundam. 2003, E86-A, 2615–2617.
  25. Draper, S. Universal incremental Slepian–Wolf coding. In Proceedings of the 43rd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–1 October 2004; pp. 1332–1341.
  26. Merhav, N. Universal detection of messages via finite-state channels. IEEE Trans. Inform. Theory 2000, 46, 2242–2246.
  27. Merhav, N. Guessing individual sequences: Generating randomized guesses using finite-state machines. IEEE Trans. Inform. Theory 2020, 66, 2912–2920.
Figure 1. Wiretap channel model. Since the source and the channel may operate at different rates ( λ channel symbols per source symbol), the time variables associated with source-related sequences and channel-related sequences are denoted differently, i.e., i and t, respectively.
Figure 2. Wiretap channel model with side information.