Article

Distributed Hypothesis Testing with Privacy Constraints

by Atefeh Gilani 1, Selma Belhadj Amor 2, Sadaf Salehkalaibar 1,* and Vincent Y. F. Tan 2

1 Department of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran 14171614418, Iran
2 Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore
* Author to whom correspondence should be addressed.
Entropy 2019, 21(5), 478; https://doi.org/10.3390/e21050478
Submission received: 6 February 2019 / Revised: 8 March 2019 / Accepted: 20 March 2019 / Published: 7 May 2019

Abstract: We revisit the distributed hypothesis testing (or hypothesis testing with communication constraints) problem from the viewpoint of privacy. Instead of observing the raw data directly, the transmitter observes a sanitized or randomized version of it. We impose an upper bound on the mutual information between the raw and randomized data. Under this scenario, the receiver, which is also provided with side information, is required to make a decision on whether the null or alternative hypothesis is in effect. We first provide a general lower bound on the type-II exponent for an arbitrary pair of hypotheses. Next, we show that if the distribution under the alternative hypothesis is the product of the marginals of the distribution under the null (i.e., testing against independence), then the exponent is known exactly. Moreover, we show that the strong converse property holds. Using ideas from Euclidean information theory, we also provide an approximate expression for the exponent when the communication rate is low and the privacy level is high. Finally, we illustrate our results with a binary and a Gaussian example.

1. Introduction

In the distributed hypothesis testing (or hypothesis testing with communication constraints) problem, some observations from the environment are collected by the sensors in a network. The sensors describe these observations over the network, and the descriptions are finally received by the decision center. The goal is to guess the joint distribution governing the observations at the terminals. In particular, there are two possible hypotheses, $H=0$ or $H=1$, and the joint distribution of the observations is specified under each of them. The performance of this system is characterized by two criteria: the type-I and the type-II error probabilities. The probability of deciding on $H=1$ (respectively, $H=0$) when the true hypothesis is $H=0$ (respectively, $H=1$) is referred to as the type-I (respectively, type-II) error probability. There are several approaches for defining the performance of a hypothesis test. First, we can maximize the exponent (exponential rate of decay) of the Bayesian error probability. Second, we can require that the type-I error probability decays exponentially fast at a prescribed rate and then maximize the exponent of the type-II error probability; this is known as the Hoeffding regime. The approach in this work is the Chernoff-Stein regime, in which we upper bound the type-I error probability by a non-vanishing constant and maximize the exponent of the type-II error probability.
A special case of interest is testing against independence, where the joint distribution under $H=1$ is the product of the marginals under $H=0$. The optimal exponent of the type-II error probability for testing against independence was determined by Ahlswede and Csiszár in [1]. Several extensions of this basic problem have been studied: a multi-observer setup [2,3,4,5,6], a multi-decision-center setup [7,8], and a setup with security constraints [9]. The main idea of the achievable schemes in these works is typicality testing [10,11]. The sensor finds a codeword jointly typical with its observation and sends the corresponding bin index to the decision center. The final decision is declared based on a typicality check between the received codeword and the observation at the center. We note that the coding scheme employed here is reminiscent of those used for source coding with side information [12] and for different variants of the information bottleneck problem [13,14,15,16].

1.1. Injecting Privacy Considerations into Our System

We revisit the distributed hypothesis testing problem from a privacy perspective. In many applications, such as healthcare systems, there is a need to randomize the data before publishing it. For example, hospitals often have large amounts of medical records of their patients. These records are useful for performing various statistical inference tasks, such as learning about the causes of a certain ailment. However, due to the privacy of the patients, the data cannot be published as is. The data needs to be sanitized, quantized, or perturbed, and then fed to a management center before statistical inference, such as hypothesis testing, is performed.
In the proposed setup, we use a privacy mechanism to sanitize the observation at the terminal before it is compressed; see Figure 1. The compression is performed at a separate terminal, called the transmitter, which communicates the randomized data over a noiseless link of rate $R$ to a receiver. The hypothesis test is performed using the received data (the compression index and additional side information) to determine the correct hypothesis governing the original observations. The privacy criterion is defined by the mutual information [17,18,19,20] between the published and original data.
There is a long history of research on appropriate metrics to measure privacy. To quantify the information leakage an observation $\hat X$ can induce on a latent variable $X$, Shannon's mutual information $I(X;\hat X)$ is considered in [17,18,19,20]. Smith [18] proposed to use Arimoto's mutual information of order $\infty$, $I_\infty(X;\hat X)$. Barthe and Köpf [21,22,23] proposed the maximal information leakage $\max_{P_X} I_\infty(X;\hat X)$. We refer the reader to [24] for a survey of the existing information leakage measures. A different line of works, in statistics, computer science, and other related fields, concerns differential privacy, initially proposed in [25]. Furthermore, a generalized notion, $(\epsilon,\delta)$-differential privacy [26], provides a unified mathematical framework for data privacy. The reader is referred to the survey by Dwork [27], the statistical framework studied by Wasserman and Zhou [28], and the references therein.
The privacy mechanism can be either memoryless or non-memoryless. In the former, the distribution of the randomized data at each time instant depends only on the original data at the same instant and not on the previous history of the data.

1.2. Description of Our System Model

We propose a coding scheme for this setup. The idea is that the sensor, upon observing the source sequence, performs a typicality test and obtains its belief about the hypothesis. If its belief is $H=0$, it publishes the randomized data based on a specific memoryless mechanism. However, if its belief is $H=1$, it sends an all-zero sequence to let the transmitter know about its decision. The transmitter communicates the received data, which is either a sanitized version of the original data or an all-zero sequence, over the noiseless link to the receiver. In this scheme, the overall privacy mechanism is non-memoryless, since the typicality check of the source sequence, which uses the entire history of the observation, determines the published data. It is shown that the achievable error exponent recovers previous results on hypothesis testing with zero and positive communication rates in [10]. Our work is related to a recent work [29], where a general hypothesis testing setup is considered from a privacy perspective. However, the problem in [29] differs from ours: its authors consider equivocation and average distortion as possible measures of privacy, whereas we constrain the mutual information between the original and released (published) data.
A difference between the proposed scheme and some previous works is highlighted as follows. The privacy mechanism, even if it is memoryless, cannot be viewed as a noiseless link of rate equal to the privacy criterion. In particular, the proposed model differs from the cascade hypothesis testing problem of [8] and similar works [3,4], which consider consecutive noiseless links for data compression and distributed hypothesis testing. The difference comes from the fact that, in these works, a codeword is chosen jointly typical with the observed sequence at the terminal and its corresponding index is sent over the noiseless link. In our model, however, the randomized sequence is not necessarily jointly typical with the original sequence. Thus, there is a need for an achievable scheme which lets the transmitter know whether the original data is typical or not.
The problem of hypothesis testing against independence with a memoryless privacy mechanism is also considered. A coding scheme is proposed in which the sensor outputs the randomized data based on the memoryless privacy mechanism. The optimality of the achievable type-II error exponent is shown by providing a strong converse. Specializing the optimal error exponent to a binary example shows that an increase in the privacy criterion (a less stringent privacy mechanism) results in a larger type-II error exponent. Thus, there exists a trade-off between the privacy and hypothesis testing criteria. The optimal type-II error exponent is further studied in the regime of a highly restrictive privacy mechanism and low communication rate. The Euclidean approach of [30,31,32,33] is used to approximate the error exponent in this regime. The result confirms the trade-off between the privacy criterion and the type-II error exponent. Finally, a Gaussian setup is proposed and its optimal error exponent is established.

1.3. Main Contributions

The contributions of the paper are listed in the following:
  • An achievable type-II error exponent is proposed using a non-memoryless privacy mechanism (Theorem 1 in Section 3);
  • The optimal error exponent of testing against independence with a memoryless privacy mechanism is determined. In addition, a strong converse is also proved (Theorem 2 in Section 4.1);
  • A binary example is proposed to show the trade-off between the privacy and error exponent (Section 4.3);
  • A Euclidean approximation [30] of the error exponent is provided (Section 4.4);
  • A Gaussian setup is proposed and its optimal error exponent is derived (Proposition 2 in Section 4.5).

1.4. Notation

The notation mostly follows [34]. Random variables are denoted by capital letters, e.g., $X$, $Y$, and their realizations by lower case letters, e.g., $x$, $y$. The alphabet of the random variable $X$ is denoted as $\mathcal X$. Sequences of random variables and their realizations are denoted by $(X_i,\ldots,X_j)$ and $(x_i,\ldots,x_j)$ and are abbreviated as $X_i^j$ and $x_i^j$. We use the alternative notation $X^j$ when $i=1$. Vectors and matrices are denoted by boldface letters, e.g., $\mathbf k$, $\mathbf W$. The $\ell_2$-norm of $\mathbf k$ is denoted as $\|\mathbf k\|$. The notation $\mathbf k^T$ denotes the transpose of $\mathbf k$.
The probability mass function (pmf) of a discrete random variable $X$ is denoted as $P_X$; the conditional pmf of $X$ given $Y$ is denoted as $P_{X|Y}$. The notation $D(P_X\|Q_X)$ denotes the Kullback-Leibler (KL) divergence between two pmfs $P_X$ and $Q_X$. The total variation distance between two pmfs $P_X$ and $Q_X$ is denoted by $|P_X-Q_X|=\frac12\sum_x|P_X(x)-Q_X(x)|$. We use $\mathrm{tp}(x^n,y^n)$ to denote the joint type of $(x^n,y^n)$.
For a given $P_{XY}$ and a positive number $\mu$, we denote by $\mathcal T^n_\mu(P_{XY})$ the set of jointly $\mu$-typical sequences [34], i.e., the set of all $(x^n,y^n)$ whose joint type is within $\mu$ of $P_{XY}$ in total-variation distance. The notation $\mathcal T^n(P_X)$ denotes the type class of the type $P_X$.
The notation $h_b(\cdot)$ denotes the binary entropy function, $h_b^{-1}(\cdot)$ its inverse on $\big[0,\tfrac12\big]$, and $a*b\triangleq a(1-b)+(1-a)b$ for $0\le a,b\le1$. The differential entropy of a continuous random variable $X$ is $h(X)$. All logarithms $\log(\cdot)$ are taken with respect to base 2.

1.5. Organization

The remainder of the paper is organized as follows. Section 2 describes a mathematical setup for our proposed problem. Section 3 discusses hypothesis testing with general distributions. The results for hypothesis testing against independence with a memoryless privacy mechanism are provided in Section 4. The paper is concluded in Section 5.

2. System Model

Let $\mathcal X$, $\mathcal Y$, and $\hat{\mathcal X}$ be arbitrary finite alphabets and let $n$ be a positive integer. Consider the hypothesis testing problem with communication and privacy constraints depicted in Figure 1. The first terminal in the system, the Randomizer, receives the sequence $X^n=(X_1,\ldots,X_n)\in\mathcal X^n$ and outputs the sequence $\hat X^n=(\hat X_1,\ldots,\hat X_n)\in\hat{\mathcal X}^n$, which is a noisy version of $X^n$ under a privacy mechanism determined by the conditional probability distribution $P_{\hat X^n|X^n}$; the second terminal, the Transmitter, receives the sequence $\hat X^n$; the third terminal, the Receiver, observes the side-information sequence $Y^n=(Y_1,\ldots,Y_n)\in\mathcal Y^n$. Under the null hypothesis,

$$H=0:\quad(X^n,Y^n)\ \text{i.i.d.}\sim P_{XY},\qquad(1)$$

whereas under the alternative hypothesis,

$$H=1:\quad(X^n,Y^n)\ \text{i.i.d.}\sim Q_{XY},\qquad(2)$$

for two given pmfs $P_{XY}$ and $Q_{XY}$.
The privacy mechanism is described by the conditional pmf $P_{\hat X^n|X^n}$, which maps each sequence $X^n\in\mathcal X^n$ to a sequence $\hat X^n\in\hat{\mathcal X}^n$. For any $(\hat x^n,x^n,y^n)\in\hat{\mathcal X}^n\times\mathcal X^n\times\mathcal Y^n$, the joint distributions induced by the privacy mechanism are given by

$$P^n_{\hat XXY}(\hat x^n,x^n,y^n)\triangleq P_{\hat X^n|X^n}(\hat x^n|x^n)\cdot\prod_{i=1}^n P_{XY}(x_i,y_i),\qquad(3)$$

$$Q^n_{\hat XXY}(\hat x^n,x^n,y^n)\triangleq P_{\hat X^n|X^n}(\hat x^n|x^n)\cdot\prod_{i=1}^n Q_{XY}(x_i,y_i).\qquad(4)$$

A memoryless (local) privacy mechanism is defined by a conditional pmf $P_{\hat X|X}$ which stochastically and independently maps each entry $X_i\in\mathcal X$ of $X^n$ to a released $\hat X_i\in\hat{\mathcal X}$ to construct $\hat X^n$. Consequently, for the memoryless privacy mechanism, the conditional pmf $P_{\hat X^n|X^n}(\hat x^n|x^n)$ factorizes as follows:

$$P_{\hat X^n|X^n}(\hat x^n|x^n)=\prod_{i=1}^n P_{\hat X|X}(\hat x_i|x_i)=P^n_{\hat X|X}(\hat x^n|x^n),\qquad\forall(\hat x^n,x^n)\in\hat{\mathcal X}^n\times\mathcal X^n.\qquad(5)$$
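Since a memoryless mechanism factorizes symbol by symbol, its normalized leakage $\frac1n I(X^n;\hat X^n)$ equals the single-letter quantity $I(X;\hat X)$. The following Python sketch (ours, not from the paper) illustrates this with a binary symmetric mechanism; the function name `sanitize` and the crossover probability `p1` are illustrative assumptions.

```python
# A minimal sketch of a memoryless privacy mechanism: a BSC(p1) applied
# independently to each source symbol of X^n.
import numpy as np

def binary_entropy(p):
    """Binary entropy h_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def sanitize(x, p1, rng):
    """Apply P_{X^|X} = BSC(p1) symbol by symbol (memoryless mechanism)."""
    flips = rng.random(x.shape) < p1
    return np.bitwise_xor(x, flips.astype(int))

rng = np.random.default_rng(0)
n, p1 = 100_000, 0.2
x = rng.integers(0, 2, size=n)      # X^n i.i.d. Bern(1/2)
x_hat = sanitize(x, p1, rng)        # released sequence

# For X ~ Bern(1/2) through a BSC(p1), I(X; X^) = 1 - h_b(p1) bits per symbol,
# which equals the normalized leakage (1/n) I(X^n; X^^n) of the mechanism.
print("single-letter leakage:", 1 - binary_entropy(p1), "bits/symbol")
```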
There is a noise-free bit pipe of rate $R$ from the transmitter to the receiver. Upon observing $\hat X^n$, the transmitter computes the message $M=\phi^{(n)}(\hat X^n)$ using a possibly stochastic encoding function $\phi^{(n)}:\hat{\mathcal X}^n\to\{0,\ldots,2^{nR}\}$ and sends it over the bit pipe to the receiver.
The goal of the receiver is to produce a guess of $H$ using a decoding function $g^{(n)}:\mathcal Y^n\times\{0,\ldots,2^{nR}\}\to\{0,1\}$ based on the observation $Y^n$ and the received message $M$. Thus, the estimate of the hypothesis is $\hat H=g^{(n)}(Y^n,M)$.
This induces a partition of the sample space $\hat{\mathcal X}^n\times\mathcal X^n\times\mathcal Y^n$ into an acceptance region $\mathcal A_n$ defined as follows:

$$\mathcal A_n\triangleq\big\{(\hat x^n,x^n,y^n):g^{(n)}\big(y^n,\phi^{(n)}(\hat x^n)\big)=0\big\},\qquad(6)$$

and a rejection region denoted by $\mathcal A_n^c$.
Definition 1.
For any $\epsilon\in[0,1)$ and for a given rate-privacy pair $(R,L)\in\mathbb R_+^2$, we say that a type-II exponent $\theta\in\mathbb R_+$ is $(\epsilon,R,L)$-achievable if there exists a sequence of functions and conditional pmfs $(\phi^{(n)},g^{(n)},P_{\hat X^n|X^n})$ such that the corresponding sequences of type-I and type-II error probabilities at the receiver, defined as

$$\alpha_n\triangleq P^n_{\hat XXY}(\mathcal A_n^c)\qquad\text{and}\qquad\beta_n\triangleq Q^n_{\hat XXY}(\mathcal A_n),\qquad(7)$$

respectively, satisfy

$$\limsup_{n\to\infty}\alpha_n\le\epsilon\qquad\text{and}\qquad\liminf_{n\to\infty}\frac1n\log\frac1{\beta_n}\ge\theta.\qquad(8)$$

Furthermore, the privacy measure

$$T_n\triangleq\frac1n I(X^n;\hat X^n)\qquad(9)$$

satisfies

$$\limsup_{n\to\infty}T_n\le L.\qquad(10)$$

The optimal exponent $\theta^*_\epsilon(R,L)$ is the supremum of all $(\epsilon,R,L)$-achievable $\theta\in\mathbb R_+$.

3. General Hypothesis Testing

3.1. Achievable Error Exponent

The following presents an achievable error exponent for the proposed setup.
Theorem 1.
For a given $\epsilon\in[0,1)$ and a rate-privacy pair $(R,L)\in\mathbb R_+^2$, the optimal type-II error exponent $\theta^*_\epsilon(R,L)$ for the multiterminal hypothesis testing setup under the privacy constraint $L$ and the rate constraint $R$ satisfies

$$\theta^*_\epsilon(R,L)\ \ge\ \max_{\substack{P_{U|\hat X},\,P_{\hat X|X}:\\ R\ge I(U;\hat X)\\ L\ge I(X;\hat X)}}\ \min_{\tilde P_{U\hat XXY}\in\mathcal P_{U\hat XXY}}D\big(\tilde P_{U\hat XXY}\,\big\|\,P_{U|\hat X}P_{\hat X|X}Q_{XY}\big),\qquad(11)$$

where the set $\mathcal P_{U\hat XXY}$ is defined as

$$\mathcal P_{U\hat XXY}\triangleq\big\{\tilde P_{U\hat XXY}:\ \tilde P_X=P_X,\ \tilde P_{UY}=P_{UY},\ \tilde P_{U\hat X}=P_{U\hat X}\big\}.\qquad(12)$$

Given $P_{U|\hat X}$ and $P_{\hat X|X}$, the mutual informations in (11) are calculated according to the following joint distribution:

$$P_{U\hat X|X}\triangleq P_{U|\hat X}\cdot P_{\hat X|X}.\qquad(13)$$
Proof. 
The coding scheme is given in the following section. For the analysis, see Appendix A. □

3.2. Coding Scheme

In this section, we propose a coding scheme for Theorem 1 under fixed rate and privacy constraints $(R,L)\in\mathbb R_+^2$. Fix the joint distribution $P_{U\hat XXY}$ as in (13). Let $P_U(u)$ be the marginal distribution of $U\in\mathcal U$, defined as

$$P_U(u)\triangleq\sum_{\hat x\in\hat{\mathcal X}}P_{U|\hat X}(u|\hat x)\sum_{x\in\mathcal X}P_{\hat XX}(\hat x,x).\qquad(14)$$

Fix $\mu>0$ and $\zeta>0$, an arbitrary blocklength $n$, and two conditional pmfs $P_{\hat X|X}$ and $P_{U|\hat X}$ over the finite auxiliary alphabets $\hat{\mathcal X}$ and $\mathcal U$. Fix also the rate and the privacy leakage level as

$$R=I(U;\hat X)+\mu,\qquad\text{and}\qquad L=I(\hat X;X)+\zeta.\qquad(15)$$

Codebook Generation: Randomly and independently generate a codebook

$$\mathcal C_U\triangleq\big\{U^n(m):m\in\{0,\ldots,2^{nR}\}\big\},\qquad(16)$$

by drawing $U^n(m)$ in an i.i.d. manner according to $P_U$. The codebook is revealed to all terminals.
Randomizer: Upon observing $x^n$, it checks whether $x^n\in\mathcal T^n_{\mu/4}(P_X)$. If successful, it outputs the sequence $\hat x^n$ whose $i$-th component $\hat x_i$ is generated based on $x_i$, according to $P_{\hat X|X}(\hat x_i|x_i)$. If the typicality check is not successful, the randomizer outputs the all-zero sequence of length $n$, i.e., $\hat x^n=0^n$.
Transmitter: Upon observing $\hat x^n$, if $\hat x^n\ne0^n$, the transmitter finds an index $m$ such that $\big(u^n(m),\hat x^n\big)\in\mathcal T^n_{\mu/2}(P_{U\hat X})$. If successful, it sends the index $m$ over the noiseless link to the receiver. Otherwise, if the typicality check is not successful or $\hat x^n=0^n$, it sends $m=0$.
Receiver: Upon observing $y^n$ and receiving the index $m$: if $m=0$, the receiver declares $\hat H=1$. If $m\ne0$, it checks whether $\big(u^n(m),y^n\big)\in\mathcal T^n_\mu(P_{UY})$. If the test is successful, the receiver declares $\hat H=0$; otherwise, it sets $\hat H=1$.
Remark 1.
In the above scheme, the sequence $\hat X^n$ is chosen to be an $n$-length all-zero sequence when the randomizer finds that $X^n$ is not typical according to $P_X$. Thus, the privacy mechanism is not memoryless, and the sequence $\hat X^n$ is not independently and identically distributed (i.i.d.). A detailed analysis in Appendix A shows that the privacy criterion does not exceed $L$ as the blocklength $n\to\infty$.
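As a complement to Remark 1, the following Python sketch (ours; the helper names and the binary alphabet are assumptions) shows how the Randomizer's typicality gate makes the overall mechanism non-memoryless: the released sequence depends on the whole history $x^n$ through the type check, even though the per-symbol randomization itself is memoryless.

```python
# A minimal sketch of the Randomizer of Section 3.2 for a binary source:
# sanitize symbol by symbol only if x^n is mu/4-typical for P_X, and
# otherwise output the all-zero sequence 0^n.
import numpy as np

def is_typical(x, p_x, mu):
    """Check that the type of x^n is within mu of P_X in total variation."""
    freq_one = x.mean()
    emp = np.array([1.0 - freq_one, freq_one])
    return 0.5 * np.abs(emp - p_x).sum() <= mu

def randomizer(x, p_x, p_flip, mu, rng):
    if not is_typical(x, p_x, mu / 4):
        return np.zeros_like(x)               # all-zero sequence signals "not typical"
    flips = rng.random(x.shape) < p_flip      # memoryless BSC(p_flip) randomization
    return np.bitwise_xor(x, flips.astype(int))

rng = np.random.default_rng(1)
p_x = np.array([0.5, 0.5])
x = rng.integers(0, 2, size=1000)
print(randomizer(x, p_x, p_flip=0.2, mu=0.1, rng=rng)[:10])
```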

3.3. Discussion

In the following, we discuss some special cases. First, suppose that $R=0$. The following corollary shows that Theorem 1 recovers Han's result [10] for distributed hypothesis testing with zero-rate communication.
Corollary 1
(Theorem 5 in [10]). Suppose that $Q_{XY}>0$. For all $\epsilon\in[0,1)$, the optimal error exponent of the zero-rate communication for any privacy mechanism (including non-memoryless mechanisms) is given by the following:

$$\theta^*_\epsilon(0,L)=\min_{\substack{\tilde P_{XY}:\ \tilde P_X=P_X\\ \tilde P_Y=P_Y}}D\big(\tilde P_{XY}\,\big\|\,Q_{XY}\big).\qquad(17)$$
Proof. 
The proof of achievability follows by Theorem 1, in which $\hat X$ is arbitrary and the auxiliary $U=\emptyset$ (a constant) due to the zero-rate constraint. The proof of the strong converse follows along the same lines as [35]. □
Remark 2.
Consider the case of $R>0$ and $L=0$, where $\hat X$ is independent of $X$. Using Theorem 1, the optimal error exponent is lower bounded as follows:

$$\theta^*_\epsilon(R,0)\ \ge\ \min_{\substack{\tilde P_{XY}:\ \tilde P_X=P_X\\ \tilde P_Y=P_Y}}D\big(\tilde P_{XY}\,\big\|\,Q_{XY}\big).\qquad(18)$$
However, there is no known converse result in this case where the communication rate is positive. Comparing this special case with the one in Corollary 1 shows that the proposed model does not, in general, admit symmetry between the rate and privacy constraints. Nevertheless, we will see from some specific examples in the following that the roles of $R$ and $L$ are symmetric.
Now, suppose that $L$ is so large that $L>H(X)$. The following corollary shows that Theorem 1 recovers Han's result in [10] for distributed hypothesis testing over a rate-$R$ communication link.
Corollary 2
(Theorem 2 in [10]). Assuming $L>H(X)$, the optimal error exponent is lower bounded as the following:

$$\theta^*_\epsilon(R,L)\ \ge\ \max_{\substack{P_{U|X}:\\ R\ge I(U;X)}}\ \min_{\substack{\tilde P_{UXY}:\ \tilde P_{UX}=P_{UX}\\ \tilde P_{UY}=P_{UY}}}D\big(\tilde P_{UXY}\,\big\|\,P_{U|X}Q_{XY}\big).\qquad(19)$$
Proof. 
The proof follows from Theorem 1 by specializing to $\hat X=X$. □
The above two special cases reveal a trade-off between the privacy criterion and the achievable error exponent when the communication rate is positive, i.e., $R>0$. An increase in $L$ results in a larger achievable error exponent. This observation is further illustrated by an example in Section 4.3 to follow.

4. Hypothesis Testing against Independence with a Memoryless Privacy Mechanism

In this section, we consider testing against independence, where the joint pmf under $H=1$ factorizes as follows:

$$Q_{XY}=P_X\cdot P_Y.\qquad(20)$$
The privacy mechanism is assumed to be memoryless here.

4.1. Optimal Error Exponent

The following theorem, which includes a strong converse, states the optimal error exponent for this special case.
Theorem 2.
For any $(R,L)\in\mathbb R_+^2$, define

$$\theta^*_\epsilon(R,L)=\max_{\substack{P_{U|\hat X},\,P_{\hat X|X}:\\ R\ge I(U;\hat X)\\ L\ge I(X;\hat X)}}I(U;Y).\qquad(21)$$

Then, for any $\epsilon\in[0,1)$ and any $(R,L)\in\mathbb R_+^2$, the optimal error exponent for testing against independence when using a memoryless privacy mechanism is given by (21), where it suffices to choose $|\mathcal U|\le|\hat{\mathcal X}|+1$ and $|\hat{\mathcal X}|\le|\mathcal X|$ according to Carathéodory's theorem [36] (Theorem 15.3.5).
Proof. 
The coding scheme is given in the following section. For the rest of the proof, see Appendix B. □
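To make the single-letter optimization in (21) concrete, the following sketch (ours) evaluates it numerically for the binary example studied later in Section 4.3, restricting the search to BSC mechanisms $P_{\hat X|X}$ and BSC test channels $P_{U|\hat X}$, which Section 4.3 argues is without loss of optimality; the grid resolution is an assumption.

```python
# Brute-force evaluation of Theorem 2 for the binary example:
#   maximize I(U;Y) = 1 - h_b(q * p1 * p2)
#   subject to I(U;X^) = 1 - h_b(p2) <= R and I(X;X^) = 1 - h_b(p1) <= L.
import numpy as np

def h_b(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1 - b) + (1 - a) * b

def exponent(R, L, q, grid=20001):
    p = np.linspace(0.0, 0.5, grid)
    p1 = p[1 - h_b(p) <= L]          # feasible privacy mechanisms BSC(p1)
    p2 = p[1 - h_b(p) <= R]          # feasible test channels BSC(p2)
    # I(U;Y) is maximized by the smallest feasible crossover probabilities,
    # i.e., p1 = h_b^{-1}(1-L) and p2 = h_b^{-1}(1-R), recovering (24).
    return 1 - h_b(conv(q, conv(p1.min(), p2.min())))

print(exponent(R=0.5, L=0.5, q=0.1))
```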

4.2. Coding Scheme

In this section, we propose a coding scheme for Theorem 2. Fix the joint distribution as in (13), and the rate and privacy constraints as in (15). Generate the codebook $\mathcal C_U$ as in (16).
Randomizer: Upon observing $x^n$, it outputs the sequence $\hat x^n$ in which the $i$-th component $\hat x_i$ is generated based on $x_i$, according to $P_{\hat X|X}(\hat x_i|x_i)$.
Transmitter: It finds an index $m$ such that $\big(u^n(m),\hat x^n\big)\in\mathcal T^n_{\mu/2}(P_{U\hat X})$. If successful, it sends the index $m$ over the noiseless link to the receiver. Otherwise, it sends $m=0$.
Receiver: Upon observing $y^n$ and receiving the index $m$: if $m=0$, the receiver declares $\hat H=1$. If $m\ne0$, it checks whether $\big(u^n(m),y^n\big)\in\mathcal T^n_\mu(P_{UY})$. If the test is successful, the receiver declares $\hat H=0$; otherwise, it sets $\hat H=1$.
Remark 3.
In the above scheme, the sequence $\hat X^n$ is i.i.d. since it is generated based on the memoryless mechanism $P_{\hat X|X}$.
When the communication rate is positive, there exists a trade-off between the optimal error exponent and the privacy criterion. The following example elucidates this trade-off.

4.3. Binary Example

In this section, we study hypothesis testing against independence for a binary example. Suppose that under both hypotheses we have $X\sim\text{Bern}(\tfrac12)$. Under the null hypothesis,

$$H=0:\quad Y=X\oplus N,\qquad N\sim\text{Bern}(q),\qquad(22)$$

for some $0\le q\le1$, where $N$ is independent of $X$. Under the alternative hypothesis,

$$H=1:\quad Y\sim\text{Bern}\big(\tfrac12\big),\qquad(23)$$

where $Y$ is independent of $X$. The cardinality constraint shows that it suffices to choose $|\hat{\mathcal X}|=2$. Among all possible privacy mechanisms, the symmetric choice $P_{\hat X|X}(1|0)=P_{\hat X|X}(0|1)$ and $P_{\hat X|X}(0|0)=P_{\hat X|X}(1|1)$, i.e., a binary symmetric channel (BSC), minimizes the mutual information $I(X;\hat X)$. Thus, we restrict to this choice, which also results in $\hat X\sim\text{Bern}\big(\tfrac12\big)$.
The cardinality bound on the auxiliary random variable $U$ is $|\mathcal U|\le3$. The following proposition states that it is also optimal to choose $P_{U|\hat X}$ to be a BSC.
Proposition 1.
The optimal error exponent of the proposed binary setup is given by the following:

$$\theta^*_\epsilon(R,L)=1-h_b\Big(q*h_b^{-1}(1-L)*h_b^{-1}(1-R)\Big).\qquad(24)$$

Proof. 
For the proof of achievability, choose the following auxiliary random variables:

$$\hat X=X\oplus\hat Z,\qquad \hat Z\sim\text{Bern}(p_1),\qquad(25)$$

$$U=\hat X\oplus Z,\qquad Z\sim\text{Bern}(p_2),\qquad(26)$$

for some $0\le p_1,p_2\le1$, where $\hat Z$ and $Z$ are independent of $X$ and $(X,\hat X)$, respectively. The optimal error exponent of Theorem 2 reduces to the following:

$$\theta^*_\epsilon(R,L)=\max_{\substack{0\le p_1,p_2\le1:\\ R\ge1-h_b(p_2)\\ L\ge1-h_b(p_1)}}1-h_b(q*p_1*p_2),\qquad(27)$$

which can be simplified to (24). For the proof of the converse, see Appendix C. □
Figure 2 illustrates the error exponent versus the privacy parameter L for a fixed rate R. There is clearly a trade-off between θ ϵ * ( R , L ) and L. For a less stringent privacy requirement (large L), the error exponent θ ϵ * ( R , L ) increases.
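A short numerical sketch (ours) of the trade-off shown in Figure 2: the closed-form exponent (24) evaluated over a range of privacy levels $L$ at a fixed rate $R$. The inverse $h_b^{-1}$ is computed by bisection; the parameter values are illustrative.

```python
# Proposition 1 exponent: theta(R,L) = 1 - h_b(q * h_b^{-1}(1-L) * h_b^{-1}(1-R)).
import numpy as np

def h_b(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def h_b_inv(v, tol=1e-10):
    """Inverse of h_b on [0, 1/2] via bisection (h_b is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h_b(mid) < v else (lo, mid)
    return (lo + hi) / 2

conv = lambda a, b: a * (1 - b) + (1 - a) * b
q, R = 0.1, 0.6
for L in (0.2, 0.4, 0.6, 0.8, 1.0):
    theta = 1 - h_b(conv(q, conv(h_b_inv(1 - L), h_b_inv(1 - R))))
    print(f"L = {L:.1f}: exponent = {theta:.4f}")  # increases with L
```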

4.4. Euclidean Approximation

In this section, we propose Euclidean approximations [30,31] for the optimal error exponent of the testing against independence scenario (Theorem 2) when $R\approx0$ and $L\approx0$. Consider the optimal error exponent as follows:

$$\theta^*_\epsilon(R,L)=\max_{\substack{P_{U|\hat X},\,P_{\hat X|X}:\\ R\ge I(U;\hat X)\\ L\ge I(X;\hat X)}}I(U;Y).\qquad(28)$$

Let $\mathbf W$, of dimension $|\mathcal Y|\times|\mathcal X|$, denote the transition matrix $P_{Y|X}$, which is itself induced by $P_X$ and the joint distribution $P_{XY}$. Now, consider the rate constraint as follows:

$$I(U;\hat X)=\sum_{u\in\mathcal U}P_U(u)\,D\big(P_{\hat X|U}(\cdot|u)\,\big\|\,P_{\hat X}\big)\le R.\qquad(29)$$

Assuming $R\approx0$, we let $P_{\hat X|U}(\cdot|u)$ be a local perturbation from $P_{\hat X}(\cdot)$, where we have

$$P_{\hat X|U}(\cdot|u)=P_{\hat X}(\cdot)+\psi_u(\cdot),\qquad(30)$$

for a perturbation $\psi_u(\cdot)$ satisfying

$$\sum_{\hat x\in\hat{\mathcal X}}\psi_u(\hat x)=0,\qquad(31)$$

in order to preserve the row stochasticity of $P_{\hat X|U}$. Using a $\chi^2$-approximation [30], we can write:

$$D\big(P_{\hat X|U}(\cdot|u)\,\big\|\,P_{\hat X}\big)\approx\frac12\cdot\log e\cdot\|v_u\|^2,\qquad(32)$$

where $v_u$ denotes the length-$|\hat{\mathcal X}|$ column vector of weighted perturbations whose $\hat x$-th component is defined as:

$$v_u(\hat x)\triangleq\frac{1}{\sqrt{P_{\hat X}(\hat x)}}\cdot\psi_u(\hat x),\qquad\hat x\in\hat{\mathcal X}.\qquad(33)$$

Using the above definition, the rate constraint in (29) can be written as:

$$\sum_{u\in\mathcal U}P_U(u)\,\|v_u\|^2\le\frac{2R}{\log e}.\qquad(34)$$
Similarly, consider the privacy constraint as the following:

$$I(X;\hat X)=\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X}(\hat x)\,D\big(P_{X|\hat X}(\cdot|\hat x)\,\big\|\,P_X\big)\le L.\qquad(35)$$

Assuming $L\approx0$, we let $P_{X|\hat X}(\cdot|\hat x)$ be a local perturbation from $P_X(\cdot)$, where

$$P_{X|\hat X}(\cdot|\hat x)=P_X(\cdot)+\phi_{\hat x}(\cdot),\qquad(36)$$

for a perturbation $\phi_{\hat x}(\cdot)$ that satisfies:

$$\sum_{x\in\mathcal X}\phi_{\hat x}(x)=0.\qquad(37)$$

Again, using a $\chi^2$-approximation, we obtain the following:

$$D\big(P_{X|\hat X}(\cdot|\hat x)\,\big\|\,P_X\big)\approx\frac12\,\log e\,\|v_{\hat x}\|^2,\qquad(38)$$

where $v_{\hat x}$ is a length-$|\mathcal X|$ column vector whose $x$-th component is defined as follows:

$$v_{\hat x}(x)\triangleq\frac{1}{\sqrt{P_X(x)}}\cdot\phi_{\hat x}(x),\qquad x\in\mathcal X.\qquad(39)$$

Thus, the privacy constraint in (35) can be written as:

$$\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X}(\hat x)\,\|v_{\hat x}\|^2\le\frac{2L}{\log e}.\qquad(40)$$

For any $x\in\mathcal X$ and $u\in\mathcal U$, we define the following:

$$\Lambda_u(x)\triangleq\sum_{\hat x\in\hat{\mathcal X}}\psi_u(\hat x)\,\phi_{\hat x}(x)\qquad(41)$$

$$=\sqrt{P_X(x)}\sum_{\hat x\in\hat{\mathcal X}}\sqrt{P_{\hat X}(\hat x)}\,v_u(\hat x)\,v_{\hat x}(x),\qquad(42)$$
and the corresponding length-$|\mathcal X|$ column vector $\Lambda_u$ defined as follows:

$$\Lambda_u=\sqrt{\mathbf P_X}\,\mathbf V_{\hat X}\,\sqrt{\mathbf P_{\hat X}}\,v_u,\qquad(43)$$

where $\sqrt{\mathbf P_X}$ denotes a diagonal $|\mathcal X|\times|\mathcal X|$ matrix whose $(x,x)$-th element ($x\in\mathcal X$) is $\sqrt{P_X(x)}$, and $\sqrt{\mathbf P_{\hat X}}$ is defined similarly. Moreover, $\mathbf V_{\hat X}$ refers to the $|\mathcal X|\times|\hat{\mathcal X}|$ matrix defined as follows:

$$\mathbf V_{\hat X}\triangleq\big[\,v_1\ \ v_2\ \ \cdots\ \ v_{\hat x}\ \ \cdots\ \ v_{|\hat{\mathcal X}|}\,\big].\qquad(44)$$

Let $\sqrt{\mathbf P_Y}^{-1}$ be the inverse of the diagonal $|\mathcal Y|\times|\mathcal Y|$ matrix $\sqrt{\mathbf P_Y}$. As shown in Appendix D, the optimization problem in (28) can be written as follows:

$$\max_{\{v_u\}_{u\in\mathcal U},\,\mathbf V_{\hat X}}\ \frac12\,\log e\sum_{u\in\mathcal U}P_U(u)\cdot\Big\|\sqrt{\mathbf P_Y}^{-1}\,\mathbf W\,\sqrt{\mathbf P_X}\,\mathbf V_{\hat X}\,\sqrt{\mathbf P_{\hat X}}\,v_u\Big\|^2\qquad(45)$$

$$\text{subject to:}\quad\sum_{u\in\mathcal U}P_U(u)\,\|v_u\|^2\le\frac{2R}{\log e},\qquad(46)$$

$$\phantom{\text{subject to:}}\quad\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X}(\hat x)\,\|v_{\hat x}\|^2\le\frac{2L}{\log e},\qquad(47)$$

where the perturbations must additionally keep $P_{\hat X|U}$ and $P_{X|\hat X}$ valid pmfs, i.e., $-\sqrt{P_{\hat X}(\hat x)}\le v_u(\hat x)\le\frac{1-P_{\hat X}(\hat x)}{\sqrt{P_{\hat X}(\hat x)}}$ and $-\sqrt{P_X(x)}\le v_{\hat x}(x)\le\frac{1-P_X(x)}{\sqrt{P_X(x)}}$.
The following example specializes the above approximation to the binary case.
Example 1.
Consider the binary setup of Section 4.3 and the choice of auxiliary random variables in (25) and (26). Since the privacy mechanism is assumed to be a BSC, we have

$$P_X=\Big[\tfrac12\ \ \tfrac12\Big]^T,\qquad P_{\hat X}=\Big[\tfrac12\ \ \tfrac12\Big]^T.\qquad(48)$$

Now, we consider the vectors $v_{u=0}$ and $v_{u=1}$ defined as

$$v_{u=0}=\big[\sqrt2\,\xi_1\ \ -\sqrt2\,\xi_1\big]^T,\qquad(49)$$

$$v_{u=1}=\big[-\sqrt2\,\xi_1\ \ \sqrt2\,\xi_1\big]^T,\qquad(50)$$

for some positive $\xi_1$. This yields the following:

$$P_{\hat X|U=0}=P_{\hat X}+\big[\xi_1\ \ -\xi_1\big]^T,\qquad(51)$$

$$P_{\hat X|U=1}=P_{\hat X}+\big[-\xi_1\ \ \xi_1\big]^T.\qquad(52)$$

We also choose the vectors $v_{\hat x=0}$ and $v_{\hat x=1}$ as follows:

$$v_{\hat x=0}=\big[\sqrt2\,\xi_2\ \ -\sqrt2\,\xi_2\big]^T,\qquad(53)$$

$$v_{\hat x=1}=\big[-\sqrt2\,\xi_2\ \ \sqrt2\,\xi_2\big]^T,\qquad(54)$$

which results in

$$P_{X|\hat X=0}=P_X+\big[\xi_2\ \ -\xi_2\big]^T,\qquad(55)$$

$$P_{X|\hat X=1}=P_X+\big[-\xi_2\ \ \xi_2\big]^T.\qquad(56)$$

Notice that the matrix $\mathbf W$ is given by

$$\mathbf W=\begin{bmatrix}1-q & q\\ q & 1-q\end{bmatrix}.\qquad(57)$$
Thus, the optimization problem in (45) and (47) reduces to the following:

$$\max_{-\frac12\le\xi_1,\xi_2\le\frac12}\ 8\,\log e\,(1-2q)^2\,|\xi_1|^2\,|\xi_2|^2\qquad(58)$$

$$\text{subject to:}\quad4|\xi_1|^2\le\frac{2R}{\log e}\quad\text{and}\quad4|\xi_2|^2\le\frac{2L}{\log e}.\qquad(59)$$

Solving the above optimization yields

$$\theta^*_\epsilon(R\approx0,L\approx0)\approx\frac{2\,(1-2q)^2}{\log e}\,R\,L.\qquad(60)$$

For some values of the parameters, the approximation in (60) is compared to the error exponent of (24) in Figure 3. We observe that when $R=L\approx0$, the approximation turns out to be excellent.
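The following sketch (ours) reproduces the kind of comparison shown in Figure 3: the exact exponent (24) against the approximation (60) as $R=L$ shrinks; the agreement improves as both constraints tend to zero. The specific parameter values are illustrative.

```python
# Exact binary exponent (24) versus the Euclidean approximation (60).
import numpy as np

def h_b(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def h_b_inv(v, tol=1e-12):
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if h_b(mid) < v else (lo, mid)
    return (lo + hi) / 2

conv = lambda a, b: a * (1 - b) + (1 - a) * b
q = 0.1
for rate in (0.1, 0.01, 0.001):
    R = L = rate
    exact = 1 - h_b(conv(q, conv(h_b_inv(1 - L), h_b_inv(1 - R))))
    approx = 2 * (1 - 2 * q) ** 2 * R * L / np.log2(np.e)   # Equation (60), log base 2
    print(f"R = L = {rate}: exact = {exact:.3e}, approx = {approx:.3e}")
```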
Remark 4.
The trade-off between the optimal error exponent and the privacy can again be verified from (60) in the case of $L\approx0$ and $R\approx0$. As $L$ becomes larger (which corresponds to a less stringent privacy requirement), the error exponent also increases. For a fixed error exponent, a trade-off between $R$ and $L$ exists: an increase in $R$ results in a decrease of $L$.

4.5. Gaussian Setup

In this section, we consider hypothesis testing against independence over a Gaussian example. Suppose that $X\sim\mathcal N(0,1)$ and, under the null hypothesis $H=0$, the sources $X$ and $Y$ are jointly Gaussian random variables distributed as $\mathcal N(\mathbf 0,\mathbf G_{XY})$, where $\mathbf G_{XY}$ is defined as the following:

$$\mathbf G_{XY}\triangleq\begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix},\qquad(61)$$

for some $0\le\rho\le1$.
Under the alternative hypothesis $H=1$, we assume that $X$ and $Y$ are independent Gaussian random variables, each distributed as $\mathcal N(0,1)$. Consider the privacy constraint as follows:

$$L\ge I(X;\hat X)=h(X)-h(X|\hat X).\qquad(62)$$

For a Gaussian source $X$, the conditional entropy $h(X|\hat X)$ is maximized by a jointly Gaussian $(X,\hat X)$. This choice minimizes the RHS of (62). Thus, without loss of optimality, we choose

$$X=\hat X+Z,\qquad Z\sim\mathcal N\big(0,2^{-2L}\big),\qquad(63)$$

where $Z$ is independent of $\hat X$. The following proposition states that it is optimal to choose $U$ jointly Gaussian with $(X,\hat X,Y)$.
Proposition 2.
The optimal error exponent of the proposed Gaussian setup is given by

$$\theta^*_\epsilon(R,L)=\frac12\log\frac{1}{1-\rho^2\cdot\big(1-2^{-2R}\big)\cdot\big(1-2^{-2L}\big)}.\qquad(64)$$

Proof. 
For the proof of achievability, we choose $\hat X$ as in (63). Also, let

$$\hat X=U+\hat Z,\qquad \hat Z\sim\mathcal N(0,\beta^2),\qquad(65)$$

for some $\beta^2\ge0$, where $\hat Z$ is independent of $U$. It can be shown that Theorem 2 remains valid when extended to continuous alphabets [5]. For the details of the simplification and also the proof of the converse, see Appendix E. □
Remark 5.
If $L=\infty$, the above proposition recovers the optimal error exponent of Rahman and Wagner [5] (Corollary 7) for testing against independence of Gaussian sources over a noiseless link of rate $R$.
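A short sketch (ours) of the Gaussian exponent (64), numerically checking the limit noted in Remark 5: as $L\to\infty$, the exponent approaches the no-privacy value for rate $R$. The parameter values are illustrative.

```python
# Gaussian exponent: theta(R,L) = -0.5*log2(1 - rho^2 (1-2^{-2R})(1-2^{-2L})).
import numpy as np

def gaussian_exponent(R, L, rho):
    return -0.5 * np.log2(1 - rho**2 * (1 - 2.0**(-2 * R)) * (1 - 2.0**(-2 * L)))

rho, R = 0.8, 0.5
for L in (0.25, 0.5, 1.0, 2.0, np.inf):
    print(f"L = {L}: theta = {gaussian_exponent(R, L, rho):.4f}")
# For L = inf, 2^{-2L} = 0 and theta matches the no-privacy exponent at rate R.
```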

5. Summary and Discussion

In this paper, distributed hypothesis testing with privacy constraints is considered. A coding scheme is proposed in which the sensor decides on one of the hypotheses and generates the randomized data based on its decision. The transmitter describes the randomized data over a noiseless link to the receiver. The privacy mechanism in this scheme is non-memoryless. The special case of testing against independence with a memoryless privacy mechanism is studied in detail. The optimal type-II error exponent of this case is established, together with a strong converse. A binary example is proposed in which the trade-off between the privacy criterion and the error exponent is reported. Euclidean approximations are provided for the case in which the privacy level is high and the communication rate is vanishingly small. The optimal type-II error exponent of a Gaussian setup is also established.
A future line of research is to study the second-order asymptotics of our model. The second-order analysis of distributed hypothesis testing without privacy constraints and with zero-rate communication was studied in [37]. In our proposed model, the trade-off between the privacy and type-II error exponent is observed, i.e., a less stringent privacy requirement yields a larger error exponent. The next step is to see whether the trade-off between privacy and error exponent affects the second-order term.
Another potential line for future research is to consider other metrics of privacy instead of the mutual information. A possible candidate is to use the maximal leakage [21,22,23] and to analyze the performance in tandem with the distributed hypothesis testing problem.

Author Contributions

Investigation, A.G.; Supervision, S.S. and V.Y.F.T.; Writing—original draft, S.B.A.

Funding

This research was partially funded by grants R-263-000-C83-112 and R-263-000-C54-114.

Acknowledgments

The authors would like to thank Lin Zhou (National University of Singapore) and Daming Cao (Southeast University) for helpful discussions during the preparation of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

The analysis is based on the scheme described in Section 3.2.
Error Probability Analysis: We analyze the type-I and type-II error probabilities averaged over all random codebooks. By standard arguments [36] (p. 204), it can be shown that there exists at least one codebook that satisfies constraints on the error probabilities.
For the considered $\mu>0$ and the considered blocklength $n$, let $\mathcal P^n_\mu$ be the set of all joint types $\pi_{U\hat XXY}$ over $\mathcal U^n\times\hat{\mathcal X}^n\times\mathcal X^n\times\mathcal Y^n$ which satisfy the following constraints:

$$|\pi_X-P_X|\le\mu/4,\qquad(A1)$$

$$|\pi_{U\hat X}-P_{U\hat X}|\le\mu/2,\qquad(A2)$$

$$|\pi_{UY}-P_{UY}|\le\mu.\qquad(A3)$$
First, we analyze the type-I error probability. For the case of $M\ne0$, we define the following event:

$$\mathcal E\triangleq\big\{(U^n(M),Y^n)\notin\mathcal T^n_\mu(P_{UY})\big\}.\qquad(A4)$$

Thus, the type-I error probability can be upper bounded as follows:

$$\alpha_n\le\Pr\big[\hat X^n=0^n\ \text{or}\ M=0\ \text{or}\ \mathcal E\,\big|\,H=0\big]\qquad(A5)$$
$$\le\Pr\big[\hat X^n=0^n\,\big|\,H=0\big]+\Pr\big[M=0\,\big|\,\hat X^n\ne0^n,H=0\big]+\Pr\big[\mathcal E\,\big|\,M\ne0,\hat X^n\ne0^n,H=0\big]\qquad(A6)$$
$$\le\epsilon/3+\Pr\big[M=0\,\big|\,\hat X^n\ne0^n,H=0\big]+\Pr\big[\mathcal E\,\big|\,M\ne0,\hat X^n\ne0^n,H=0\big]\qquad(A7)$$
$$\le\epsilon/3+\epsilon/3+\Pr\big[\mathcal E\,\big|\,M\ne0,\hat X^n\ne0^n,H=0\big]\qquad(A8)$$
$$\le\epsilon/3+\epsilon/3+\epsilon/3\qquad(A9)$$
$$=\epsilon,\qquad(A10)$$

where (A7) follows from the AEP [36] (Theorem 3.1.1); (A8) follows from the covering lemma [34] (Lemma 3.3) and the rate constraint (15); and (A9) follows from the Markov lemma [34] (Lemma 12.1). In all justifications, $n$ is taken to be sufficiently large.
Next, we analyze the type-II error probability. The acceptance region at the receiver is

$$\mathcal A_n^{\text{Rx}}=\bigcup_m\Big\{\big(\hat x^n,x^n,y^n\big):\ \hat x^n\ne0^n,\ \big(u^n(m),\hat x^n,x^n,y^n\big)\in\mathcal T^n_\mu(P_{U\hat XXY})\Big\}.\qquad(A11)$$

The set $\mathcal A_n^{\text{Rx}}$ is contained within the following acceptance region $\bar{\mathcal A}_n$:

$$\bar{\mathcal A}_n=\bigcup_m\Big\{\big(\hat x^n,x^n,y^n\big):\ \hat x^n\ne0^n,\ \big(u^n(m),\hat x^n,x^n,y^n\big)\in\bigcup_{\pi\in\mathcal P^n_\mu}\mathcal T^n(\pi)\Big\}.\qquad(A12)$$

Let $\mathcal F_m\triangleq\big\{\mathrm{tp}\big(U^n(m),\hat X^n,X^n,Y^n\big)\in\mathcal P^n_\mu\big\}$. Therefore, the average of the type-II error probability over all codebooks is upper bounded as follows:

$$\mathbb E_{\mathcal C}[\beta_n]\le Q^n_{\hat XXY}\big(\bar{\mathcal A}_n\big)\qquad(A13)$$
$$\le\sum_m\Pr\big[\hat X^n\ne0^n,\mathcal F_m\,\big|\,H=1\big]\qquad(A14)$$
$$\le\sum_m\Pr\big[\mathcal F_m\,\big|\,\hat X^n\ne0^n,H=1\big]\qquad(A15)$$
$$\le2^{nR}\cdot(n+1)^{|\mathcal U|\cdot|\hat{\mathcal X}|\cdot|\mathcal X|\cdot|\mathcal Y|}\cdot\max_{\pi_{U\hat XXY}\in\mathcal P^n_\mu}2^{-nD(\pi_{U\hat XXY}\|P_UP_{\hat X|X}Q_{XY})}\qquad(A16)$$
$$=(n+1)^{|\mathcal U|\cdot|\hat{\mathcal X}|\cdot|\mathcal X|\cdot|\mathcal Y|}\cdot2^{-n\tilde\theta_\mu},\qquad(A17)$$

where

$$\tilde\theta_\mu\triangleq\min_{\pi_{U\hat XXY}\in\mathcal P^n_\mu}D(\pi_{U\hat XXY}\|P_UP_{\hat X|X}Q_{XY})-R,\qquad(A18)$$

and (A16) follows from the upper bound of Sanov's theorem [36] (Theorem 11.4.1). Hence,

$$\tilde\theta_\mu=\min_{\pi_{U\hat XXY}\in\mathcal P^n_\mu}D(\pi_{U\hat XXY}\|P_UP_{\hat X|X}Q_{XY})-R\qquad(A19)$$
$$=\min_{\pi_{U\hat XXY}\in\mathcal P^n_\mu}D(\pi_{U\hat XXY}\|P_UP_{\hat X|X}Q_{XY})-I(U;\hat X)-\mu\qquad(A20)$$
$$=\min_{\pi_{U\hat XXY}\in\mathcal P^n_\mu}D(\pi_{U\hat XXY}\|P_{U|\hat X}P_{\hat X|X}Q_{XY})+\delta(\mu),\qquad(A21)$$

where $\delta(\mu)\to0$ as $\mu\to0$. Equality (A20) follows from the rate constraint in (15), and (A21) holds because $|\pi_{U\hat X}-P_{U\hat X}|\le\mu/2$.
Privacy Analysis: We first analyze the privacy under $H=0$. Notice that $\hat X^n$ is not necessarily i.i.d. because, according to the scheme in Section 3.2, $\hat X^n$ is forced to be an all-zero sequence if the Randomizer decides that $X^n$ is not typical. However, conditioned on the event that $X^n\in\mathcal T^n_\mu(P_X)$, the sequence $\hat X^n$ is i.i.d. according to the conditional pmf $P_{\hat X|X}$. The privacy measure $T_n$ satisfies

$$nT_n=I(X^n;\hat X^n)=H(\hat X^n)-H(\hat X^n|X^n).\qquad(A22)$$
We now provide a lower bound on $H(\hat X^n|X^n)$ as follows:

$$H(\hat X^n|X^n)\ge\sum_{x^n\in\mathcal T^n_\mu(P_X)}P^n_X(x^n)\,H(\hat X^n|X^n=x^n).\qquad(A23)$$

For any $x^n\in\mathcal T^n_\mu(P_X)$ and for $\mu'>\mu$, it holds that

$$H(\hat X^n|X^n=x^n)=-\sum_{\hat x^n\in\hat{\mathcal X}^n}P^n_{\hat X|X}(\hat x^n|x^n)\log P^n_{\hat X|X}(\hat x^n|x^n)\qquad(A24)$$
$$\ge-\sum_{\hat x^n\in\mathcal T^n_{\mu'}(P_{\hat X|X}(\cdot|x^n))}P^n_{\hat X|X}(\hat x^n|x^n)\log P^n_{\hat X|X}(\hat x^n|x^n)\qquad(A25)$$
$$\ge-\sum_{\hat x^n\in\mathcal T^n_{\mu'}(P_{\hat X|X}(\cdot|x^n))}P^n_{\hat X|X}(\hat x^n|x^n)\log2^{-n(1-\mu')H(\hat X|X)}\qquad(A26)$$
$$\ge n(1-\mu')^2H(\hat X|X),\qquad(A27)$$

where (A26) is true because, for any $\hat x^n\in\mathcal T^n_{\mu'}(P_{\hat X|X}(\cdot|x^n))$, it holds that $P^n_{\hat X|X}(\hat x^n|x^n)\le2^{-n(1-\mu')H(\hat X|X)}$, and (A27) follows because the conditional typicality lemma [34] (Chapter 2) implies that $P^n_{\hat X|X}\big(\mathcal T^n_{\mu'}(P_{\hat X|X}(\cdot|x^n))\,\big|\,x^n\big)\ge1-\mu'$ for $n$ sufficiently large.
Combining (A23) and (A27), we obtain

$$H(\hat X^n|X^n)\ge n(1-\mu')^2H(\hat X|X)\sum_{x^n\in\mathcal T^n_\mu(P_X)}P^n_X(x^n)\qquad(A28)$$
$$\ge n(1-\mu')^2(1-\mu)H(\hat X|X),\qquad(A29)$$

where (A29) follows because the AEP [36] (Theorem 3.1.1) implies that $P^n_X\big(\mathcal T^n_\mu(P_X)\big)\ge1-\mu$ for $n$ sufficiently large.
Hence, we have

$$I(X^n;\hat X^n)=H(\hat X^n)-H(\hat X^n|X^n)\qquad(A30)$$
$$\le nH(\hat X)-H(\hat X^n|X^n)\qquad(A31)$$
$$\le nH(\hat X)-n(1-\mu'')H(\hat X|X)\qquad(A32)$$
$$=nI(X;\hat X)+n\mu''H(\hat X|X)\qquad(A33)$$
$$\le nL+n\mu''\cdot\log|\hat{\mathcal X}|\qquad(A34)$$
$$=nL+n\zeta',\qquad(A35)$$

where $\mu''\triangleq1-(1-\mu')^2(1-\mu)\to0$ and $\zeta'\triangleq\mu''\cdot\log|\hat{\mathcal X}|$.
Next, consider the privacy analysis under $H=1$. Please note that when $P_X=Q_X$, the analysis is similar to that of $H=0$. Thus, we assume that $P_X\ne Q_X$ in the following. From (A22), the privacy measure $T_n$ satisfies:

$$nT_n=I(X^n;\hat X^n)\le H(\hat X^n).\qquad(A36)$$

To upper bound $H(\hat X^n)$, we calculate the probability $P_{\hat X^n}(\hat x^n)$ for $\hat x^n=0^n$ as follows:

$$P_{\hat X^n}(0^n)=\sum_{x^n\in\mathcal T^n_\mu(P_X)}P^n_{\hat X|X}(0^n|x^n)\cdot Q^n_X(x^n)+\sum_{x^n\notin\mathcal T^n_\mu(P_X)}P^n_{\hat X|X}(0^n|x^n)\cdot Q^n_X(x^n)\qquad(A37)$$
$$=\sum_{x^n\notin\mathcal T^n_\mu(P_X)}P^n_{\hat X|X}(0^n|x^n)\cdot Q^n_X(x^n)\qquad(A38)$$
$$=\sum_{x^n\notin\mathcal T^n_\mu(P_X)}Q^n_X(x^n)\qquad(A39)$$
$$=1-Q^n_X\big(\mathcal T^n_\mu(P_X)\big)\qquad(A40)$$
$$\ge1-2^{-n(D(P_X\|Q_X)-\delta(\mu))}\triangleq1-\gamma_n,\qquad(A41)$$

where $\gamma_n\to0$ exponentially fast as $n\to\infty$. Here, (A38) follows because if $x^n\in\mathcal T^n_\mu(P_X)$, then $P^n_{\hat X|X}(0^n|x^n)=0$; (A39) follows because when $x^n\notin\mathcal T^n_\mu(P_X)$, then $P^n_{\hat X|X}(0^n|x^n)=1$; and (A41) follows from Sanov's theorem and the continuity of the relative entropy in its first argument [38] (Lemma 1.2.7).
Write $H(\hat X^n)$ as $H(P_{\hat X^n})$ and let $P_{0^n}$ be the distribution on $\hat{\mathcal X}^n$ that places all its probability mass on $0^n\in\hat{\mathcal X}^n$. Since $H(P_{0^n})=0$, by the uniform continuity of entropy [38] (Lemma 1.2.7),

$$H(P_{\hat X^n})\le2\,\big|P_{\hat X^n}-P_{0^n}\big|\cdot\log\frac{|\hat{\mathcal X}|^n}{2\,\big|P_{\hat X^n}-P_{0^n}\big|}.\qquad(A42)$$

Since $\gamma_n\to0$ exponentially fast, the same holds true for $|P_{\hat X^n}-P_{0^n}|$, and so by (A42), $H(P_{\hat X^n})=H(\hat X^n)\to0$. Therefore, under $H=1$, we have $T_n\to0$ as $n\to\infty$.
Letting $n\to\infty$ and then letting $\mu,\mu'\to0$, we obtain $\tilde\theta_\mu\to\theta$ and $\limsup_{n\to\infty}T_n\le L$, with $\theta$ given by the RHS of (11). This establishes the proof of Theorem 1.

Appendix B. Proof of Theorem 2

Achievability: The analysis is based on the scheme of Section 4.2. It follows similar steps as in [1]. Recall the definition of the event $\mathcal E$ in (A4). Consider the type-I error probability as follows:

$$\alpha_n\le\Pr[M=0\ \text{or}\ \mathcal E\,|\,H=0]\qquad(A43)$$
$$\le\Pr[M=0\,|\,H=0]+\Pr[\mathcal E\,|\,M\ne0,H=0]\qquad(A44)$$
$$\le\epsilon/2+\epsilon/2\qquad(A45)$$
$$=\epsilon,\qquad(A46)$$

where (A45) follows from the covering lemma [34] (Lemma 3.3) and the rate constraint in (15), and also from the Markov lemma [34] (Lemma 12.1). Now, consider the type-II error probability as follows:

$$\beta_n=\Pr[\hat H=0\,|\,H=1]\qquad(A47)$$
$$=\Pr[\hat H=0,M\ne0\,|\,H=1]\qquad(A48)$$
$$\le\Pr[\hat H=0\,|\,H=1,M\ne0]\qquad(A49)$$
$$=\Pr[\hat H=0\,|\,H=1,M=1],\qquad(A50)$$

where the last equality follows from the symmetry of the code construction. Now, the average of the type-II error probability over all codebooks satisfies:

$$\mathbb E_{\mathcal C}[\beta_n]\le2^{-n[I(U;Y)-\delta(\mu)]},\qquad(A51)$$

where $\delta(\mu)$ is a function that tends to zero as $\mu\to0$. The privacy analysis is straightforward since the privacy mechanism is memoryless, whence we have

$$\frac1nI(X^n;\hat X^n)=I(X;\hat X)=L-\zeta,\qquad(A52)$$

where the last equality follows from the privacy constraint in (15). This concludes the proof of achievability.
Converse: Now, we prove the strong converse. It involves an extension of the $\eta$-image characterization technique [4,38]. The proof steps are given as follows. First, we find a truncated distribution $P_{\underline{\hat X}^n}$ which is arbitrarily close to $P_{\hat X^n}$ in terms of entropy. Then, we analyze the type-II error probability under a constrained type-I error probability. Finally, a single-letter characterization of the rate and privacy constraints is given.
(1) Construction of a Truncated Distribution:
Since the privacy mechanism is memoryless, we conclude that $(X^n,\hat X^n,Y^n)$ is i.i.d. according to $P_{X\hat XY}\triangleq P_{\hat X|X}P_{XY}$. For a given $P_{\hat XY}$, define $V^n(y^n|\hat x^n)\triangleq P^n_{Y|\hat X}(y^n|\hat x^n)$ for all $\hat x^n\in\hat{\mathcal X}^n$ and $y^n\in\mathcal Y^n$. A set $\mathcal B\subseteq\mathcal Y^n$ is an $\eta$-image of the set $\mathcal A\subseteq\hat{\mathcal X}^n$ over the channel $V^n$ if

$$V^n(\mathcal B\,|\,\hat x^n)\ge\eta,\qquad\forall\hat x^n\in\mathcal A.\qquad(A53)$$

The privacy mechanism is the same under both hypotheses; thus, we can define the acceptance region based on $(\hat x^n,y^n)$ as follows:

$$\mathcal A_n\triangleq\big\{(\hat x^n,y^n):g^{(n)}\big(y^n,\phi^{(n)}(\hat x^n)\big)=0\big\}.\qquad(A54)$$

For any encoding function $\phi^{(n)}$ and an acceptance region $\mathcal A_n\subseteq\hat{\mathcal X}^n\times\mathcal Y^n$, let $\tau_n$ denote the cardinality of the codebook and define the following sets:

$$\mathcal C_i\triangleq\big\{\hat x^n\in\hat{\mathcal X}^n:\phi^{(n)}(\hat x^n)=i\big\},\qquad(A55)$$

$$\mathcal D_i\triangleq\big\{y^n\in\mathcal Y^n:g^{(n)}(y^n,i)=0\big\},\qquad1\le i\le\tau_n.\qquad(A56)$$

The acceptance region can be written as follows:

$$\mathcal A_n=\bigcup_{i=1}^{\tau_n}\mathcal C_i\times\mathcal D_i,\qquad(A57)$$

where $\mathcal C_i\cap\mathcal C_j=\emptyset$ for all $i\ne j$. Define the set $\mathcal B_n(\eta)$ as follows:

$$\mathcal B_n(\eta)\triangleq\big\{\hat x^n:V^n\big(\mathcal D_{\phi^{(n)}(\hat x^n)}\,\big|\,\hat x^n\big)\ge\eta\big\}.\qquad(A58)$$
Fix $\epsilon\in[0,1)$ and notice that the type-I error probability is upper-bounded as

$$\alpha_n=P^n_{\hat XY}(\mathcal A_n^c)\le\epsilon,\qquad(A59)$$

which we can write equivalently as

$$1-\epsilon\le P^n_{\hat XY}(\mathcal A_n)\qquad(A60)$$
$$=\sum_{\hat x^n\in\mathcal B_n(\eta)}P_{\hat X^n}(\hat x^n)V^n\big(\mathcal D_{\phi^{(n)}(\hat x^n)}\big|\hat x^n\big)+\sum_{\hat x^n\in\mathcal B_n^c(\eta)}P_{\hat X^n}(\hat x^n)V^n\big(\mathcal D_{\phi^{(n)}(\hat x^n)}\big|\hat x^n\big)\qquad(A61)$$
$$\le P_{\hat X^n}\big(\mathcal B_n(\eta)\big)+\eta\big(1-P_{\hat X^n}(\mathcal B_n(\eta))\big),\qquad(A62)$$

where the first term is bounded using $V^n\big(\mathcal D_{\phi^{(n)}(\hat x^n)}\big|\hat x^n\big)\le1$, and the second because, for any $\hat x^n\in\mathcal B_n^c(\eta)$, we have $V^n\big(\mathcal D_{\phi^{(n)}(\hat x^n)}\big|\hat x^n\big)<\eta$.
In what follows, let $\eta=\frac{1-\epsilon}{2}$. Inequality (A62) implies

$$P_{\hat X^n}\big(\mathcal B_n(\eta)\big)\ge\frac{1-\epsilon}{1+\epsilon}.\qquad(A63)$$

Let $\mu_n=n^{-1/3}$. For the typical set $\mathcal T^n_{\mu_n}(P_{\hat X})$, we have

$$P_{\hat X^n}\big(\mathcal T^n_{\mu_n}(P_{\hat X})\big)\ge1-|\hat{\mathcal X}|\,2^{-2n\mu_n^2}.\qquad(A64)$$

Hence,

$$P_{\hat X^n}\big(\mathcal T^n_{\mu_n}(P_{\hat X})\cap\mathcal B_n(\eta)\big)\ge P_{\hat X^n}\big(\mathcal T^n_{\mu_n}(P_{\hat X})\big)+P_{\hat X^n}\big(\mathcal B_n(\eta)\big)-1\qquad(A65)$$
$$\ge\frac{1-\epsilon}{1+\epsilon}-|\hat{\mathcal X}|\,2^{-2n\mu_n^2}.\qquad(A66)$$

For any $0<\delta<\frac{1-\epsilon}{1+\epsilon}$ and for sufficiently large $n$,

$$P_{\hat X^n}\big(\mathcal T^n_{\mu_n}(P_{\hat X})\cap\mathcal B_n(\eta)\big)\ge\delta.\qquad(A67)$$
We can also write $\mathcal T^n_{\mu_n}(P_{\hat X})$ as

$$\mathcal T^n_{\mu_n}(P_{\hat X})=\bigcup_{\hat P_{\hat X}:\,|\hat P_{\hat X}-P_{\hat X}|\le\mu_n}\mathcal T^n(\hat P_{\hat X}).\qquad(A68)$$

Combining the above equations, we get

$$\sum_{\hat P_{\hat X}:\,|\hat P_{\hat X}-P_{\hat X}|\le\mu_n}P_{\hat X^n}\big(\mathcal T^n(\hat P_{\hat X})\cap\mathcal B_n(\eta)\big)\ge\delta.\qquad(A69)$$

Let $\tilde P_{\hat X}$ denote the type which maximizes the $P_{\hat X^n}$-probability of the type class among all such types. As there exist at most $(n+1)^{|\hat{\mathcal X}|}$ possible types, it holds that

$$P_{\hat X^n}\big(\mathcal T^n(\tilde P_{\hat X})\cap\mathcal B_n(\eta)\big)\ge\frac{\delta}{(n+1)^{|\hat{\mathcal X}|}}.\qquad(A70)$$

Define the set $\Psi_n(\eta)\triangleq\mathcal T^n(\tilde P_{\hat X})\cap\mathcal B_n(\eta)$. We can write the probability in (A70) as

$$P_{\hat X^n}\big(\mathcal T^n(\tilde P_{\hat X})\cap\mathcal B_n(\eta)\big)=\sum_{\hat x^n\in\Psi_n(\eta)}P_{\hat X^n}(\hat x^n)\qquad(A71)$$
$$=\sum_{\hat x^n\in\Psi_n(\eta)}2^{-n\big[D(\tilde P_{\hat X}\|P_{\hat X})+H_{\tilde P_{\hat X}}(\hat X)\big]}\qquad(A72)$$
$$\le\sum_{\hat x^n\in\Psi_n(\eta)}2^{-n[H(\hat X)-\delta_1]},\qquad(A73)$$

where $\delta_1\to0$ as $n\to\infty$ due to the fact that $D(\tilde P_{\hat X}\|P_{\hat X})\to0$ and $|\tilde P_{\hat X}-P_{\hat X}|\le\mu_n$, so the entropies are also arbitrarily close. It then follows from (A70) and (A73) that

$$\frac1n\log|\Psi_n(\eta)|\ge H(\hat X)-\delta_2,\qquad(A74)$$

where $\delta_2\to0$ as $\mu_n\to0$.
The encoding function $\phi^{(n)}$ partitions the set $\Psi_n(\eta)$ into $\tau_n$ non-intersecting subsets $\{\mathcal S_i\}_{i=1}^{\tau_n}$ such that $\phi^{(n)}(\hat x^n)=i$ for any $\hat x^n\in\mathcal S_i$. Define the following distribution:

$$P_{\underline{\hat X}^n}(\hat x^n)\triangleq\frac{P_{\hat X^n}(\hat x^n)\cdot\mathbb 1\{\hat x^n\in\Psi_n(\eta)\}}{P_{\hat X^n}\big(\Psi_n(\eta)\big)}.\qquad(A75)$$

Please note that this distribution, henceforth denoted as $P^{(n)}$, corresponds to a uniform distribution over $\Psi_n(\eta)$ because all sequences in $\Psi_n(\eta)$ have the same type $\tilde P_{\hat X}$, and the probability is uniform on a type class under any i.i.d. measure.
Finally, define the following truncated joint distribution:

$$P_{\underline M\,\underline X^n\,\underline{\hat X}^n\,\underline Y^n}(m,x^n,\hat x^n,y^n)\triangleq\mathbb 1\big\{\phi^{(n)}(\hat x^n)=m\big\}\,P^n_{X|\hat X}(x^n|\hat x^n)\,P_{\underline{\hat X}^n}(\hat x^n)\,P^n_{Y|X}(y^n|x^n).\qquad(A76)$$
(2) Analysis of the Type-II Error Exponent:
The proof of the upper bound on the error exponent relies on the following Lemma A1. For a set $\mathcal A\subseteq\hat{\mathcal X}^n$, let $\mathbb B(\mathcal A,\eta)$ denote the collection of all $\eta$-images of $\mathcal A$, define $Q_{X\hat XY}\triangleq P_{\hat X|X}Q_{XY}$, and

$$\kappa_{V^n}(\mathcal A,Q_{\hat XY},\eta)\triangleq\min_{\mathcal B\in\mathbb B(\mathcal A,\eta)}\frac{Q^n_{\hat XY}(\mathcal A\times\mathcal B)}{P_{\hat X^n}(\mathcal A)}.\qquad(A77)$$

This quantity is a generalization of the minimum cardinality of the $\eta$-images in [38] and is closely related to the minimum type-II error probability associated with the set $\mathcal A$.
For the testing against independence setup, $Q_{\hat XY}=P_{\hat X}\cdot P_Y$, and thus

$$\frac{Q^n_{\hat XY}(\mathcal A\times\mathcal B)}{P_{\hat X^n}(\mathcal A)}=\frac{P_{\hat X^n}(\mathcal A)\,P^n_Y(\mathcal B)}{P_{\hat X^n}(\mathcal A)}=P^n_Y(\mathcal B),\qquad(A78)$$

and $\kappa_{V^n}(\mathcal A,Q_{\hat XY},\eta)$ is simply written as $\kappa_{V^n}(\mathcal A,\eta)$ and is given by

$$\kappa_{V^n}(\mathcal A,\eta)\triangleq\min_{\mathcal B\in\mathbb B(\mathcal A,\eta)}P^n_Y(\mathcal B).\qquad(A79)$$
Lemma A1
(Lemma 3 in [4]). For any set $\mathcal A\subseteq\hat{\mathcal X}^n$, consider a distribution $P^{(n)}_{\mathcal A}$ over $\mathcal A$ and let $P^{(n)}_{\mathcal A}V^n$ be its corresponding output distribution induced by the channel $V^n$, i.e.,

$$P^{(n)}_{\mathcal A}V^n(y^n)\triangleq\sum_{\hat x^n\in\mathcal A}P^{(n)}_{\mathcal A}(\hat x^n)\,V^n\big(y^n\,\big|\,\hat x^n\big).\qquad(A80)$$

Then, for every $\delta>0$ and $0<\eta<1$, we have

$$\kappa_{V^n}(\mathcal A,\eta)\ge2^{-D(P^{(n)}_{\mathcal A}V^n\|P^n_Y)-n\delta}\qquad(A81)$$

for sufficiently large $n$.
Let $P_i^{(n)}V^n$ be the distribution of the random variable $\underline Y^n$ given $\underline M=i$. The type-II error probability can be lower-bounded as:

$$\beta_n\ge\sum_{\hat x^n\in\Psi_n(\eta)}P_{\hat X^n}(\hat x^n)\cdot P^n_Y\big(\mathcal D_{\phi^{(n)}(\hat x^n)}\big)\qquad(A82)$$
$$=\sum_{i=1}^{\tau_n}P_{\hat X^n}(\mathcal S_i)\cdot P^n_Y(\mathcal D_i)\qquad(A83)$$
$$\ge\sum_{i=1}^{\tau_n}P_{\hat X^n}(\mathcal S_i)\cdot\kappa_{V^n}(\mathcal S_i,\eta)\qquad(A84)$$
$$=P_{\hat X^n}\big(\Psi_n(\eta)\big)\cdot\sum_{i=1}^{\tau_n}P^{(n)}(\mathcal S_i)\cdot\kappa_{V^n}(\mathcal S_i,\eta)\qquad(A85)$$
$$\ge2^{-n\delta}\cdot P_{\hat X^n}\big(\Psi_n(\eta)\big)\cdot\sum_{i=1}^{\tau_n}P^{(n)}(\mathcal S_i)\cdot2^{-D(P_i^{(n)}V^n\|P^n_Y)}\qquad(A86)$$
$$\ge2^{-n\delta}\cdot P_{\hat X^n}\big(\Psi_n(\eta)\big)\cdot2^{-\sum_{i=1}^{\tau_n}P^{(n)}(\mathcal S_i)\cdot D(P_i^{(n)}V^n\|P^n_Y)}\qquad(A87)$$
$$\ge2^{-n\delta}\cdot\frac{\delta}{(n+1)^{|\hat{\mathcal X}|}}\cdot2^{-\sum_{i=1}^{\tau_n}P^{(n)}(\mathcal S_i)\cdot D(P_i^{(n)}V^n\|P^n_Y)},\qquad(A88)$$

where (A84) follows from the definition of $\kappa_{V^n}(\mathcal S_i,\eta)$; (A86) follows because Lemma A1 implies that, for any distribution $P_i^{(n)}$ over the set $\mathcal S_i$, it holds that $\kappa_{V^n}(\mathcal S_i,\eta)\ge2^{-D(P_i^{(n)}V^n\|P^n_Y)-n\delta}$; (A87) follows from the convexity of the function $t\mapsto2^{-t}$; and (A88) follows from (A70) and the fact that $\Pr(\mathcal A)\ge\Pr(\mathcal A\cap\mathcal B)$. Hence,

$$-\frac1n\log\beta_n\le\delta'+\frac1n\sum_{i=1}^{\tau_n}P^{(n)}(\mathcal S_i)\cdot D(P_i^{(n)}V^n\|P^n_Y),\qquad(A89)$$

where $\delta'\triangleq\delta-\frac1n\log\frac{\delta}{(n+1)^{|\hat{\mathcal X}|}}$.
(3) Single-letterization Steps and Analyses of the Rate and Privacy Constraints:
In the following, we proceed to provide a single-letter characterization of the upper bound in (A89). Considering the fact that $P^{(n)}(\mathcal S_i)=P_{\underline M}(i)$, the right-hand side of (A89) can be upper-bounded as follows:

$$\frac1n\sum_{i=1}^{\tau_n}P^{(n)}(\mathcal S_i)\cdot D(P_i^{(n)}V^n\|P^n_Y)=\frac1n\sum_{i=1}^{\tau_n}\sum_{y^n\in\mathcal Y^n}P_{\underline M\,\underline Y^n}(i,y^n)\log\frac{P_{\underline Y^n|\underline M}(y^n|i)}{P^n_Y(y^n)}\qquad(A90)$$
$$=-\frac1nH(\underline Y^n|\underline M)-\frac1n\sum_{y^n\in\mathcal Y^n}P_{\underline Y^n}(y^n)\log P^n_Y(y^n)\qquad(A91)$$
$$=-\frac1nH(\underline Y^n|\underline M)-\frac1n\sum_{y^n\in\mathcal Y^n}P_{\underline Y^n}(y^n)\sum_{t=1}^n\log P_Y(y_t)\qquad(A92)$$
$$=-\frac1nH(\underline Y^n|\underline M)-\frac1n\sum_{t=1}^n\sum_{y^n\in\mathcal Y^n}P_{\underline Y^n}(y^n)\log P_Y(y_t)\qquad(A93)$$
$$=-\frac1nH(\underline Y^n|\underline M)-\frac1n\sum_{t=1}^n\sum_{y_t\in\mathcal Y}P_{\underline Y_t}(y_t)\log P_Y(y_t)\qquad(A94)$$
$$=-\frac1nH(\underline Y^n|\underline M)+\frac1n\sum_{t=1}^n\big[H(\underline Y_t)+D(P_{\underline Y_t}\|P_Y)\big]\qquad(A95)$$
$$=\frac1n\sum_{t=1}^n\big[H(\underline Y_t)-H(\underline Y_t|\underline M,\underline Y^{t-1})+D(P_{\underline Y_t}\|P_Y)\big]\qquad(A96)$$
$$\le\frac1n\sum_{t=1}^nI(\underline M,\underline X^{t-1},\underline{\hat X}^{t-1};\underline Y_t)+\frac1n\sum_{t=1}^nD(P_{\underline Y_t}\|P_Y)\qquad(A97)$$
$$=\frac1n\sum_{t=1}^nI(\underline U_t;\underline Y_t)+\frac1n\sum_{t=1}^nD(P_{\underline Y_t}\|P_Y)\qquad(A98)$$
$$=I(\underline U;\underline Y)+D(P_{\underline Y}\|P_Y).\qquad(A99)$$

Here, (A96)–(A99) are justified in the following:
  • (A96) follows by the chain rule;
  • (A97) follows from the Markov chain $\underline Y^{t-1}-(\underline M,\underline X^{t-1},\underline{\hat X}^{t-1})-\underline Y_t$;
  • (A98) follows from the definition

$$\underline U_t\triangleq(\underline M,\underline X^{t-1},\underline{\hat X}^{t-1});\qquad(A100)$$

  • (A99) follows by defining a time-sharing random variable $T$ uniform over $\{1,\ldots,n\}$ and the following:

$$\underline U\triangleq(\underline U_T,T),\qquad\underline Y\triangleq\underline Y_T.\qquad(A101)$$

This leads to the following upper bound on the type-II error exponent:

$$-\frac1n\log\beta_n\le I(\underline U;\underline Y)+D(P_{\underline Y}\|P_Y)+\delta'.\qquad(A102)$$
Next, the rate constraint satisfies the following:

$$nR\ge H(\underline M)\qquad(A103)$$
$$\ge I(\underline M;\underline{\hat X}^n)\qquad(A104)$$
$$=H(\underline{\hat X}^n)-H(\underline{\hat X}^n|\underline M)\qquad(A105)$$
$$=\log|\Psi_n(\eta)|-H(\underline{\hat X}^n|\underline M)\qquad(A106)$$
$$\ge n\big(H(\hat X)-\delta_2\big)-H(\underline{\hat X}^n|\underline M)\qquad(A107)$$
$$=nH(\hat X)-\sum_{t=1}^nH(\underline{\hat X}_t|\underline{\hat X}^{t-1},\underline M)-n\delta_2\qquad(A108)$$
$$\ge nH(\hat X)-\sum_{t=1}^nH(\underline{\hat X}_t|\underline U_t)-n\delta_2\qquad(A109)$$
$$=nH(\hat X)-nH(\underline{\hat X}|\underline U)-n\delta_2,\qquad(A110)$$

where (A106) follows because the distribution $P_{\underline{\hat X}^n}$ is uniform over the set $\Psi_n(\eta)$; (A107) follows from (A74); (A109) follows from the definition in (A100); and (A110) follows by defining $\underline{\hat X}\triangleq\underline{\hat X}_T$.
Finally, the privacy measure satisfies the following:

$$nL\ge I(X^n;\hat X^n)\qquad(A111)$$
$$\ge I(\underline X^n;\underline{\hat X}^n)\qquad(A112)$$
$$=H(\underline{\hat X}^n)-H(\underline{\hat X}^n|\underline X^n)\qquad(A113)$$
$$=\log|\Psi_n(\eta)|-H(\underline{\hat X}^n|\underline X^n)\qquad(A114)$$
$$\ge n\big(H(\hat X)-\delta_2\big)-H(\underline{\hat X}^n|\underline X^n)\qquad(A115)$$
$$=n\big(H(\hat X)-\delta_2\big)-\sum_{t=1}^nH(\underline{\hat X}_t|\underline{\hat X}^{t-1},\underline X^n)\qquad(A116)$$
$$\ge n\big(H(\hat X)-\delta_2\big)-\sum_{t=1}^nH(\underline{\hat X}_t|\underline X_t)\qquad(A117)$$
$$=nH(\hat X)-nH(\underline{\hat X}|\underline X)-n\delta_2,\qquad(A118)$$

where (A112) follows because $(\underline X^n,\underline{\hat X}^n)$ are functions of $(X^n,\hat X^n)$ and from the data processing inequality; (A114) follows because $P_{\underline{\hat X}^n}$ is uniform over the set $\Psi_n(\eta)$ (see the definition in (A75)); (A115) follows from (A74); and (A118) follows by defining $\underline X\triangleq(\underline X_T,T)$, $\underline{\hat X}\triangleq(\underline{\hat X}_T,T)$ and choosing $T$ uniformly over $\{1,\ldots,n\}$.
Since $\Psi_n(\eta)\subseteq\mathcal T^n(\tilde P_{\hat X})$, for any $\hat x\in\hat{\mathcal X}$,

$$P_{\underline{\hat X}}(\hat x)=\frac1n\sum_{t=1}^nP_{\underline{\hat X}_t}(\hat x)\qquad(A119)$$
$$=\sum_{\hat x^n\in\Psi_n(\eta)}\frac{N\big(\hat x\,\big|\,\hat x^n\big)}{n\cdot|\Psi_n(\eta)|}\qquad(A120)$$
$$=\tilde P_{\hat X}(\hat x).\qquad(A121)$$

Recall that $|\tilde P_{\hat X}-P_{\hat X}|\le\mu_n$ with $\mu_n=n^{-1/3}$. Hence, from (A121), it holds that $|P_{\underline{\hat X}}-P_{\hat X}|\le\mu_n$. By the definitions of $\underline{\hat X}$, $\underline X$ and $\underline Y$, we have $P_{\hat X|X}=P_{\underline{\hat X}|\underline X}$ and $P_{Y|X}=P_{\underline Y|\underline X}=V$. The random variable $U$ is chosen over the same alphabet as $\underline U$ and such that $P_{U|\hat X}=P_{\underline U|\underline{\hat X}}$.
Since $P_Y(y)>0$ for all $y\in\mathcal Y$, letting $n\to\infty$ (so that $\mu_n\to0$) and using the uniform continuity of the involved information-theoretic quantities yields the following upper bound on the optimal error exponent:

$$\theta^*_\epsilon(R,L)\le I(U;Y),\qquad(A122)$$

subject to the rate constraint:

$$R\ge I(U;\hat X),\qquad(A123)$$

and the privacy constraint:

$$L\ge I(X;\hat X).\qquad(A124)$$

This concludes the proof of the converse.

Appendix C. Proof of Converse of Proposition 1

We simplify Theorem 2 for the proposed binary setup. As discussed in Section 4.3, from the fact that $|\hat{\mathcal X}|=2$ and the symmetry of the source $X$ on its alphabet, without loss of optimality, we can choose $P_{\hat X|X}$ to be a BSC. First, consider the rate constraint:

$$R\ge I(U;\hat X)\qquad(A125)$$
$$=H(\hat X)-H(\hat X|U)\qquad(A126)$$
$$=1-H(\hat X|U),\qquad(A127)$$

which can be equivalently written as the following:

$$H(\hat X|U)\ge1-R.\qquad(A128)$$

Also, the privacy criterion can be simplified as follows:

$$L\ge I(\hat X;X)\qquad(A129)$$
$$=H(\hat X)-H(\hat X|X)\qquad(A130)$$
$$=1-H(\hat X|X)\qquad(A131)$$
$$=1-H(\hat Z),\qquad(A132)$$

which can be equivalently written as

$$H(\hat Z)\ge1-L.\qquad(A133)$$

Now, consider the error exponent $\theta$ as follows:

$$\theta\le I(U;Y)\qquad(A134)$$
$$=H(Y)-H(Y|U)\qquad(A135)$$
$$=H(Y)-H(X\oplus N|U)\qquad(A136)$$
$$=H(Y)-H(\hat X\oplus\hat Z\oplus N|U)\qquad(A137)$$
$$\le H(Y)-h_b\Big(h_b^{-1}\big(H(\hat X|U)\big)*h_b^{-1}(1-L)*q\Big)\qquad(A138)$$
$$\le H(Y)-h_b\Big(h_b^{-1}(1-R)*h_b^{-1}(1-L)*q\Big),\qquad(A139)$$

where (A138) follows from Mrs. Gerber's lemma [39] (Theorem 1), the fact that $(\hat Z,N)$ is independent of $U$, and (A133); and (A139) follows from (A128). This concludes the proof of the proposition.

Appendix D. Euclidean Approximation of Testing against Independence

We analyze the Euclidean approximation with the parameters $\mathbf W$, $\psi_u(\hat x)$, $\phi_{\hat x}(x)$ and $\Lambda_u(x)$ defined in Section 4.4. Notice that, since $U-\hat X-X-Y$ forms a Markov chain, it holds that, for any $u\in\mathcal U$,

$$P_{Y|U=u}=\mathbf W\,P_{X|U=u}.\qquad(A140)$$

Now, consider the following chain of equalities for any $x\in\mathcal X$:

$$P_{X|U}(x|u)=\sum_{\hat x\in\hat{\mathcal X}}P_{X\hat X|U}(x,\hat x|u)\qquad(A141)$$
$$=\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X|U}(\hat x|u)\,P_{X|\hat X,U}(x|\hat x,u)\qquad(A142)$$
$$=\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X|U}(\hat x|u)\,P_{X|\hat X}(x|\hat x)\qquad(A143)$$
$$=\sum_{\hat x\in\hat{\mathcal X}}\big[P_{\hat X}(\hat x)+\psi_u(\hat x)\big]\big[P_X(x)+\phi_{\hat x}(x)\big]\qquad(A144)$$
$$=P_X(x)+\sum_{\hat x\in\hat{\mathcal X}}\psi_u(\hat x)\,\phi_{\hat x}(x)+\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X}(\hat x)\,\phi_{\hat x}(x)+P_X(x)\sum_{\hat x\in\hat{\mathcal X}}\psi_u(\hat x)\qquad(A145)$$
$$=P_X(x)+\sum_{\hat x\in\hat{\mathcal X}}\psi_u(\hat x)\,\phi_{\hat x}(x),\qquad(A146)$$

where (A143)–(A146) are justified in the following:
  • (A143) follows from the Markov chain $U-\hat X-X$: given $\hat X$, $U$ and $X$ are independent;
  • (A144) follows from (30) and (36);
  • (A146) follows from (31) and also from (36), which yields the following:

$$\sum_{\hat x\in\hat{\mathcal X}}P_{\hat X}(\hat x)\cdot\phi_{\hat x}(x)=0.\qquad(A147)$$
With the definition of $\Lambda_u(x)$ in (42), we can write

$$P_{X|U}(x|u)=P_X(x)+\Lambda_u(x),\qquad x\in\mathcal X,\ u\in\mathcal U.\qquad(A148)$$

Thus, we get

$$P_{Y|U=u}=\mathbf W\,P_X+\mathbf W\,\Lambda_u\qquad(A149)$$
$$=P_Y+\mathbf W\,\Lambda_u.\qquad(A150)$$

Applying the $\chi^2$-approximation and using (A150), we can rewrite $I(U;Y)$ as follows:

$$I(U;Y)\approx\frac12\,\log e\sum_{u\in\mathcal U}P_U(u)\,\Big\|\sqrt{\mathbf P_Y}^{-1}\,\mathbf W\,\Lambda_u\Big\|^2.\qquad(A151)$$

The above approximation, with the definition of the vector $\Lambda_u$ in (43), yields the optimization problem in (45).

Appendix E. Proof of Proposition 2

Achievability: We specialize the achievable scheme of Theorem 2 to the proposed Gaussian setup. We choose the auxiliary random variables as in (63) and (65). Notice that, from the Markov chain $U-\hat X-X-Y$ and the Gaussian choice of $\hat X$ in (63) discussed in Section 4.5, we can write $Y=\rho\hat X+F$, where $F\sim\mathcal N\big(0,1-\rho^2\cdot(1-2^{-2L})\big)$ is independent of $\hat X$. These choices of auxiliary random variables lead to the following rate constraint:

$$R\ge\frac12\log\frac{1-2^{-2L}}{\beta^2},\qquad(A152)$$

which can be equivalently written as:

$$2^{-2R}\cdot\big(1-2^{-2L}\big)\le\beta^2.\qquad(A153)$$

The optimal error exponent is also lower bounded as follows:

$$\theta^*_\epsilon(R,L)\ge\frac12\log\frac{1}{1-\rho^2\cdot\big(1-2^{-2L}-\beta^2\big)}.\qquad(A154)$$
Combining (A153) and (A154) gives the lower bound on the error exponent in (64).
Converse: Consider the following upper bound on the optimal error exponent in Theorem 2:

$$\theta^*_\epsilon(R,L)\le I(U;Y)\qquad(A155)$$
$$=h(Y)-h(Y|U)\qquad(A156)$$
$$=\frac12\log2\pi e-h(Y|U)\qquad(A157)$$
$$=\frac12\log2\pi e-h\big(\rho\hat X+F\,\big|\,U\big)\qquad(A158)$$
$$\le\frac12\log2\pi e-\frac12\log\Big(2^{2h(\rho\hat X|U)}+2\pi e\big(1-\rho^2\cdot(1-2^{-2L})\big)\Big)\qquad(A159)$$
$$=\frac12\log2\pi e-\frac12\log\Big(\rho^2\,2^{2h(\hat X|U)}+2\pi e\big(1-\rho^2\cdot(1-2^{-2L})\big)\Big),\qquad(A160)$$

where (A159) follows from the entropy power inequality (EPI) [34] (Chapter 2). Now, consider the rate constraint as follows:

$$R\ge I(\hat X;U)\qquad(A161)$$
$$=h(\hat X)-h(\hat X|U)\qquad(A162)$$
$$=\frac12\log\Big(2\pi e\big(1-2^{-2L}\big)\Big)-h(\hat X|U),\qquad(A163)$$

which is equivalent to

$$2^{2h(\hat X|U)}\ge2\pi e\cdot2^{-2R}\cdot\big(1-2^{-2L}\big).\qquad(A164)$$

Combining (A160) with (A164) yields the following upper bound on the error exponent:

$$\theta^*_\epsilon(R,L)\le\frac12\log2\pi e-\frac12\log\Big(2\pi e\,\rho^2\,2^{-2R}\big(1-2^{-2L}\big)+2\pi e\big(1-\rho^2(1-2^{-2L})\big)\Big)\qquad(A165)$$
$$=\frac12\log\frac{1}{1-\rho^2\big(1-2^{-2R}\big)\big(1-2^{-2L}\big)}.\qquad(A166)$$
This concludes the proof of the proposition.

References

  1. Ahlswede, R.; Csiszár, I. Hypothesis testing with communication constraints. IEEE Trans. Inf. Theory 1986, 32, 533–542. [Google Scholar] [CrossRef]
  2. Zhao, W.; Lai, L. Distributed testing against independence with multiple terminals. In Proceedings of the 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 30 September–3 October 2014; pp. 1246–1251. [Google Scholar]
  3. Xiang, Y.; Kim, Y.H. Interactive hypothesis testing against independence. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 2840–2844. [Google Scholar]
  4. Tian, C.; Chen, J. Successive refinement for hypothesis testing and lossless one-helper problem. IEEE Trans. Inf. Theory 2008, 54, 4666–4681. [Google Scholar] [CrossRef]
  5. Rahman, M.S.; Wagner, A.B. On the optimality of binning for distributed hypothesis testing. IEEE Trans. Inf. Theory 2012, 58, 6282–6303. [Google Scholar] [CrossRef]
  6. Sreekumar, S.; Gündüz, D. Distributed Hypothesis Testing Over Noisy Channels. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017. [Google Scholar]
  7. Salehkalaibar, S.; Wigger, M.; Timo, R. On hypothesis testing against independence with multiple decision centers. arXiv 2017, arXiv:1708.03941. [Google Scholar] [CrossRef]
  8. Salehkalaibar, S.; Wigger, M.; Wang, L. Hypothesis Testing In Multi-Hop Networks. arXiv 2017, arXiv:1708.05198. [Google Scholar]
  9. Mhanna, M.; Piantanida, P. On secure distributed hypothesis testing. In Proceedings of the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1605–1609. [Google Scholar]
  10. Han, T.S. Hypothesis testing with multiterminal data compression. IEEE Trans. Inf. Theory 1987, 33, 759–772. [Google Scholar] [CrossRef]
11. Shimokawa, H.; Han, T.S.; Amari, S. Error bound for hypothesis testing with data compression. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994.
12. Ugur, Y.; Aguerri, I.E.; Zaidi, A. Vector Gaussian CEO problem under logarithmic loss and applications. arXiv 2018, arXiv:1811.03933.
13. Zaidi, A.; Aguerri, I.E.; Caire, G.; Shamai, S. Uplink oblivious cloud radio access networks: An information theoretic overview. In Proceedings of the 2018 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 11–16 February 2018.
14. Aguerri, I.E.; Zaidi, A.; Caire, G.; Shamai, S. On the capacity of cloud radio access networks with oblivious relaying. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017.
15. Aguerri, I.E.; Zaidi, A. Distributed information bottleneck method for discrete and Gaussian sources. In Proceedings of the 2018 International Zurich Seminar on Information and Communication (IZS), Zurich, Switzerland, 21–23 February 2018.
16. Aguerri, I.E.; Zaidi, A. Distributed variational representation learning. arXiv 2018, arXiv:1807.04193.
17. Evfimievski, A.V.; Gehrke, J.; Srikant, R. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, San Diego, CA, USA, 9–11 June 2003; pp. 211–222.
18. Smith, G. On the foundations of quantitative information flow. In Proceedings of the 12th International Conference on Foundations of Software Science and Computational Structures (ETAPS 2009); Springer: Berlin/Heidelberg, Germany, 2009; pp. 288–302.
19. Sankar, L.; Rajagopalan, S.R.; Poor, H.V. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Trans. Inf. Forensics Secur. 2013, 8, 838–852.
20. Liao, J.; Sankar, L.; Tan, V.Y.F.; Calmon, F. Hypothesis testing under mutual information privacy constraints in the high privacy regime. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1058–1071.
21. Barthe, G.; Köpf, B. Information-theoretic bounds for differentially private mechanisms. In Proceedings of the 2011 IEEE 24th Computer Security Foundations Symposium, Cernay-la-Ville, France, 27–29 June 2011; pp. 191–204.
22. Issa, I.; Wagner, A.B. Operational definitions for some common information leakage metrics. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 769–773.
23. Liao, J.; Sankar, L.; Calmon, F.; Tan, V.Y.F. Hypothesis testing under maximal leakage privacy constraints. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 779–783.
24. Wagner, I.; Eckhoff, D. Technical privacy metrics: A systematic survey. ACM Comput. Surv. 2018, 51.
25. Dwork, C. Differential privacy. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, Part II (ICALP 2006); Springer: Venice, Italy, 2006; Volume 4052, pp. 1–12.
26. Dwork, C.; Kenthapadi, K.; McSherry, F.; Mironov, I.; Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology (EUROCRYPT 2006); Springer: Saint Petersburg, Russia, 2006; Volume 4004, pp. 486–503.
27. Dwork, C. Differential privacy: A survey of results. In Theory and Applications of Models of Computation (TAMC 2008); Lecture Notes in Computer Science; Springer: Heidelberg, Germany, 2008; Volume 4978.
28. Wasserman, L.; Zhou, S. A statistical framework for differential privacy. J. Am. Stat. Assoc. 2010, 105, 375–389.
29. Sreekumar, S.; Cohen, A.; Gündüz, D. Distributed hypothesis testing with a privacy constraint. arXiv 2018, arXiv:1806.02015.
30. Borade, S.; Zheng, L. Euclidean information theory. In Proceedings of the 2008 IEEE International Zurich Seminar on Communications, Zurich, Switzerland, 12–14 March 2008; pp. 14–17.
31. Huang, S.; Suh, C.; Zheng, L. Euclidean information theory of networks. IEEE Trans. Inf. Theory 2015, 61, 6795–6814.
32. Viterbi, A.J.; Omura, J.K. Principles of Digital Communication and Coding; McGraw-Hill: New York, NY, USA, 1979.
33. Weinberger, N.; Merhav, N. Optimum tradeoffs between the error exponent and the excess-rate exponent of variable-rate Slepian-Wolf coding. IEEE Trans. Inf. Theory 2015, 61, 2165–2190.
34. El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011.
35. Shalaby, H.M.H.; Papamarcou, A. Multiterminal detection with zero-rate data compression. IEEE Trans. Inf. Theory 1992, 38, 254–267.
36. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
37. Watanabe, S. Neyman-Pearson test for zero-rate multiterminal hypothesis testing. IEEE Trans. Inf. Theory 2018, 64, 4923–4939.
38. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Academic Press: New York, NY, USA, 1982.
39. Wyner, A.D.; Ziv, J. A theorem on the entropy of certain binary sequences and applications: Part I. IEEE Trans. Inf. Theory 1973, 19, 769–772.
Figure 1. Hypothesis testing with communication and privacy constraints.
Figure 2. $\theta_\epsilon^*(R,L)$ versus $L$ for $q = 0.1$ and various values of $R$.
Figure 3. $\theta_\epsilon^*(R_0, L_0)$ versus $L$ for $q = 0.1$ and $R = L$.
