Article

Channel-Supermodular Entropies: Order Theory and an Application to Query Anonymization †

by Arthur Américo, MHR Khouzani and Pasquale Malacaria *,‡
School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
* Author to whom correspondence should be addressed.
† Presented at the IEEE Information Theory Workshop, Visby, Gotland, Sweden, 25–28 August 2019.
‡ These authors contributed equally to this work.
Entropy 2022, 24(1), 39; https://doi.org/10.3390/e24010039
Submission received: 17 November 2021 / Revised: 13 December 2021 / Accepted: 20 December 2021 / Published: 25 December 2021
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: This work introduces channel-supermodular entropies, a subset of quasi-concave entropies. Channel-supermodularity is a property shared by some of the most commonly used entropies in the literature, including Arimoto–Rényi conditional entropies (which include Shannon and min-entropy as special cases), k-tries entropies, and guessing entropy. Based on channel-supermodularity, new preorders for channels that strictly include degradedness and inclusion (or Shannon ordering) are defined, and these preorders are shown to provide a sufficient condition for the more-capable and capacity ordering, not only for Shannon entropy but also regarding analogous concepts for other entropy measures. The theory developed is then applied in the context of query anonymization. We introduce a greedy algorithm based on channel-supermodularity for query anonymization and prove its optimality, in terms of information leakage, for all symmetric channel-supermodular entropies.

1. Introduction

The idea of preorders over channels goes back a long way in the history of information theory. For instance, in [1], Shannon introduced the “inclusion” preorder to compare the capacities of discrete memoryless channels. Several authors, such as El Gamal [2], Korner and Marton [3], and many more, made further significant contributions to the study of channel preorders.
Such preorders are of practical importance in information theory. For example, the “more capable” preorder [3] is used in calculating the capacity region of broadcast channels [2], or in deciding whether one system is more secure than another [4,5]. As discussed in the book by Cohen, Kempermann, and Zbaganu [6], the applications of preorders over stochastic matrices go beyond the field of information theory, for instance, to statistics, economics, and population sciences.
In this work, which is an extension of the results in our previous work [7], we introduce a new preorder over channels. To illustrate the key idea, consider the following channel:
$$\begin{pmatrix} 0.3 & 0.5 & 0.2 \\ 0.6 & 0.4 & 0 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}.$$
Now build a new channel from it as follows: take the first two columns, and for each row, rearrange their pairwise entries such that the larger element is moved to the first column, and the smaller element is in the second column. This yields:
$$\begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.6 & 0.4 & 0 \\ 0.3 & 0.2 & 0.5 \end{pmatrix}.$$
We will refer to this pairwise operation on columns as a Join-Meet operation. We prove that, for most commonly used entropy measures, a Join-Meet operation always increases the posterior entropy. More precisely, the posterior entropy of the derived channel is never less than the posterior entropy of the original channel, for any probability distribution on the input. That is, the original channel is more capable [3] than the derived channel. We name the entropies respecting this property channel-supermodular, and prove that they include the Arimoto–Rényi entropies (including Shannon and min-entropy) and the guessing entropy, as well as some other entropies motivated by security and privacy contexts.
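As a concrete illustration, here is a minimal Python sketch of the Join-Meet operation (the helper name join_meet and the use of NumPy are illustrative assumptions, not part of the formal development); it reproduces the example above:

```python
import numpy as np

def join_meet(K, i, j):
    """Return a copy of channel K in which, row by row, column i holds the
    larger of the two entries in columns i, j and column j the smaller.
    Row sums are preserved, so the result is still a channel."""
    K2 = K.copy()
    K2[:, i] = np.maximum(K[:, i], K[:, j])
    K2[:, j] = np.minimum(K[:, i], K[:, j])
    return K2

K = np.array([[0.3, 0.5, 0.2],
              [0.6, 0.4, 0.0],
              [0.2, 0.3, 0.5]])
print(join_meet(K, 0, 1))
# [[0.5 0.3 0.2]
#  [0.6 0.4 0.0]
#  [0.3 0.2 0.5]]
```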
We define the supermodular preorder ($\succeq_s$) based on the Join-Meet operator. In particular, given channels $K_1$ and $K_2$, we say $K_1 \succeq_s K_2$ iff $K_2$ can be obtained from $K_1$ via a finite sequence of Join-Meet operations. We establish that the supermodular preorder neither includes nor is included in the degradedness [8] or inclusion (Shannon) [1] preorders. Motivated by this, we define two other channel preorders ($\succeq_{ds}$ and $\succeq_{shs}$) that strictly include the aforementioned ones, respectively. The relation $K_1 \succeq_{ds} K_2$ implies that $K_1$ is “more capable” than $K_2$. Moreover, whenever $K_1 \succeq_{shs} K_2$, the capacity of $K_1$ is at least that of $K_2$. Several such new channel ordering results are proven in this paper based on channel-supermodularity.
Next, we will consider the applications of channel-supermodularity in the context of security and privacy. The starting point will be the channel design problem, which is the problem of designing a channel that leaks the least amount of confidential information while respecting a set of operational constraints. This kind of problem arises in many security systems, such as authentication systems [9], operating systems functions [10], scheduling protocols [11], bucketing schemes in cryptography [12], anonymity (Section 6.2), and so on. In the context of these applications, the problem is particularly interesting for deterministic systems and deterministic solutions. Solutions which are unique across many measures of leakage are also of interest because they are robust against how the knowledge or abilities of the attackers are modeled. In this work, we present a robust anonymity mechanism. The algorithm is based on a result from [13], which uses the properties of channel-supermodularity presented in this paper to derive a greedy channel design algorithm that is provably optimal and unique for all channel-supermodular measures of leakage. We apply our robust anonymity mechanism to query anonymization: we consider the problem in which the real query itself is the secret, and we also consider the scenario where a related attribute is the secret. We provide optimal solutions for these two problems based on channel-supermodularity.

1.1. Related Literature

In the “information theory” literature, the degradedness order was introduced by Cover [8] in the study of broadcast channels. Cover conjectured a solution for the capacity region of broadcast channels that satisfy the degradedness ordering, which was proved by Bergmans [14,15] and Gallager [16]. The problem of determining the capacity region of broadcast channels also motivated Korner and Marton to introduce the $H_1$-less noisy and $H_1$-more capable orderings [3]. In the same paper, Korner and Marton established the capacity region for broadcast channels that respect the $H_1$-less noisy ordering. A similar result for broadcast channels respecting the $H_1$-more capable ordering was later established by El Gamal [2].
Those orderings also play an important role in the field of quantitative information flow (QIF), which is concerned with quantifying information leakage in computational systems (we refer to [17] for a review of QIF). Malacaria [18] makes use of the degradedness ordering to reason about the security of deterministic programs, proving it is equivalent to the $H_1$-, $H_\infty$-, and $H_G$-more capable orderings for deterministic channels. This ordering also appears in the work of Alvim et al. [19], in which it is shown to imply the more capable ordering for the g-entropy family, a generalizing framework for information measures used in QIF. In the same paper, they conjectured that if two channels satisfy the more capable ordering for all members of the g-entropy family, they satisfy the degradedness order. This conjecture, which was proven by McIver et al. [20], turns out to be equivalent to Blackwell's theorem in the finite setting [21]. The less noisy ordering has also appeared recently in the QIF literature, especially the $H_\infty$-less noisy ordering, in the study of Dalenius leakage by Bordenabe and Smith [22]. This last work is also closely related to the classical implications of the work by Buscemi [23], which is mainly focused on the less noisy ordering in quantum information theory.
Shannon ordering (or inclusion), which generalizes the degradedness ordering, was first introduced by Shannon when studying channels that could be perfectly simulated by other channels [1]. In the same paper, Shannon established that inclusion implies the $H_1$-capacity ordering. This ordering has been the object of study of several recent works. Inspired by Le Cam's concept of deficiency [24], which is itself related to the degradedness order and Blackwell's theorem, Raginsky [25] defines Shannon deficiency, which may be seen as a measure of how far two channels are from satisfying the inclusion order. Techniques to verify Shannon ordering, both algebraic and computational, were studied by Zhang and Tepedelenlioǧlu [26], and Nasser [27] gave two different characterizations of the Shannon ordering.
The abstract problem of designing a system that leaks the least amount of information under some generalized form of operational constraints has been the objective of recent exploration in the literature [28,29,30]. The problem of optimal system design in security settings has been studied within specific contexts, including secure multi-party computation systems [31] and countermeasures against timing attacks [12,32]. The general channel design problem is of particular significance in QIF, as it represents a paradigm shift from the earlier foundational research in the area, which focuses mostly on measuring information leakage for existing channels or systems [4,5,19,33,34].
Query obfuscation and anonymity have been investigated by several authors and even implemented in commercial products. Related to our work are algorithms presented in [35,36,37]. Compared to those works, our approach follows an order-theoretical methodology and is based on a more general notion of entropy.
All the results in this paper rely on the concept of core-concave entropies, a generalizing framework that has been recently developed and can be shown to generalize the most commonly used conditional entropy measures in the literature [28]. Core-concavity is a generalization of concavity, which has also been considered as a defining or desirable property for generalized information measures in QIF [38] and in information theory [39,40,41,42].

1.2. Notational Conventions

Throughout the paper, $X, Y, Z, \dots$ represent discrete random variables with (nonempty, finite) alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z}, \dots$. We assume that the elements of each alphabet are ordered, denoting by $x_1, \dots, x_{|\mathcal{X}|}$ the elements of $\mathcal{X}$, by $y_1, \dots, y_{|\mathcal{Y}|}$ the elements of $\mathcal{Y}$, and so on. Given $x_i \in \mathcal{X}$, we write $p(x_i)$ or $p_i$ to mean $\Pr\{X = x_i\}$, and use $p$ to refer to the (categorical) distribution (as a vector). We may specify the r.v. with a subscript, for example, writing $p_X(x)$, if it is not clear from the context.
We denote by $\Delta_n \subseteq \mathbb{R}^n$ the $(n-1)$-dimensional probability simplex. Given a probability distribution $p$ over $\{x_1, \dots, x_n\}$, we overload the notation and use $p$ to refer to its probability vector $(p_1, \dots, p_n) \in \Delta_n$. We write $(p_{[1]}, p_{[2]}, \dots, p_{[n]})$ for the nonincreasing rearrangement of $p = (p_1, \dots, p_n)$, that is, $p_{[1]}$ denotes the largest element of $p$, $p_{[2]}$ the second largest, and so forth. Given a vector $r = (r_1, \dots, r_n) \in \mathbb{R}^n$, we denote by $\|r\|_\alpha$ its $\alpha$-norm $\left( \sum_i r_i^\alpha \right)^{1/\alpha}$ (note that this is a slight abuse of nomenclature, since it is not a norm when $\alpha < 1$).
Given a function $F$ over $\Delta_n$ and a random variable $X$ with distribution $p = (p_1, \dots, p_n)$, we use $F(X)$, $F(p_1, \dots, p_n)$ and $F(p)$ interchangeably.
A channel $K$ is a row-stochastic matrix with rows indexed by $\mathcal{X}$ and columns indexed by $\mathcal{Y}$. The value $K(x, y)$ is equal to $p(y \mid x) = \Pr\{Y = y \mid X = x\}$, that is, the conditional probability that $y$ is produced by the channel $K$ when $x$ is the input value. The notation $K : \mathcal{X} \to \mathcal{Y}$ means that the channel $K$ has $\mathcal{X}$ and $\mathcal{Y}$ as input and output alphabets, respectively. Given an input distribution $p_X$, the joint distribution of input and output is
$$p(x, y) = p_X(x)\, K(y \mid x).$$
Channels are represented in a table format, or simply in matrix notation, for example:
$$\begin{array}{c|cc} K & y_1 & y_2 \\ \hline x_1 & 0.5 & 0.5 \\ x_2 & 0.6 & 0.4 \\ x_3 & 0.2 & 0.8 \end{array} \qquad \begin{pmatrix} 0.5 & 0.5 \\ 0.6 & 0.4 \\ 0.2 & 0.8 \end{pmatrix},$$
with the understanding that the $i$th row corresponds to $x_i$, and the $j$th column to $y_j$.

2. Preliminaries

2.1. Core-Concave Entropies

The main result this work provides is the monotonicity of conditional entropy with regard to the Join-Meet operator, which will be introduced in Section 3. This holds not only for Shannon entropy, but also for a number of different entropy measures, including the Arimoto–Rényi entropies (which include Shannon and min-entropy as limit cases) [43], and the guessing entropy [44].
Most of the results in this paper concern the aforementioned entropies, which are instances of what we call channel-supermodular entropies. To define a channel-supermodular entropy, however, we first need a generalizing framework. To this end, we introduce the core-concave entropies, based on the framework introduced in [28]. Besides the aforementioned entropies, they also include the Tsallis [45] and Sharma–Mittal [46] entropies.
Definition 1.
A “core-concave” entropy $H$ is a pair $(\eta, F)$ such that
  • $F : \Delta_n \to \mathbb{R}$ is a concave and continuous function,
  • $\eta$ is a strictly increasing continuous real-valued function, defined on the image of the function $F$.
Given a core-concave $H = (\eta, F)$, we define $H(X) = \eta(F(p_X))$. The set of core-concave entropies will be denoted by $\mathbb{H}$.
Besides the value $H(X)$, which we refer to as the unconditional form, we also define a conditional form for core-concave entropies, in relation to two random variables $X, Y$.
Definition 2.
The “conditional form” of a core-concave entropy $H = (\eta, F)$ is defined as
$$H(X \mid Y) = \eta\left( \sum_{y \in \mathrm{supp}(Y)} p(y)\, F(X \mid y) \right),$$
where $\mathrm{supp}(Y)$ is the support of $Y$ and $F(X \mid y)$ denotes $F$ applied to the posterior distribution $p_{X \mid Y=y}$.
As claimed before, core-concave entropies encompass the most common entropies in the literature. Some of these are summarized in Table 1, together with their conditional form. Notice that $H$ is used to denote an arbitrary core-concave entropy, while $H_1$ refers to Shannon entropy.
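As an illustration of Definitions 1 and 2, the following minimal Python sketch (the helper names shannon_F and conditional_entropy are illustrative assumptions) evaluates the conditional form for Shannon entropy, for which $\eta$ is the identity and $F$ is the Shannon entropy function on the simplex:

```python
import numpy as np

def shannon_F(p):
    """F for Shannon entropy: the concave function -sum_i p_i log2 p_i."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(pX, K, F=shannon_F, eta=lambda v: v):
    """Conditional form of Definition 2: H(X|Y) = eta(sum_y p(y) F(X|y))."""
    joint = pX[:, None] * K          # joint distribution p(x, y)
    pY = joint.sum(axis=0)
    total = sum(pY[y] * F(joint[:, y] / pY[y])
                for y in range(K.shape[1]) if pY[y] > 0)  # y in supp(Y)
    return eta(total)

pX = np.array([0.5, 0.3, 0.2])
K = np.array([[0.5, 0.5],
              [0.6, 0.4],
              [0.2, 0.8]])
print(conditional_entropy(pX, K))    # Shannon H(X|Y) for the example channel
```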
Based on Definitions 1 and 2, we define a notion of mutual information and channel capacity for each $H \in \mathbb{H}$.
Definition 3.
Let $H \in \mathbb{H}$. The $H$-mutual information is defined as
$$I_H(X; Y) = H(X) - H(X \mid Y).$$
When $X$ and $Y$ are, respectively, the input and output of a channel $K$, the $H$-channel capacity of $K$ is defined as
$$C_H(K) = \max_{p_X} I_H(X; Y).$$
Notice that, in general, H-mutual information is not symmetric.
Core-concave entropies satisfy the data-processing inequality.
Theorem 1
([28] Proposition 2(b)). Let $X, Y, Z$ be random variables such that $X - Y - Z$ (i.e., $X$ and $Z$ are conditionally independent given $Y$). Then, for all $H \in \mathbb{H}$,
$$H(X \mid Y) \le H(X \mid Z).$$

2.2. Preorders over Channels

Let $K_1 : \mathcal{X} \to \mathcal{Y}$, $K_2 : \mathcal{X} \to \mathcal{Z}$ be channels which share an input $X$ and produce outputs $Y, Z$.
A channel $K_2$ is degraded from $K_1$ [8], written as $K_1 \succeq_d K_2$, if there exists a channel $R : \mathcal{Y} \to \mathcal{Z}$ such that $K_2 = K_1 R$.
In [1], Shannon introduced a preorder which includes the one above. A channel $K_1$ includes $K_2$, written as $K_1 \succeq_{sh} K_2$, if there exists a finite family of tuples $\{(g_i, T_i, R_i)\}_i$ of channels $T_i, R_i$ and non-negative real numbers $g_i$ such that
$$K_2 = \sum_i g_i\, T_i K_1 R_i \quad \text{and} \quad \sum_i g_i = 1. \tag{1}$$
As noted by Shannon, any channel can be expressed as a convex combination of deterministic channels. Thus, whenever $K_1 \succeq_{sh} K_2$, it is possible to choose $\{(g_i, T_i, R_i)\}_i$ such that the $T_i, R_i$ are deterministic channels and (1) holds.
The two preorders defined above are not dependent on a particular entropy but on the structure of the channel. The next preorders depend on the choice of a core-concave entropy H and generalize preorders introduced in [3].
A channel $K_1$ is $H$-less noisy than $K_2$, denoted as $K_1 \succeq_{\ln}^{H} K_2$, if for all random variables $U$ with finite support such that $U - X - (Y, Z)$,
$$I_H(U; Y) \ge I_H(U; Z).$$
$K_1$ is $H$-more capable than $K_2$, written as $K_1 \succeq_{mc}^{H} K_2$, if for all distributions of the input $X$,
$$I_H(X; Y) \ge I_H(X; Z).$$
Finally, $K_1 \succeq_{c}^{H} K_2$ stands for
$$C_H(K_1) \ge C_H(K_2).$$
If $\mathbb{A} \subseteq \mathbb{H}$ is a subset of core-concave entropies, $K_1 \succeq_{\ln}^{\mathbb{A}} K_2$ is defined as
$$\forall H \in \mathbb{A}: \quad K_1 \succeq_{\ln}^{H} K_2,$$
and similarly for $\succeq_{mc}^{\mathbb{A}}$ and $\succeq_{c}^{\mathbb{A}}$.

2.3. Relationships between Preorders

We now explore some relationships between the preorders defined above.
Proposition 1.
For any $H \in \mathbb{H}$:
$$K_1 \succeq_d K_2 \;\Rightarrow\; K_1 \succeq_{\ln}^{H} K_2 \;\Rightarrow\; K_1 \succeq_{mc}^{H} K_2 \;\Rightarrow\; K_1 \succeq_{c}^{H} K_2.$$
Proof. 
The first implication follows from Theorem 1, the second implication follows by choosing $U = X$ in the definition of $\succeq_{\ln}^{H}$, and the third is straightforward.    □
While the converses of the above implications are false in general, the following is true:
Theorem 2
([23]). $K_1 \succeq_{\ln}^{H_\infty} K_2 \Rightarrow K_1 \succeq_d K_2$.
The following important theorem can be traced back to Blackwell’s result on comparison of experiments [21] and relates to more recent results in [20,48].
Theorem 3.
$K_1 \succeq_d K_2 \iff K_1 \succeq_{mc}^{\mathbb{H}} K_2$.

3. Channel-Supermodular Entropies

Note that any specific core-concave entropy $H$ induces an $H$-more capable preorder over channels. However, this preorder might not be preserved for a different choice of conditional entropy. Theorem 3 characterizes a channel preorder that is “consistent” for all core-concave entropies. As strong as this result is, it still leaves open the question of whether there exists a preorder that is consistent for a class of entropies of interest. This is motivated by the fact that the class of core-concave entropies includes far more entropies than the conventionally used ones, so it may include some eccentric ones that can be excluded for a stronger result. Moreover, the degradedness relation between channels is very restrictive: there are many channels that cannot be compared with respect to degradedness, but have a consistent ordering with respect to all typically used entropies.
With these motivations in mind, we introduce channel-supermodular entropies. Channel-supermodularity is a property satisfied by a significant portion of commonly used entropies, and it is a helpful tool in optimization problems (as shown in Section 6.1).
The characterization of channel-supermodular entropies is linked to supermodular functions over the real lattice. These functions and some basic properties are introduced next. For details about supermodular functions please refer to [49] and [50] (Chapter 6.D).
Consider the set $\mathbb{R}^n_{\ge 0}$ of all $n$-dimensional vectors with no negative entries (i.e., the non-negative orthant of $\mathbb{R}^n$). Let $\preceq$ represent the element-wise inequality, that is, given $r = (r_1, \dots, r_n)$ and $s = (s_1, \dots, s_n)$, $r \preceq s$ iff $r_i \le s_i$ for all $i$. $\preceq$ is a partial order on $\mathbb{R}^n_{\ge 0}$. In fact, $\preceq$ defines a lattice over $\mathbb{R}^n_{\ge 0}$, whose join $\vee$ and meet $\wedge$ operations are defined as:
$$r \vee s = \big(\max(r_1, s_1), \dots, \max(r_n, s_n)\big), \qquad r \wedge s = \big(\min(r_1, s_1), \dots, \min(r_n, s_n)\big).$$
Recall that:
Definition 4.
A function $\phi : \mathbb{R}^n_{\ge 0} \to \mathbb{R}$ is supermodular (over a lattice) if, for all $r, s \in \mathbb{R}^n_{\ge 0}$,
$$\phi(r \vee s) + \phi(r \wedge s) \ge \phi(r) + \phi(s).$$
Next, we introduce some fundamental definitions for this work:
Definition 5.
Let $H = (\eta, F)$ be a core-concave entropy. Define the function $G_F : \mathbb{R}^n_{\ge 0} \to \mathbb{R}$ as
$$G_F(r) := \|r\|_1\, F\!\left( \frac{r}{\|r\|_1} \right)$$
if $r$ is not the null vector, and $G_F(0, \dots, 0) = 0$. (Notice that, as $F$ is continuous over a compact set, $\lim_{r \to (0,\dots,0)} G_F(r) = 0$.)
Definition 6.
An entropy $H = (\eta, F) \in \mathbb{H}$ is said to be channel-supermodular if $G_F$ is supermodular. The set of channel-supermodular entropies is denoted by $\mathbb{S} \subseteq \mathbb{H}$.
The motivation for defining channel-supermodularity in terms of $G_F$ might seem arbitrary, but it is justified by its relationship with conditional entropies, given by
$$H(X \mid Y) = \eta\left( \sum_{y \in \mathcal{Y}} G_F\big( p_{(X,Y)}(x_1, y), \dots, p_{(X,Y)}(x_n, y) \big) \right). \tag{2}$$
Together with (2), the supermodularity of $G_F$ can be a powerful tool for deriving results regarding conditional entropy and mutual information for entropies in $\mathbb{S}$.
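The identity (2) can be checked numerically. The following illustrative sketch computes $G_F$ as in Definition 5 and verifies that summing it over the columns of the joint matrix recovers $H(X \mid Y)$ for Shannon entropy, whose $\eta$ is the identity (helper names are, again, illustrative assumptions):

```python
import numpy as np

def shannon_F(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def G_F(r, F=shannon_F):
    """G_F(r) = ||r||_1 * F(r / ||r||_1), with G_F(0,...,0) = 0 (Definition 5)."""
    s = r.sum()
    return 0.0 if s == 0 else s * F(r / s)

pX = np.array([0.5, 0.3, 0.2])
K = np.array([[0.5, 0.5],
              [0.6, 0.4],
              [0.2, 0.8]])
joint = pX[:, None] * K
pY = joint.sum(axis=0)
# H(X|Y) computed directly from Definition 2 (eta is the identity for Shannon)
hxy = sum(pY[y] * shannon_F(joint[:, y] / pY[y]) for y in range(K.shape[1]))
# Identity (2): the same value via G_F over the columns of the joint matrix
print(np.isclose(hxy, sum(G_F(joint[:, y]) for y in range(K.shape[1]))))  # True
```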

3.1. Examples of Channel-Supermodular and Non-Channel-Supermodular Entropies

In the next sections, we will study the implications of channel-supermodularity. The inequality in Definition 4 implies interesting behaviors regarding H-mutual information, as will be seen in Section 4. This property has immediate consequences for channel ordering and channel design, as will be explored in Section 5 and Section 6.
One appealing aspect of channel-supermodularity is that some of the most commonly used entropies in the literature belong to $\mathbb{S}$, including Shannon and min-entropy and, more generally, the Arimoto–Rényi entropies. In this section, we prove that these (and other) entropies indeed belong to $\mathbb{S}$, and provide examples of entropies that do not. First, we state a useful characterization of supermodular functions, which is an immediate consequence of Corollary 2.6.1 in [49].
Let $\phi : \mathbb{R}^n_{\ge 0} \to \mathbb{R}$ and let $e_1, \dots, e_n$ denote the canonical basis of $\mathbb{R}^n$. The function $\phi$ is supermodular if and only if, for all $r \in \mathbb{R}^n_{\ge 0}$, all $\delta_1, \delta_2 \ge 0$ and all $i, j$ with $i \ne j$,
$$\phi(r + \delta_1 e_i + \delta_2 e_j) + \phi(r) \ge \phi(r + \delta_1 e_i) + \phi(r + \delta_2 e_j). \tag{3}$$
Moreover, if $\phi$ has second partial derivatives, $\phi$ is supermodular if and only if, for all $r \in \mathbb{R}^n_{\ge 0}$ and all $i, j$ with $i \ne j$,
$$\frac{\partial^2 \phi(r)}{\partial r_i\, \partial r_j} \ge 0. \tag{4}$$
The property characterized by Equations (3) and (4) is known in the economics literature as increasing differences [49]. The name reflects the fact that the change in $\phi$ prompted by an increase in one coordinate is monotonically increasing with regard to the other coordinates. This is readily noticeable if we rearrange the terms of (3):
$$\phi(r + \delta_1 e_i + \delta_2 e_j) - \phi(r + \delta_2 e_j) \ge \phi(r + \delta_1 e_i) - \phi(r).$$
That is, the change of $\phi$ prompted by an increase of $\delta_1$ in coordinate $i$ is greater the greater the value of coordinate $j$. Equation (3) is thus just the statement that, on the lattice $(\mathbb{R}^n_{\ge 0}, \preceq)$, increasing differences and supermodularity are equivalent concepts ([49] Corollary 2.6.1).
Using this result, as well as appealing directly to Definition 4, we now prove channel-supermodularity for a number of commonly used entropies. Throughout, given $r \in \mathbb{R}^n_{\ge 0}$, we denote by $r_i$ its $i$th coordinate.
Proposition 2.
1. Shannon entropy is channel-supermodular.
2. Arimoto–Rényi entropies are channel-supermodular for all $\alpha \in (0, 1) \cup (1, \infty)$.
3. For any $k$, the $k$-tries entropy is channel-supermodular. In particular, min-entropy is channel-supermodular.
4. Guessing entropy is channel-supermodular.
Proof. 
See Appendix A.    □
Items 3 and 4 of Proposition 2 are of particular interest to security applications, as guessing entropy and k-tries entropies have found interesting applications in the field of quantitative information flow.
Guessing entropy is especially useful in scenarios modelling brute-force attacks, as it models the expected number of attempts necessary for an adversary to obtain the value of a secret when trying candidates one by one. On the other hand, $k$-tries entropy reflects the probability of guessing a value correctly when $k$ guesses are allowed (see, e.g., [19] (Section III.C)). It is defined as
$$H_{k\text{-}tries}(p) = -\log \sum_{i=1}^{k} p_{[i]},$$
and can be readily seen to be core-concave by taking $\eta(x) = -\log(-x)$ and $F(p) = -\sum_{i=1}^{k} p_{[i]}$. Notice that min-entropy is equal to $H_{k\text{-}tries}$ when $k = 1$.
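Under this sign convention, $G_F$ for the $k$-tries entropy is minus the sum of the $k$ largest entries. The following illustrative sketch probes the supermodularity inequality of Definition 4 on random vectors; no violation should be found:

```python
import numpy as np

def G_ktries(r, k=2):
    """G_F for the k-tries entropy: minus the sum of the k largest entries."""
    return -np.sort(r)[::-1][:k].sum()

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    r, s = rng.random(4), rng.random(4)
    join, meet = np.maximum(r, s), np.minimum(r, s)
    # Definition 4: G(r v s) + G(r ^ s) >= G(r) + G(s)
    if G_ktries(join) + G_ktries(meet) < G_ktries(r) + G_ktries(s) - 1e-12:
        violations += 1
print(violations)   # 0
```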
The results in Proposition 2 justify our interest in channel-supermodularity, as any property derived for entropies in $\mathbb{S}$ will also hold for this set of commonly used entropy measures. However, not all entropies are channel-supermodular.
This includes another interesting entropy family useful in security, the partition entropies [19]. Let $\mathcal{P}$ be a partition of the set $\{1, \dots, n\}$. The partition entropy with regard to $\mathcal{P}$ is given by
$$H_{\mathcal{P}}(X) = -\log \max_{A \in \mathcal{P}} \sum_{i \in A} p(x_i).$$
It is easy to see that $H_{\mathcal{P}}$ is core-concave, by taking $\eta(x) = -\log(-x)$ and $F(p) = -\max_{A \in \mathcal{P}} \sum_{i \in A} p_i$. The partition entropy $H_{\mathcal{P}}$ is useful for capturing the uncertainty of an adversary that is interested in knowing only to which subset the realization of $X$ pertains. This is an appropriate model for adversaries that are interested in obtaining some specific partial knowledge about some sensitive information (e.g., obtaining the home town or the date of birth of a user).
Proposition 3.
1. Hayashi–Rényi, Tsallis and Sharma–Mittal entropies (with conditional forms as in Table 1) are not channel-supermodular for any $\alpha > 1$ whenever the input set is of size greater than 2. Moreover, they are also not channel-supermodular for any $\alpha \in (0, 1)$, for some size of input set.
2. The partition entropy is not, in general, channel-supermodular.
Proof. 
See Appendix A.    □
Notice that, for some choices of $\mathcal{P}$, $H_{\mathcal{P}}$ is channel-supermodular. In particular, if $\mathcal{P} = \{\{i\} \mid i \in \{1, \dots, n\}\}$, $H_{\mathcal{P}}$ coincides with min-entropy.
A generalization of partition entropy is given by the weighted partition entropies,
$$H_{\mathcal{P}, w}(X) = -\log \max_{A \in \mathcal{P}} \sum_{i \in A} p(x_i)\, w_i,$$
where $w = (w_1, \dots, w_n) \in \mathbb{R}^n_{\ge 0}$ is a vector of weights. Being a generalization of partition entropies, weighted partition entropies are also not channel-supermodular in general.

4. The Join-Meet Operator and a New Structural Order

In this section, we address the claim made in Section 1, proving that the Join-Meet operation is monotonic with regard to conditional entropy for all channel-supermodular entropies.
Let $K : \mathcal{X} \to \mathcal{Y}$ be a channel, with $\mathcal{Y} = \{y_1, \dots, y_m\}$, and let $K^i$ be the column of $K$ corresponding to output $y_i$. Define, for $i \ne j$, the Join-Meet operator $\nabla_{i,j}$ as follows:
$$(\nabla_{i,j} K)^l = \begin{cases} K^i \vee K^j & \text{if } l = i, \\ K^i \wedge K^j & \text{if } l = j, \\ K^l & \text{otherwise.} \end{cases}$$
The next result proves that the Join-Meet operator is monotonic with regard to $I_H$ whenever $H \in \mathbb{S}$.
Theorem 4.
For all channels $K_1$ and all $i, j$, $K_1 \succeq_{mc}^{\mathbb{S}} \nabla_{i,j} K_1$.
Proof. 
Let $H = (\eta, F) \in \mathbb{S}$ and define $G_F$ as in Definition 5. Let $K_2 = \nabla_{i,j} K_1$, and denote by $Y_1, Y_2$ the outputs of $K_1, K_2$. Notice that, for any distribution on the input, we have $p_{X,Y_2}(x_k, y_i) = \max\big(p_{X,Y_1}(x_k, y_i),\, p_{X,Y_1}(x_k, y_j)\big)$ and, similarly, $p_{X,Y_2}(x_k, y_j) = \min\big(p_{X,Y_1}(x_k, y_i),\, p_{X,Y_1}(x_k, y_j)\big)$. Thus,
$$\sum_{l} G_F\big(p_{X,Y_2}(x_1, y_l), \dots, p_{X,Y_2}(x_n, y_l)\big) \;\ge\; \sum_{l = i, j} G_F\big(p_{X,Y_1}(x_1, y_l), \dots, p_{X,Y_1}(x_n, y_l)\big) + \sum_{l \ne i, j} G_F\big(p_{X,Y_2}(x_1, y_l), \dots, p_{X,Y_2}(x_n, y_l)\big) \;=\; \sum_{l} G_F\big(p_{X,Y_1}(x_1, y_l), \dots, p_{X,Y_1}(x_n, y_l)\big),$$
where the inequality follows from $G_F$ being supermodular. From Equation (2) and $\eta$ being increasing, it follows that $H(X \mid Y_1) \le H(X \mid Y_2)$, which is equivalent to $I_H(X; Y_1) \ge I_H(X; Y_2)$.    □
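Theorem 4 can also be sanity-checked numerically. The following illustrative sketch draws random channels and priors and confirms that a Join-Meet operation never decreases the posterior min-entropy, one of the channel-supermodular measures of Proposition 2:

```python
import numpy as np

def posterior_min_entropy(pX, K):
    """H_inf(X|Y) = -log2 sum_y max_x p(x, y)."""
    joint = pX[:, None] * K
    return -np.log2(joint.max(axis=0).sum())

rng = np.random.default_rng(1)
for _ in range(1000):
    K = rng.random((3, 3)); K /= K.sum(axis=1, keepdims=True)
    pX = rng.random(3); pX /= pX.sum()
    K2 = K.copy()                                 # Join-Meet of columns 0 and 1
    K2[:, 0] = np.maximum(K[:, 0], K[:, 1])
    K2[:, 1] = np.minimum(K[:, 0], K[:, 1])
    assert posterior_min_entropy(pX, K2) >= posterior_min_entropy(pX, K) - 1e-12
print("no counterexample found in 1000 random trials")
```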
In light of Theorem 4, one might wonder whether the Join-Meet operator completely defines $\mathbb{S}$, that is, whether $H \in \mathbb{S}$ whenever $K \succeq_{mc}^{H} \nabla_{i,j} K$ for all channels $K$ and all $i, j$. In fact, an even stronger statement can be made by only considering a subset of channels.
Definition 7.
Let $K(k, l, \epsilon_1, \epsilon_2)$ denote the channel with input alphabet $\{x_1, \dots, x_n\}$ and output alphabet $\{y_1, y_2\}$, given by
$$K(k, l, \epsilon_1, \epsilon_2)(y \mid x) = \begin{cases} 1 - \epsilon_1 & \text{if } x = x_k,\ y = y_1, \\ \epsilon_1 & \text{if } x = x_k,\ y = y_2, \\ \epsilon_2 & \text{if } x = x_l,\ y = y_1, \\ 1 - \epsilon_2 & \text{if } x = x_l,\ y = y_2, \\ \tfrac{1}{2} & \text{otherwise.} \end{cases}$$
Theorem 5.
Let $H = (\eta, F) \in \mathbb{H}$. If for all $k, l \le n$ and all $\epsilon_1, \epsilon_2 \in [0, \tfrac{1}{2})$, $K(k, l, \epsilon_1, \epsilon_2) \succeq_{mc}^{H} \nabla_{1,2} K(k, l, \epsilon_1, \epsilon_2)$, then $H = (\eta, F) \in \mathbb{S}$.
Proof. 
We prove the contrapositive. Suppose that $H = (\eta, F) \notin \mathbb{S}$. Then, from (3), there are $r = (r_1, \dots, r_n) \in \mathbb{R}^n_{\ge 0}$, $i, j \le n$ with $i \ne j$ and $\delta_1, \delta_2 > 0$ such that
$$G_F(r + \delta_1 e_i + \delta_2 e_j) + G_F(r) < G_F(r + \delta_1 e_i) + G_F(r + \delta_2 e_j).$$
Let $\gamma = (2\|r\|_1 + \delta_1 + \delta_2)^{-1}$ and define a probability distribution over $\mathcal{X}$ by
$$p_X(x) = \begin{cases} \gamma(2 r_i + \delta_1), & \text{if } x = x_i, \\ \gamma(2 r_j + \delta_2), & \text{if } x = x_j, \\ 2\gamma r_l, & \text{if } x = x_l,\ l \ne i, j. \end{cases}$$
Let $\epsilon_1 = r_i(2 r_i + \delta_1)^{-1}$ and $\epsilon_2 = r_j(2 r_j + \delta_2)^{-1}$, and let $K_1 = K(i, j, \epsilon_1, \epsilon_2)$, $K_2 = \nabla_{1,2} K_1$. Then,
$$H(X \mid Y_1) = \eta\Big( \gamma\big( G_F(r + \delta_1 e_i) + G_F(r + \delta_2 e_j) \big) \Big), \quad \text{and} \quad H(X \mid Y_2) = \eta\Big( \gamma\big( G_F(r + \delta_1 e_i + \delta_2 e_j) + G_F(r) \big) \Big),$$
and thus, as $\eta$ is strictly increasing, $H(X \mid Y_1) > H(X \mid Y_2)$, which concludes the proof.    □
An immediate consequence of Theorems 4 and 5 is that the Join-Meet operator completely characterizes $\mathbb{S}$.
Corollary 1.
Let $H \in \mathbb{H}$. Then $H \in \mathbb{S}$ if, and only if, $K \succeq_{mc}^{H} \nabla_{i,j} K$ for all channels $K$ and all $i, j$.

A New Structural Ordering

Theorem 4 yields some immediate new results for reasoning about channel ordering as, whenever $|\mathcal{X}| > 2$, the Join-Meet operator is not, in general, captured by the degradedness ordering. Consider, for instance, the following channels $K_1, K_2$ (notice that $K_1 = K(1, 2, 0, 0)$):
$$K_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}, \qquad K_2 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}. \tag{5}$$
Then, we have that $K_2 = \nabla_{1,2} K_1$, but clearly $K_1 \not\succeq_d K_2$. To see this, fix a channel $R : \mathcal{Y}_1 \to \mathcal{Y}_2$. Because $R$ is a channel, there are $p, q \in [0, 1]$ such that
$$R = \begin{pmatrix} p & 1-p \\ q & 1-q \end{pmatrix}.$$
Then, we have
$$K_1 R = \begin{pmatrix} p & 1-p \\ q & 1-q \\ \frac{1}{2}(p+q) & \frac{1}{2}(2-p-q) \end{pmatrix}.$$
Therefore, $K_2 \ne K_1 R$ for any choice of $p, q$.
We formalize this observation in the next result.
Proposition 4.
1. If $|\mathcal{X}| = 2$, then, for all $K : \mathcal{X} \to \mathcal{Y}$ and all $i, j \le |\mathcal{Y}|$, $K \succeq_d \nabla_{i,j} K$.
2. If $|\mathcal{X}| > 2$, then there are $K : \mathcal{X} \to \mathcal{Y}$ and $i, j \le |\mathcal{Y}|$ such that $K \not\succeq_d \nabla_{i,j} K$.
Proof. 
We first prove (1). As it is possible to reorder columns by degrading a channel, without loss of generality let $i = 1$ and $j = 2$. Fix $K : \mathcal{X} \to \mathcal{Y}$. Let $K_{kl} = K(y_l \mid x_k)$, and suppose, again without loss of generality, that $K_{11} \ge K_{12}$ and $K_{22} \ge K_{21}$.
If $K_{11} = K_{12}$ or $K_{22} = K_{21}$, then $\nabla_{1,2} K$ is obtainable by permuting columns of $K$, and therefore $K \succeq_d \nabla_{1,2} K$. Otherwise, we have $D := K_{11} K_{22} - K_{12} K_{21} > 0$, and $\nabla_{1,2} K = K R$, where $R$ is the following channel:
$$R = \begin{array}{c|ccccc} & y_1 & y_2 & y_3 & \cdots & y_m \\ \hline y_1 & \dfrac{K_{22}(K_{11} - K_{12})}{D} & \dfrac{K_{12}(K_{22} - K_{21})}{D} & 0 & \cdots & 0 \\ y_2 & \dfrac{K_{11}(K_{22} - K_{21})}{D} & \dfrac{K_{21}(K_{11} - K_{12})}{D} & 0 & \cdots & 0 \\ y_3 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \ddots & \\ y_m & 0 & 0 & 0 & \cdots & 1 \end{array}$$
For the proof of (2), it suffices to notice that, whenever $|\mathcal{X}| > 2$, $K(1, 2, 0, 0) \not\succeq_d \nabla_{1,2} K(1, 2, 0, 0)$. The proof for general $|\mathcal{X}|$ is along the same lines as the argument after (5).    □
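The explicit channel $R$ in the proof of item (1) can also be verified numerically; the following illustrative sketch checks it on a 2-input channel satisfying the assumptions of the proof:

```python
import numpy as np

K = np.array([[0.5, 0.2, 0.3],    # K11 >= K12 and K22 >= K21, strictly
              [0.1, 0.6, 0.3]])
K11, K12, K21, K22 = K[0, 0], K[0, 1], K[1, 0], K[1, 1]
D = K11 * K22 - K12 * K21         # positive in the non-degenerate case

R = np.eye(K.shape[1])            # identity outside the first two outputs
R[0, 0], R[0, 1] = K22 * (K11 - K12) / D, K12 * (K22 - K21) / D
R[1, 0], R[1, 1] = K11 * (K22 - K21) / D, K21 * (K11 - K12) / D

nabla_K = K.copy()                # the Join-Meet of columns 1 and 2
nabla_K[:, 0] = np.maximum(K[:, 0], K[:, 1])
nabla_K[:, 1] = np.minimum(K[:, 0], K[:, 1])
print(np.allclose(K @ R, nabla_K))   # True: K R equals the Join-Meet of K
```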
Next, we define the channel-supermodularity preorder over channels, which is based on the Join-Meet operators $\nabla_{i,j}$.
Definition 8.
$K_1 \succeq_s K_2$ if there is a finite collection of tuples $(i_k, j_k)$ such that
$$K_2 = \nabla_{i_1, j_1}\big( \nabla_{i_2, j_2}\big( \cdots \nabla_{i_m, j_m} K_1 \big) \big).$$
An induced preorder can then be defined by combining $\succeq_d$ and $\succeq_s$ as follows:
Definition 9.
$K_1 \succeq_{ds} K_2$ if there are channels $W_1, \dots, W_n$ such that $K_1 \succeq^{0} W_1 \succeq^{1} \cdots \succeq^{n-1} W_n \succeq^{n} K_2$, where each $\succeq^{i}$ stands for $\succeq_d$ or $\succeq_s$.

5. Relations between Preorders for Channel-Supermodular Entropies

Throughout this section, let $K_1 : \mathcal{X} \to \mathcal{Y}$ and $K_2 : \mathcal{X} \to \mathcal{Z}$. First, note that Proposition 1 and Theorem 2 are still meaningful under $\mathbb{S}$. The next proposition summarizes the relationship between $\succeq_{ds}$ and the other preorders.
Proposition 5.
1. $K_1 \succeq_d K_2 \Rightarrow K_1 \succeq_{ds} K_2$ and $K_1 \succeq_s K_2 \Rightarrow K_1 \succeq_{ds} K_2$,
2. $K_1 \succeq_{ds} K_2 \nRightarrow K_1 \succeq_d K_2$,
3. $K_1 \succeq_{ds} K_2 \nRightarrow K_1 \succeq_s K_2$,
4. $K_1 \succeq_{ds} K_2 \nRightarrow K_1 \succeq_{\ln}^{\mathbb{S}} K_2$,
5. $K_1 \succeq_{\ln}^{\mathbb{S}} K_2 \Rightarrow K_1 \succeq_{ds} K_2$,
6. $K_1 \succeq_{\ln}^{H_1} K_2 \nRightarrow K_1 \succeq_{ds} K_2$,
7. $K_1 \succeq_{ds} K_2 \Rightarrow K_1 \succeq_{mc}^{\mathbb{S}} K_2$,
8. $K_1 \succeq_{ds} K_2 \nRightarrow K_1 \succeq_{sh} K_2$,
9. $K_1 \succeq_{sh} K_2 \nRightarrow K_1 \succeq_{ds} K_2$.
Proof. 
See Appendix B.    □
Proposition 5 (7) can be used to decide whether $K_1 \succeq_{mc}^{H} K_2$, for $H \in \mathbb{S}$, by only using structural properties of the channels. Consider, for example, the following channels $K_1, K_2$:
$$K_1 = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix}, \qquad K_2 = \begin{pmatrix} \frac{1}{4} & \frac{3}{4} \\ \frac{1}{4} & \frac{3}{4} \\ \frac{3}{5} & \frac{2}{5} \end{pmatrix}.$$
In [19], the authors stated that they had no proof that $K_1 \succeq_{mc}^{H_1} K_2$. By Proposition 5 (7), $K_1 \succeq_{mc}^{H_1} K_2$ can be proven as follows:
$$\begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix} \;\succeq_s\; \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix} \;\succeq_d\; \begin{pmatrix} \frac{1}{4} & \frac{3}{4} \\ \frac{1}{4} & \frac{3}{4} \\ \frac{3}{5} & \frac{2}{5} \end{pmatrix}.$$
Notice that Theorem 3 does not hold if $\mathbb{H}$ is replaced by $\mathbb{S}$:
Proposition 6.
$K_1 \succeq_{mc}^{\mathbb{S}} K_2 \nRightarrow K_1 \succeq_d K_2$ and $K_1 \succeq_{mc}^{\mathbb{S}} K_2 \nRightarrow K_1 \succeq_{\ln}^{\mathbb{S}} K_2$.
Proof. 
From Proposition 5 (2), there are channels $K_1, K_2$ such that $K_1 \succeq_{ds} K_2$ and $K_1 \not\succeq_d K_2$. For such channels, Proposition 5 (7) implies $K_1 \succeq_{mc}^{\mathbb{S}} K_2$, and the first result follows. The second result then follows by noting that Proposition 1 and Theorem 2 imply $K_1 \succeq_{\ln}^{\mathbb{S}} K_2 \Rightarrow K_1 \succeq_d K_2$.    □
Proposition 7.
$K_1 \succeq_{mc}^{\mathbb{S}} K_2 \nRightarrow K_1 \succeq_{sh} K_2$.

Results on Channel Capacity

In [1], Shannon proved that $K_1 \succeq_{sh} K_2 \Rightarrow K_1 \succeq_{c}^{H_1} K_2$. We can use Theorem 4 to prove similar results for the preorder $\succeq_{shs}$, which extends $\succeq_{sh}$ with $\succeq_s$.
Definition 10.
$K_1 \succeq_{shs} K_2$ if there are channels $W_1, \dots, W_n$ such that $K_1 \succeq^{0} W_1 \succeq^{1} \cdots \succeq^{n-1} W_n \succeq^{n} K_2$, where each $\succeq^{i}$ stands for $\succeq_{sh}$ or $\succeq_s$.
We have:
Proposition 8.
For all channels $K_1, K_2$:
1. $K_1 \succeq_{sh} K_2 \Rightarrow K_1 \succeq_{shs} K_2$,
2. $K_1 \succeq_{ds} K_2 \Rightarrow K_1 \succeq_{shs} K_2$,
3. $K_1 \succeq_{shs} K_2 \nRightarrow K_1 \succeq_s K_2$,
4. $K_1 \succeq_{shs} K_2 \nRightarrow K_1 \succeq_{sh} K_2$.
Proof. 
(1) and (2) follow immediately from the definitions of $\succeq_{ds}$ and $\succeq_{shs}$, and from observing that $\succeq_d$ is included in $\succeq_{sh}$. (3) and (4) follow from (2) and Propositions 5 (3) and 5 (8).    □
In the remainder of this section, we prove that $K_1 \succeq_{shs} K_2$ is a sufficient condition for both the Shannon and the min-entropy capacity of $K_1$ being at least as large as those of $K_2$.
Lemma 1.
Let $H \in \mathbb{S}$. If $K_1 \succeq_{sh} K_2 \Rightarrow K_1 \succeq_{c}^{H} K_2$, then $K_1 \succeq_{shs} K_2 \Rightarrow K_1 \succeq_{c}^{H} K_2$.
Proof. 
Let $H \in \mathbb{S}$. From Propositions 1, 5 (1), and 5 (7), $K_1 \succeq_s K_2 \Rightarrow K_1 \succeq_{c}^{H} K_2$. The result then follows from Definition 10.    □
Proposition 9.
1. $K_1 \succeq_{shs} K_2 \Rightarrow K_1 \succeq_{c}^{H_1} K_2$,
2. $K_1 \succeq_{shs} K_2 \Rightarrow K_1 \succeq_{c}^{H_\infty} K_2$,
3. $K_1 \succeq_{c}^{H_1} K_2 \nRightarrow K_1 \succeq_{shs} K_2$ and $K_1 \succeq_{c}^{H_\infty} K_2 \nRightarrow K_1 \succeq_{shs} K_2$.
Proof. 
See Appendix B.    □
Figure 1 summarizes the implications between the orderings. As can be seen, some open questions remain, designated by dotted lines with a question mark. Note that the absence of an arrow means that the implication is known to be false.

6. Channel Design

Core-concavity was originally introduced in [51] in the context of universally optimal channel design—that is, the problem of finding, given some operational constraints, a channel leaking the minimum amount of confidential information (optimality), for all entropy-based measures of leakage (universality). This section shows how core-concavity and channel-supermodularity can be used in this context.
If $X$ and $Y$ are the input and output of a channel $K$ and $H$ is a core-concave entropy, then the leakage about $X$ through $Y$, as measured by $H$, is defined to be the $H$-mutual information $I_H(X; Y)$, as in Definition 3.
The concept of leakage is relevant in security/privacy contexts where $X$ is some confidential data, $K$ models a system (e.g., a cryptographic computation or a database query system), and $Y$ is some observable generated by the system (e.g., the computation time or the result of a statistical query). Different choices of $H$ correspond to different attacker models, and universal channel solutions identify countermeasures to leakage which are robust with regard to all attackers in that universe.
Minimizing leakage of sensitive information is usually a desirable goal. However, when designing systems, it is often the case that some leakage is unavoidable. With that in mind, some recent works in QIF aimed at obtaining channels that leak the least amount of information subject to some operational constraints [13,28,29]. From Definition 3, the problem can be rephrased as finding the channel which, subject to some operational constraints, maximizes $H(X \mid Y)$ or, as $\eta$ is increasing, maximizes $\sum_y p(y) F(X \mid y)$. In a recent work [29], which considered a generalizing framework for these operational constraints, it was shown that this problem can be solved by convex optimization techniques, for a given core-concave $H$. However, it was also proven that the solution to the problem is in general not universal, that is, the optimal channel given a set of constraints may vary with the choice of $H$.
Despite this negative result, it was shown in [29] that some classes of problems admit a universal solution. As different entropies model different attackers, these results provide a very strong security guarantee—namely, that the optimal system in these situations is the most secure possible regardless of the attacker model. In the next few sections, we show how channel-supermodularity can be a useful tool in obtaining solutions that, while not universally optimal for all core-concave entropies, are the most secure for all symmetric entropies in S .

6.1. Deterministic Channel Design Problem: A Universal Solution by Channel-Supermodularity

In many applications, such as repeated queries, it is either undesirable or impractical to consider a “probabilistic” system. This motivates the study of the channel design problem restricted to deterministic channels, which has been recently investigated in [13].
It was proven in [13] that, similarly to the general channel design problem, the deterministic version does not in general admit a universal solution. Moreover, the problem was also shown to be, in general, NP-hard. However, it was also proven in this work that a specific class of problems, called the deterministic complete k-hypergraph design problem, admits a solution that is optimal for all symmetric channel-supermodular entropies. This problem can be defined as follows.
Definition 11.
Let $H \in \mathbb{H}$, $k \in \mathbb{N}_{>0}$, and let $\mathcal{X}, \mathcal{Y}$ be finite sets with $|\mathcal{Y}| \ge \lceil |\mathcal{X}|/k \rceil$.
The deterministic complete k-hypergraph design problem (CKDP) is to find a channel $K : \mathcal{X} \to \mathcal{Y}$ that maximizes $H(X \mid Y)$, subject to the following constraints:
  • $\forall x, y$: $K(y \mid x) \in \{0, 1\}$, and
  • $\forall y$: $\sum_x K(y \mid x) \le k$.
That is, the deterministic CKDP is the problem of finding the most secure deterministic channel, subject to the constraint that each output can only be generated by at most k inputs.
For the remainder of this section, let $k \in \mathbb{N}_{>0}$, and fix a distribution $p_X$ such that, without loss of generality, $p_X(x_i) \ge p_X(x_j)$ whenever $i < j$.
The greedy solution proposed in [13] is described in Algorithm 1. The algorithm is straightforward: it associates the $k$ most likely secrets with the first observable; it then associates the $k$ most likely secrets among the remaining secrets with the second observable, and so on. The solution for $\mathcal{X} = \{x_1, \dots, x_8\}$ and $k = 3$ is depicted in Figure 2.
Algorithm 1 Greedy algorithm for the complete k-hypergraph problem
Input: input set $\mathcal{X}$, prior $p_X$ and integer $k \le |\mathcal{X}|$
Output: matrix of the optimal deterministic channel $K_k$
  1: initialize: $K_k$ as a matrix of 0s, with $|\mathcal{X}|$ rows and $\lceil |\mathcal{X}|/k \rceil$ columns
  2: for $i \in \{1, \dots, |\mathcal{X}|\}$ do
  3:     $K_k(i, \lceil i/k \rceil) = 1$
  4: return $K_k$
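For concreteness, the following is a direct Python transcription of Algorithm 1 (an illustrative sketch; the prior used in the example call is an arbitrary choice, not the one of Figure 2):

```python
import numpy as np

def greedy_ckdp(pX, k):
    """Algorithm 1: the i-th most likely secret is routed to output ceil(i/k).
    pX must be sorted in nonincreasing order."""
    n = len(pX)
    m = -(-n // k)                  # ceil(n/k) output columns
    K = np.zeros((n, m))
    for i in range(n):
        K[i, i // k] = 1.0          # 0-based version of K_k(i, ceil(i/k)) = 1
    return K

pX = np.array([0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.06, 0.04])
print(greedy_ckdp(pX, 3))           # 8 secrets, k = 3: three outputs
```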
Theorem 6
([13]). Given a complete k-hypergraph channel design problem, the solution given by Algorithm 1 is optimal for any symmetric channel-supermodular entropy.
Proof. 
We reproduce the proof of this theorem from [13], as it provides an interesting application of channel-supermodularity. Let us consider the following joint matrix $J_k$, obtained from Algorithm 1 and the prior:
$$J_k = \begin{pmatrix} p_1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ p_k & 0 & \cdots & 0 \\ 0 & p_{k+1} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & p_{2k} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p_{mk+1} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & p_n \end{pmatrix}$$
We now prove that any matrix $J$ satisfying the constraints can be transformed into $J_k$ by a sequence of steps, each increasing (or keeping equal) the posterior entropy for any channel-supermodular entropy. Each step consists of the following three sub-steps:
  • Select two columns $c_i, c_j$ and align the non-zero coefficients in $c_i, c_j$;
  • Perform $\vee$, $\wedge$ operations on the aligned columns and replace $c_i, c_j$ with $c_i \vee c_j$, $c_i \wedge c_j$;
  • Disalign the two columns $c_i \vee c_j$, $c_i \wedge c_j$.
The following example illustrates one step (i.e., the three sub-steps above):
$$\begin{pmatrix} 0 & 0.4 & 0 \\ 0 & 0 & 0.3 \\ 0 & 0.15 & 0 \\ 0 & 0 & 0.1 \\ 0.05 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0.4 & 0.1 \\ 0 & 0.15 & 0.3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0.05 & 0 & 0 \end{pmatrix} \xrightarrow{\;\vee,\,\wedge\;} \begin{pmatrix} 0 & 0.4 & 0.1 \\ 0 & 0.3 & 0.15 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0.05 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0.4 & 0 \\ 0 & 0.3 & 0 \\ 0 & 0 & 0.15 \\ 0 & 0 & 0.1 \\ 0.05 & 0 & 0 \end{pmatrix}$$
where, from left to right, we have:
  • selected $c_2, c_3$, which are the two columns containing the two most likely priors (i.e., 0.4 and 0.3), and aligned $c_2, c_3$ so that 0.4 and 0.3 appear in different rows;
  • replaced $c_2, c_3$ with $c_2 \vee c_3$, $c_2 \wedge c_3$;
  • disaligned columns 2 and 3, that is, repositioned the values in $c_2 \vee c_3$, $c_2 \wedge c_3$ so that each row has the same probability it had before the step.
Notice that aligning (and disaligning) is a permutation of a column; hence, these sub-steps do not change the value of the posterior entropy for symmetric channel-supermodular entropies, because $G_F(c_i) = G_F(c_i')$ for any permutation $c_i'$ of the column $c_i$.
Next, for the remaining sub-step, where we replace $c_i, c_j$ with $c_i \vee c_j$, $c_i \wedge c_j$: by supermodularity of $G_F$ we have $G_F(c_i) + G_F(c_j) \le G_F(c_i \vee c_j) + G_F(c_i \wedge c_j)$; hence, that sub-step increases (or keeps equal) the posterior entropy. Notice also that the matrix at the end of the step has in each row the same probabilities as it had before that step; hence, it is still a joint matrix that respects the complete k-hypergraph constraints for the same prior.
The selection and alignment of columns is as follows: at the initial step, select $c_i$ such that $c_i$ contains the first $r$ elements with the highest probabilities, say $p_1, \dots, p_r$; if $r < k$, then select $c_j$ as the column containing $p_{r+1}$; align $c_i, c_j$ so that $p_{r+1}$ is not on the same row as any of $p_1, \dots, p_r$ (and $c_i \vee c_j$ has no more than $k$ non-zero entries). Then, $c_i \vee c_j$ will contain $p_1, \dots, p_{r+1}$. Repeat until $r = k$. Then repeat the process considering the probabilities $p_{k+1}, \dots, p_n$.
By repeating these steps, we will reach a matrix $J'$ with columns $c_1', \dots, c_m'$ such that each element of column $c_i'$ has higher probability than all elements of column $c_{i+1}'$. This is exactly the solution given by the greedy algorithm (modulo column permutations), that is, $J' = J_k$. □
If $H$ is not channel-supermodular, the greedy solution may not be optimal. Consider, for example, the Hayashi–Rényi entropies, which are not channel-supermodular. Let $\mathcal{X} = \{x_1, \dots, x_4\}$, $p_X = (0.3, 0.3, 0.2, 0.2)$ and $k = 3$, and consider the channels $K_1, K_2$ with outputs $Y_1, Y_2$ below:
$$K_1 = \begin{array}{c|cc} & y_1 & y_2 \\ \hline x_1 & 1 & 0 \\ x_2 & 1 & 0 \\ x_3 & 1 & 0 \\ x_4 & 0 & 1 \end{array} \qquad K_2 = \begin{array}{c|cc} & y_1 & y_2 \\ \hline x_1 & 1 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 & 1 \\ x_4 & 0 & 1 \end{array}$$
Then, $K_1$ is the greedy solution. However, for Hayashi–Rényi entropies, the following limit holds [52]:
$$\lim_{\alpha \to \infty} H_\alpha(X \mid Y) = -\log \max_{y \in \mathrm{supp}(Y),\, x \in \mathcal{X}} p_{X \mid y}(x).$$
Therefore,
$$\lim_{\alpha \to \infty} H_\alpha(X \mid Y_1) = 0,$$
whereas
$$\lim_{\alpha \to \infty} H_\alpha(X \mid Y_2) = 1.$$
Thus, $H_\alpha(X \mid Y_1) < H_\alpha(X \mid Y_2)$ for large enough $\alpha$, and the greedy solution is not optimal.
Another example of core-concave entropies for which Algorithm 1 is not optimal is provided by “partition” entropies. For example, if $\mathcal{B}$ is a partition of the possible values of $X$, then
$$H_{\mathcal{B}}(X) = -\log \max_{S \in \mathcal{B}} \sum_{x \in S} p_X(x)$$
is a core-concave entropy which is not channel-supermodular.

6.2. An Application to Query Anonymity

Let us consider the following anonymity mechanism problem: we want to design an anonymity mechanism where, in order to conceal a secret query from an eavesdropper, the user sends to a server a set of $k$ queries which includes the secret query. Then, once the server has returned the responses to all $k$ queries, the user retrieves the response to the secret query. In our setting, this corresponds to each observable having a pre-image of size exactly $k$.
As an illustrative example, consider a Twitter user who wants to visit some other Twitter user's page but wants to keep this query secret. To solve this problem, he decides to use the following protocol: whenever he visits the desired user page, he also sends $k - 1$ other queries to the pages of other Twitter users. Suppose further that this user frequently visits this page, meaning that a random choice of the other queries is not a wise strategy, since multiple observations would end up revealing more and more information about the query, eventually completely revealing the secret query. The problem is then: which set of $k$ Twitter pages will leak the least information about the user's secret query?
We assume the attacker has no background information about the user; hence, we set the probability of a Twitter query for that user to the probability that a general member of the public requests that Twitter page (a good proxy for this measure can be derived from the number of followers of that Twitter page). Let $n$ be the number of possible queries (i.e., the size of the input set). Considering the scenario in which $n$ is divisible by $k$, we can use Algorithm 1 to solve this problem.
Notice that, for $n$ secrets, there are $n!\,\big((n/k)!\big)^{-1}\,(k!)^{-n/k}$ possible ways to satisfy these anonymity constraints. For example, there are about $7 \times 10^{85}$ possible solutions when $n = 100$ and $k = 10$, and about $4 \times 10^{19{,}704}$ for $n = 10{,}000$ and $k = 100$. We will now compare the greedy algorithm of Section 6.1 against other possible anonymity solutions, and we will measure the goodness of the solutions using min- and Shannon posterior entropies. Let us consider the three anonymity solutions below:
  • the solution from the greedy algorithm (Algorithm 1) (i.e., pick the $k - 1$ queries closest in probability to the real query);
  • a random solution (i.e., pick $k$ random queries);
  • a non-optimal solution where the secrets with the highest probabilities, instead of being grouped in the first bin, are distributed among the other bins.
For example, for 6 secrets and $k = 2$, the greedy solution would be $\{\{x_1, x_2\}, \{x_3, x_4\}, \{x_5, x_6\}\}$, whereas the non-optimal solution (3) would be $\{\{x_1, x_4\}, \{x_2, x_5\}, \{x_3, x_6\}\}$.
The difference between these solutions can be very substantial. Figure 3 shows the values when the distribution over the input set is a binomial distribution with parameter $p = 0.5$: in this scenario, supposing that there is a universe of 350 Twitter pages and that the user sends 19 queries selected using the greedy algorithm, the probability of an attacker guessing the secret query correctly would be over 7 times smaller than if the user had opted instead for the non-optimal solution (using $2^{-H_\infty(X \mid Y)}$ as the conversion from posterior min-entropy to probability of guessing). In fact, it is easy to define an input probability distribution such that the leakage gap between the non-optimal solution and the optimal solution given by the greedy algorithm is arbitrarily large.
Note that, by Theorem 6, the greedy solution is in fact optimal for all symmetric channel-supermodular entropies. Hence, the user knows that the greedy solution is optimal against an attacker trying to guess his secret query in a fixed number of guesses, using guesswork, or using a twenty-questions-style strategy (reflecting Shannon entropy), and so on.
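An experiment along these lines can be scripted as follows (an illustrative sketch: since 19 does not divide 350, we take a universe of $n = 342 = 18 \cdot 19$ pages so that $k$ divides $n$ exactly, with a binomial prior as in the text):

```python
import math
import numpy as np

def guessing_probability(pX, covers):
    """2**(-H_inf(X|Y)): sum over covers of the largest prior in the cover."""
    return sum(max(pX[x] for x in cover) for cover in covers)

n, k = 342, 19
pX = np.array([math.comb(n - 1, i) * 0.5 ** (n - 1) for i in range(n)])
pX = np.sort(pX)[::-1]              # nonincreasing, as the greedy algorithm expects

m = n // k
greedy  = [range(c * k, (c + 1) * k) for c in range(m)]      # group adjacent secrets
non_opt = [[c + m * j for j in range(k)] for c in range(m)]  # spread the likely ones
g, b = guessing_probability(pX, greedy), guessing_probability(pX, non_opt)
print(f"greedy: {g:.4f}  non-optimal: {b:.4f}  ratio: {b / g:.1f}")
```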

6.3. Query Anonymity for Related Secrets

Consider the scenario where a user who wants to query the Twitter pages of some political commentators is at the same time interested in hiding his own political affiliation, which could be leaked by his queries. In this scenario, the solution from Algorithm 1 might be sub-optimal. To see this, suppose that the $k$ queries in the real query's cover all end up being affiliated with the same party. This would reveal the user's political party to the attacker with certainty, even though the real intended query is still uncertain. This is not a contradiction of the optimality of the algorithm: such an adversary would be better modeled by a partition entropy, with the political commentators (the queried users) grouped by party affiliation, and, as established in Proposition 3, this type of entropy is not channel-supermodular in general.
Motivated by this scenario, we now give a solution, optimal for all channel-supermodular entropies (even non-symmetric ones), for this kind of problem. Suppose there are $k$ political parties, and $l$ commentators aligned with each party. Suppose, further, that the user is affiliated with one of the parties, and would like to check the profiles of his party's commentators without revealing his own affiliation.
To achieve this aim, the user decides to group the political commentators into covers of size $k$, each cover containing exactly one commentator from each party, and then proceeds to use these covers similarly to the mechanism described in Section 6.2, by querying the entire cover (fetching the pages of all the commentators in the cover). The question is: which set of covers reveals the least amount of information about the user's political affiliation?
Let $\mathcal{X} = \{P_1, \dots, P_k\}$ be the set of parties, let $\mathcal{Y} = \{c_1^1, \dots, c_1^l, c_2^1, \dots, c_2^l, \dots, c_k^1, \dots, c_k^l\}$ be the set of commentators, wherein $c_i^j$ is the $j$th commentator of the $i$th party, and let $\mathcal{Z} \subseteq 2^{\mathcal{Y}}$ be the set of covers. Let $K : \mathcal{X} \to \mathcal{Y}$ be the channel giving the conditional probability of a user choosing to query a commentator given the user's party inclination. We assume that the user only chooses commentators that share the same affiliation as his, that is, $K(c_i^j \mid P_m) = 0$ whenever $i \ne m$. For simplicity, we assume that the commentators are ordered decreasingly with regard to this probability, that is,
$$K(c_i^j \mid P_i) \ge K(c_i^m \mid P_i) \quad \text{whenever } m > j. \tag{6}$$
We claim that the optimal mechanism, for all channel-supermodular entropies (with regard to the political parties, not the commentators), is to group the most popular commentator of each party in the first cover, then the second-most popular commentator of each party in the second, and so on. That is, the covers would be $\{\{c_1^1, c_2^1, \dots, c_k^1\}, \{c_1^2, c_2^2, \dots, c_k^2\}, \dots, \{c_1^l, c_2^l, \dots, c_k^l\}\}$.
Let $R : \mathcal{Y} \to \mathcal{Z}$ be the channel mapping each commentator to their cover, modelling the optimal solution described above. The matrix of this channel can be seen as a vertical concatenation of identity matrices:
$$R = \begin{array}{c|cccc} & \mathrm{Cover}_1 & \mathrm{Cover}_2 & \cdots & \mathrm{Cover}_l \\ \hline c_1^1 & 1 & 0 & \cdots & 0 \\ c_1^2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ c_1^l & 0 & 0 & \cdots & 1 \\ \vdots & & & & \\ c_k^1 & 1 & 0 & \cdots & 0 \\ c_k^2 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ c_k^l & 0 & 0 & \cdots & 1 \end{array} \qquad K R = \begin{array}{c|cccc} & \mathrm{Cover}_1 & \mathrm{Cover}_2 & \cdots & \mathrm{Cover}_l \\ \hline P_1 & K(c_1^1 \mid P_1) & K(c_1^2 \mid P_1) & \cdots & K(c_1^l \mid P_1) \\ P_2 & K(c_2^1 \mid P_2) & K(c_2^2 \mid P_2) & \cdots & K(c_2^l \mid P_2) \\ \vdots & & & & \\ P_k & K(c_k^1 \mid P_k) & K(c_k^2 \mid P_k) & \cdots & K(c_k^l \mid P_k) \end{array} \tag{7}$$
The channel $K R : \mathcal{X} \to \mathcal{Z}$ above is then obtained by postprocessing $K$ by $R$.
The claim of optimality can be stated more formally as follows: given any channel-supermodular $H$, any distribution $p_X$ over the parties, and any other deterministic covering $R' : \mathcal{Y} \to \mathcal{Z}$ in which there is exactly one commentator from each party per cover, the resulting channel $K R'$ will never leak less information than the channel $K R$ in (7). That is, denoting by $Z$ and $Z'$ the outputs of $K R$ and $K R'$, respectively,
$$H(X \mid Z) \ge H(X \mid Z').$$
The proof that the covering $R$ above is indeed optimal for all channel-supermodular entropies is similar to the proof of Theorem 6, but even simpler. Suppose the channel $R' : \mathcal{Y} \to \mathcal{Z}$ is any covering that satisfies the restriction that each cover has exactly one commentator from each party. Now, consider the channel $K R'$, and proceed as follows: first, perform the Join-Meet operation of the first column with all the other columns (that is, obtain the channel $\nabla_{1,l}\,\nabla_{1,l-1} \cdots \nabla_{1,2}(K R')$). Then, disregarding the first column, repeat the process for the second column with all the remaining ones, and so on. By Theorem 4, each Join-Meet step can only increase (or keep equal) the posterior entropy, and it is easy to see that the resulting channel will be exactly $K R$ in (7).
As an example, let $k = 3$, $l = 3$, and suppose the non-zero values of $K$ are:
$$\begin{aligned} K(c_1^1 \mid P_1) &= 0.5, & K(c_1^2 \mid P_1) &= 0.3, & K(c_1^3 \mid P_1) &= 0.2, \\ K(c_2^1 \mid P_2) &= 0.6, & K(c_2^2 \mid P_2) &= 0.3, & K(c_2^3 \mid P_2) &= 0.1, \\ K(c_3^1 \mid P_3) &= 0.5, & K(c_3^2 \mid P_3) &= 0.4, & K(c_3^3 \mid P_3) &= 0.1, \end{aligned}$$
and suppose that the covers defined by $R'$ are $\{c_1^2, c_2^1, c_3^2\}$, $\{c_1^1, c_2^3, c_3^1\}$, and $\{c_1^3, c_2^2, c_3^3\}$. We have
$$K R' = \begin{array}{c|ccc} & \mathrm{Cover}_1 & \mathrm{Cover}_2 & \mathrm{Cover}_3 \\ \hline P_1 & 0.3 & 0.5 & 0.2 \\ P_2 & 0.6 & 0.1 & 0.3 \\ P_3 & 0.4 & 0.5 & 0.1 \end{array}$$
By performing the Join-Meet operations of the first column with the others, we obtain
$$\nabla_{1,3}\,\nabla_{1,2}(K R') = \begin{array}{c|ccc} & \mathrm{Cover}_1 & \mathrm{Cover}_2 & \mathrm{Cover}_3 \\ \hline P_1 & 0.5 & 0.3 & 0.2 \\ P_2 & 0.6 & 0.1 & 0.3 \\ P_3 & 0.5 & 0.4 & 0.1 \end{array}$$
Finally, by performing the Join-Meet of the second column with the remaining one, we obtain
$$\nabla_{2,3}\,\nabla_{1,3}\,\nabla_{1,2}(K R') = \begin{array}{c|ccc} & \mathrm{Cover}_1 & \mathrm{Cover}_2 & \mathrm{Cover}_3 \\ \hline P_1 & 0.5 & 0.3 & 0.2 \\ P_2 & 0.6 & 0.3 & 0.1 \\ P_3 & 0.5 & 0.4 & 0.1 \end{array}$$
which is exactly the optimal solution given by (7).
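The reduction just performed is, in effect, a simultaneous row-wise selection sort. The following illustrative sketch reproduces the example:

```python
import numpy as np

M = np.array([[0.3, 0.5, 0.2],   # K R' from the example
              [0.6, 0.1, 0.3],
              [0.4, 0.5, 0.1]])

for i in range(M.shape[1]):          # Join-Meet column i with every later column
    for j in range(i + 1, M.shape[1]):
        hi = np.maximum(M[:, i], M[:, j])
        lo = np.minimum(M[:, i], M[:, j])
        M[:, i], M[:, j] = hi, lo

print(M)                             # each row sorted nonincreasingly: K R of (7)
# [[0.5 0.3 0.2]
#  [0.6 0.3 0.1]
#  [0.5 0.4 0.1]]
```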

7. Conclusions

In this work, we introduced the notion of channel-supermodular entropies as a subset of core-concave entropies, a class which includes the guessing and Arimoto–Rényi entropies. We demonstrated that, for this class of entropies, the Join-Meet operator on channel columns decreases the $H$-mutual information. This property prompted us to define structural preorders over channels ($\succeq_{ds}$, $\succeq_{shs}$), providing novel sufficient conditions for establishing whether two channels are in the $H$-more capable ordering, for any channel-supermodular $H$, or in the $(H_1, H_\infty)$-capacity ordering. Moreover, this work establishes some relationships between these new structural preorders and other existing preorders from the literature.
As an example application, we used channel-supermodularity to prove the optimality of a greedy query anonymization algorithm.
It is our belief that the connection between supermodular functions and some commonly used entropy measures, made in Section 3.1, will prove useful for future investigations in information theory (for example, given the vast literature on supermodular functions over Euclidean space; see [50] (Chapter 6.D) and [49]). Further directions of work include investigating other useful properties of channel-supermodular entropies, and further applications of channel-supermodularity to anonymity.

Author Contributions

All authors contributed to the results of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Section 3

Appendix A.1. Proof of Proposition 2

(1) For Shannon entropy, Definition 5 yields
$$G_F(r) = -\|r\|_1 \sum_{i \le n} \frac{r_i}{\|r\|_1} \log \frac{r_i}{\|r\|_1} = -\sum_{i \le n} r_i \log \frac{r_i}{\|r\|_1}.$$
Suppose $r \in \mathbb{R}^n_{>0}$. Then, for all $i, j$ with $i \ne j$, we have
$$\frac{\partial^2 G_F(r)}{\partial r_i\, \partial r_j} = \frac{1}{\|r\|_1 \ln(2)} \ge 0. \tag{A1}$$
Supermodularity of $G_F$ restricted to $r \in \mathbb{R}^n_{>0}$ then follows from (4). If, however, $r \in \mathbb{R}^n_{\ge 0} \setminus \mathbb{R}^n_{>0}$, the second derivative may not exist. In such a case, supermodularity can be established by a limiting argument. Let $\epsilon > 0$ and $r' = r + \epsilon \sum_k e_k$.
Then (4) and (A1) imply that
$$G_F(r' + \delta_1 e_i + \delta_2 e_j) + G_F(r') \ge G_F(r' + \delta_1 e_i) + G_F(r' + \delta_2 e_j).$$
As $G_F$ is continuous, taking $\epsilon \to 0$ on both sides of the inequality above, we obtain
$$G_F(r + \delta_1 e_i + \delta_2 e_j) + G_F(r) \ge G_F(r + \delta_1 e_i) + G_F(r + \delta_2 e_j).$$
Thus, $G_F$ is supermodular by (3).
(2) For Arimoto–Rényi entropies of order $\alpha \in (0, 1)$, we have
$$G_F(r) = \|r\|_1\, F\!\left( \frac{r}{\|r\|_1} \right) = \|r\|_1 \left\| \frac{r}{\|r\|_1} \right\|_\alpha = \|r\|_\alpha,$$
and, similarly, $G_F(r) = -\|r\|_\alpha$ for entropies of order $\alpha > 1$.
Now, for all $r \in \mathbb{R}^n_{>0}$ and $i, j$ with $i \ne j$, we have
$$\frac{\partial^2 \|r\|_\alpha}{\partial r_i\, \partial r_j} = (1 - \alpha)\, r_i^{\alpha-1} r_j^{\alpha-1} \left( \sum_{k=1}^{n} r_k^\alpha \right)^{\frac{1}{\alpha} - 2},$$
which is negative if $\alpha > 1$ and positive if $0 < \alpha < 1$. Thus, from Equation (4), $G_F$ restricted to $\mathbb{R}^n_{>0}$ is supermodular for all $\alpha \in (0, 1) \cup (1, \infty)$. Supermodularity of $G_F$ on all of $\mathbb{R}^n_{\ge 0}$ can then be established by a limiting argument similar to the one in the proof for Shannon entropy.
(3) For the k-tries entropy, we have
$$G_F(r) = -\|r\|_1 \sum_{i=1}^{k} \frac{r_{[i]}}{\|r\|_1} = -\sum_{i=1}^{k} r_{[i]}.$$
Let $r \in \mathbb{R}^n_{\ge 0}$, let $i, j$ be such that $i \ne j$ and let $\delta_1, \delta_2 > 0$. From Equation (3), it suffices to prove that
$$G_F(r + \delta_1 e_i) - G_F(r) \le G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_2 e_j).$$
Let $r' = r + \delta_1 e_i$, $r'' = r + \delta_2 e_j$ and $r''' = r + \delta_1 e_i + \delta_2 e_j$. The inequality above is equivalent to
$$\sum_{l=1}^{k} r'_{[l]} - \sum_{l=1}^{k} r_{[l]} \;\ge\; \sum_{l=1}^{k} r'''_{[l]} - \sum_{l=1}^{k} r''_{[l]}. \tag{A2}$$
If $r_i + \delta_1$ is not amongst the $k$ largest elements in $r'''$, then the right-hand side of (A2) is equal to 0, and there is nothing to prove, as the left-hand side is always non-negative. If $r_i + \delta_1$ is amongst the $k$ largest elements in $r'''$, then it is also amongst the $k$ largest elements in $r'$, and either:
  • $r_i$ is amongst the $k$ largest elements in $r$ and in $r''$, and thus both sides of (A2) are equal to $\delta_1$;
  • $r_i$ is amongst the $k$ largest elements in $r$ but not in $r''$, in which case the left-hand side of (A2) is equal to $\delta_1$, and the right-hand side of (A2) is at most $\delta_1$;
  • $r_i$ is not amongst the $k$ largest elements in $r$, and neither in $r''$. In this case the left-hand side of (A2) is at least as large as the right-hand side, as the $k$th largest element of $r''$ is greater than or equal to the $k$th largest element of $r$.
In any case, (A2) holds.
(4) For guessing entropy, Definition 5 yields
$$G_F(r) = \|r\|_1 \sum_{i} i\, \frac{r_{[i]}}{\|r\|_1} = \sum_{i} i\, r_{[i]}.$$
Let $i, j$ be such that $i \neq j$ and let $\delta_1, \delta_2 > 0$. Suppose, without loss of generality, that $r_i + \delta_1 \geq r_j + \delta_2$. We will prove that
$$G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_1 e_i) \geq G_F(r + \delta_2 e_j) - G_F(r). \quad \text{(A3)}$$
Let $I_{r+\delta_1 e_i} = \{\, k \leq n \mid r_j < (r + \delta_1 e_i)_k < r_j + \delta_2 \,\}$, that is, $I_{r+\delta_1 e_i}$ is the set of coordinates of $r + \delta_1 e_i$ whose value is strictly between $r_j$ and $r_j + \delta_2$.
Let $m \geq 1$ be the number of entries of $r + \delta_1 e_i + \delta_2 e_j$ that are greater than or equal to $r_j + \delta_2$. Then, there are $(m + |I_{r+\delta_1 e_i}| - 1)$ coordinates of $r + \delta_1 e_i$ that are strictly greater than $r_j$.
The left-hand side of (A3), the difference $G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_1 e_i)$, is then $m(r_j + \delta_2) - (m + |I_{r+\delta_1 e_i}|)\, r_j$, plus the sum of the entries indexed by the set $I_{r+\delta_1 e_i}$, as the integers by which they are multiplied are increased by one. As we assume $r_i + \delta_1 \geq r_j + \delta_2$, we have $i \notin I_{r+\delta_1 e_i}$, and the left-hand side of (A3) reduces to
$$G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_1 e_i) = m(r_j + \delta_2) - \big(m + |I_{r+\delta_1 e_i}|\big)\, r_j + \sum_{k \in I_{r+\delta_1 e_i}} r_k = -|I_{r+\delta_1 e_i}|\, r_j + m\delta_2 + \sum_{k \in I_{r+\delta_1 e_i}} r_k.$$
We now turn to the right-hand side of (A3), the difference $G_F(r + \delta_2 e_j) - G_F(r)$. We define $I_r$ analogously to $I_{r+\delta_1 e_i}$ and divide the proof into three cases.
Case 1 ($r_i \geq r_j + \delta_2$): In this case, there are $m - 1$ entries of $r + \delta_2 e_j$, other than the $j$th, greater than or equal to $r_j + \delta_2$, and $(m + |I_r| - 1)$ entries of $r$ strictly greater than $r_j$. Moreover, $I_r = I_{r+\delta_1 e_i}$, and we obtain
$$G_F(r + \delta_2 e_j) - G_F(r) = m(r_j + \delta_2) - (m + |I_r|)\, r_j + \sum_{k \in I_r} r_k = G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_1 e_i).$$
Case 2 ($r_j < r_i < r_j + \delta_2$): In this case, there are $m - 2$ entries of $r + \delta_2 e_j$, other than the $j$th, greater than or equal to $r_j + \delta_2$, and $(m + |I_r| - 2)$ entries of $r$ strictly greater than $r_j$. This time, $I_r = I_{r+\delta_1 e_i} \cup \{i\}$, and we have
$$\begin{aligned} G_F(r + \delta_2 e_j) - G_F(r) &= (m-1)(r_j + \delta_2) + \sum_{k \in I_r} r_k - (m - 1 + |I_r|)\, r_j \\ &= (m-1)(r_j + \delta_2) + \sum_{k \in I_{r+\delta_1 e_i}} r_k + r_i - \big(m + |I_{r+\delta_1 e_i}|\big)\, r_j \\ &= -|I_{r+\delta_1 e_i}|\, r_j + m\delta_2 + \sum_{k \in I_{r+\delta_1 e_i}} r_k + r_i - (r_j + \delta_2) \\ &\leq -|I_{r+\delta_1 e_i}|\, r_j + m\delta_2 + \sum_{k \in I_{r+\delta_1 e_i}} r_k = G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_1 e_i), \end{aligned}$$
where the inequality comes from the assumption that $r_i < r_j + \delta_2$.
Case 3 ($r_i \leq r_j$): In this case, there are again $m - 2$ entries of $r + \delta_2 e_j$, other than the $j$th, greater than or equal to $r_j + \delta_2$, and $(m + |I_r| - 2)$ entries of $r$ strictly greater than $r_j$. This time, $I_r = I_{r+\delta_1 e_i}$, and we have
$$\begin{aligned} G_F(r + \delta_2 e_j) - G_F(r) &= (m-1)(r_j + \delta_2) + \sum_{k \in I_r} r_k - (m - 1 + |I_r|)\, r_j \\ &= -|I_r|\, r_j + (m-1)\delta_2 + \sum_{k \in I_r} r_k \\ &\leq -|I_r|\, r_j + m\delta_2 + \sum_{k \in I_r} r_k = G_F(r + \delta_1 e_i + \delta_2 e_j) - G_F(r + \delta_1 e_i). \end{aligned}$$
In every case the right-hand side of (A3) is at most the left-hand side. Therefore, $G_F$ is supermodular.
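As before, the supermodularity of $G_F(r) = \sum_i i\, r_{[i]}$ can be spot-checked numerically; a minimal sketch with our own helper names, assuming NumPy:
```python
import numpy as np

def G_guess(r):
    desc = np.sort(r)[::-1]                      # r_[1] >= r_[2] >= ...
    return (np.arange(1, len(r) + 1) * desc).sum()

rng = np.random.default_rng(3)
n = 5
for _ in range(10_000):
    r = rng.random(n)
    i, j = rng.choice(n, size=2, replace=False)
    d1, d2 = rng.random(2)
    ei, ej = np.eye(n)[i], np.eye(n)[j]
    lhs = G_guess(r + d1*ei + d2*ej) + G_guess(r)
    rhs = G_guess(r + d1*ei) + G_guess(r + d2*ej)
    assert lhs >= rhs - 1e-9
```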

Appendix A.2. Proof of Proposition 3

(1) For all these entropies, Definition 5 yields
$$G_F(r) = \|r\|_1 \left\|\frac{r}{\|r\|_1}\right\|_\alpha^\alpha = \|r\|_\alpha^\alpha\, \|r\|_1^{1-\alpha}$$
if $0 < \alpha < 1$, and $G_F(r) = -\|r\|_\alpha^\alpha\, \|r\|_1^{1-\alpha}$ if $\alpha > 1$.
Now, if $r \in \mathbb{R}^n_{>0}$, we have
$$\frac{\partial^2 \left(\|r\|_\alpha^\alpha\, \|r\|_1^{1-\alpha}\right)}{\partial r_i \partial r_j} = \alpha(1-\alpha)\, \|r\|_1^{-\alpha} \left( r_i^{\alpha-1} + r_j^{\alpha-1} - \|r\|_\alpha^\alpha\, \|r\|_1^{-1} \right),$$
and by (4), $G_F$ is supermodular only if the partial derivative above is non-negative when $\alpha \in (0,1)$ and non-positive when $\alpha > 1$. Equivalently, if $G_F$ is supermodular then, for all $r \in \mathbb{R}^n_{>0}$ and all $i, j$ such that $i \neq j$,
$$r_i^{\alpha-1} + r_j^{\alpha-1} - \|r\|_\alpha^\alpha\, \|r\|_1^{-1} \geq 0. \quad \text{(A4)}$$
Thus, it suffices to provide, for each α , one vector r such that (A4) does not hold.
Case 1 ($\alpha \in (0,1)$): Let $r \in \mathbb{R}^n_{>0}$ be the vector with coordinates $r_1 = r_2 = 1$ and $r_j = \epsilon$ for all $j > 2$. Then,
$$r_1^{\alpha-1} + r_2^{\alpha-1} - \|r\|_\alpha^\alpha\, \|r\|_1^{-1} = 2 - \frac{2 + (n-2)\epsilon^\alpha}{2 + (n-2)\epsilon}.$$
Now, $\lim_{n \to \infty} \frac{2 + (n-2)\epsilon^\alpha}{2 + (n-2)\epsilon} = \epsilon^{\alpha-1}$, and $\epsilon^{\alpha-1}$ can be made arbitrarily large by choosing a suitably small $\epsilon$. Thus, for an appropriate choice of $n$ and $\epsilon$, $\frac{2 + (n-2)\epsilon^\alpha}{2 + (n-2)\epsilon} > 2$, and (A4) does not hold.
Case 2 ($\alpha > 1$): Fix $n > 2$. Let $\alpha > 1$, $\epsilon > 0$ and pick $a > 0$ such that
$$a^\alpha - 2a - 2 > (n-3)\left(2\epsilon - \epsilon^\alpha\right).$$
Notice that such an $a$ is guaranteed to exist because $\alpha > 1$. Let $r \in \mathbb{R}^n_{>0}$ be the vector with $r_1 = r_2 = 1$, $r_3 = a$ and $r_k = \epsilon$ for all $k > 3$. Then
$$r_1^{\alpha-1} + r_2^{\alpha-1} - \|r\|_\alpha^\alpha\, \|r\|_1^{-1} = 2 - \frac{2 + a^\alpha + (n-3)\epsilon^\alpha}{2 + a + (n-3)\epsilon} = \frac{2 + 2a - a^\alpha + (n-3)(2\epsilon - \epsilon^\alpha)}{2 + a + (n-3)\epsilon} < 0,$$
and (A4) does not hold. Therefore, $G_F$ is not supermodular.
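The two constructions are easy to instantiate concretely; the sketch below (our parameter choices, assuming NumPy) evaluates the left-hand side of (A4) and confirms it is negative in both cases.
```python
import numpy as np

def lhs_A4(r, alpha):
    # r_1^(alpha-1) + r_2^(alpha-1) - ||r||_alpha^alpha / ||r||_1, for i=1, j=2
    return r[0]**(alpha - 1) + r[1]**(alpha - 1) - (r**alpha).sum() / r.sum()

# Case 1: alpha in (0,1), with eps small and n large
r = np.concatenate(([1.0, 1.0], np.full(10**6 - 2, 1e-4)))
print(lhs_A4(r, 0.5))        # negative, so (A4) fails

# Case 2: alpha > 1, with a chosen so that a^alpha - 2a - 2 > (n-3)(2 eps - eps^alpha)
alpha, eps, n, a = 2.0, 0.1, 10, 10.0
assert a**alpha - 2*a - 2 > (n - 3) * (2*eps - eps**alpha)
r = np.concatenate(([1.0, 1.0, a], np.full(n - 3, eps)))
print(lhs_A4(r, alpha))      # negative, so (A4) fails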
(2) By Definition 5, we have
$$G_F(r) = \max_{A \in P} \sum_{i \in A} r_i.$$
Let $n = 3$, $P = \{\{1,2\},\{3\}\}$, and let $r = (3,1,3)$, $s = (1,3,3) \in \mathbb{R}^3_{\geq 0}$. Then,
$$G_F(r \vee s) + G_F(r \wedge s) = 9 > 8 = G_F(r) + G_F(s),$$
so $G_F$ is not submodular.
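This counterexample is immediate to verify; a minimal sketch (zero-indexed blocks, assuming NumPy):
```python
import numpy as np

def G_part(r, partition):
    return max(r[list(block)].sum() for block in partition)

P = [(0, 1), (2,)]                               # the partition {{1,2},{3}}
r, s = np.array([3., 1., 3.]), np.array([1., 3., 3.])
join, meet = np.maximum(r, s), np.minimum(r, s)  # r v s and r ^ s
print(G_part(join, P) + G_part(meet, P))         # 9.0
print(G_part(r, P) + G_part(s, P))               # 8.0
```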

Appendix B. Proofs of Section 5

Appendix B.1. Proof of Proposition 5

(1) follows immediately from Definition 9, and (2) from Proposition 4(2). Statement (3) is proven by the following pair of channels:
$$\begin{pmatrix} 1 \\ 1 \end{pmatrix} \sqsubseteq_{\mathrm{d}} \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}.$$
For (4), consider the channels $K_1$, $K_2$ as follows:
$$K_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \tfrac12 & \tfrac12 \end{pmatrix} \qquad K_2 = \begin{pmatrix} 1 & 0 \\ \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix}.$$
These channels were introduced in [3], where it is proved that $K_1 \not\sqsubseteq_{\mathrm{ln}}^{H_1} K_2$. However, $K_1 \sqsubseteq_{\mathrm{ds}} K_2$, as
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \tfrac12 & \tfrac12 \end{pmatrix} \sqsubseteq_{\mathrm{d}} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac14 & \tfrac14 \end{pmatrix} \sqsubseteq_{\mathrm{s}} \begin{pmatrix} 1 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac14 & \tfrac14 \end{pmatrix} \sqsubseteq_{\mathrm{d}} \begin{pmatrix} 1 & 0 \\ \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix}.$$
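The chain can be verified mechanically. In the sketch below (assuming NumPy), the stochastic factors W1 and W2 witnessing the two degradation steps are our reconstruction, and the middle step is a Join-Meet of the first two columns:
```python
import numpy as np

K1 = np.array([[1, 0], [0, 1], [.5, .5]])
M1 = np.array([[1, 0, 0], [0, .5, .5], [.5, .25, .25]])
M2 = np.array([[1, 0, 0], [.5, 0, .5], [.5, .25, .25]])
K2 = np.array([[1, 0], [.5, .5], [.5, .5]])

W1 = np.array([[1, 0, 0], [0, .5, .5]])          # degradation: K1 @ W1 == M1
assert np.allclose(K1 @ W1, M1)

JM = M1.copy()                                   # Join-Meet on columns 1 and 2
JM[:, 0] = np.maximum(M1[:, 0], M1[:, 1])
JM[:, 1] = np.minimum(M1[:, 0], M1[:, 1])
assert np.allclose(JM, M2)

W2 = np.array([[1, 0], [0, 1], [0, 1]])          # degradation: M2 @ W2 == K2
assert np.allclose(M2 @ W2, K2)
```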
Statement (5) follows from (1) and from Theorem 2.
To prove (6), consider the following channels, for some $r \in (0,1)$ and $\delta, \epsilon > 0$:
$$K_1 = \begin{pmatrix} r & 1-r \\ r+\delta & 1-r-\delta \end{pmatrix} \qquad K_2 = K_1 \begin{pmatrix} 1+\epsilon & -\epsilon \\ \tfrac12 & \tfrac12 \end{pmatrix}.$$
Notice that $K_1$ and $K_2$ are indeed channels if $\delta$ and $\epsilon$ are small enough. These channels were introduced in [3] as an example for which (for $\delta, \epsilon$ small enough) $K_1 \sqsubseteq_{\mathrm{ln}}^{H_1} K_2$ but $K_1 \not\sqsubseteq_{\mathrm{d}} K_2$. As their input set is of size 2, Proposition 4(1) implies that $K_1 \sqsubseteq_{\mathrm{ds}} K_2$.
Statement (7) follows directly from Definition 9, Proposition 1, and Theorem 4.
For (8), consider the following $K_1$, $K_2$:
$$K_1 = \begin{pmatrix} 0 & \tfrac{1}{25} & \tfrac{24}{25} \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{pmatrix} \qquad K_2 = \begin{pmatrix} \tfrac{1}{25} & 0 & \tfrac{24}{25} \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{pmatrix}.$$
In this case, $K_2$ is obtained from $K_1$ by a Join-Meet operation on the first two columns, and thus $K_1 \sqsubseteq_{\mathrm{ds}} K_2$. Suppose, to derive a contradiction, that $K_1 \sqsubseteq_{\mathrm{sh}} K_2$. Then there is a finite collection $\{(g_i, T_i, R_i)\}_i$ such that each $T_i, R_i$ is a deterministic channel, each $g_i > 0$, $\sum_i g_i = 1$ and $K_2 = \sum_i g_i\, T_i K_1 R_i$. Notice that, for all $i$, we must have
$$(T_i K_1 R_i)(y_2 \mid x_1) = (T_i K_1 R_i)(y_2 \mid x_2) = 0. \quad \text{(A5)}$$
Let $I = \{\, i \mid (T_i K_1 R_i)(y_1 \mid x_1) \leq \tfrac{1}{25} \,\}$. We claim that
$$\text{for all } i \in I, \quad (T_i K_1 R_i)(y_1 \mid x_3) + (T_i K_1 R_i)(y_3 \mid x_2) + (T_i K_1 R_i)(y_3 \mid x_3) \geq \tfrac{4}{3}. \quad \text{(A6)}$$
To see why, notice that channels of the form $T_i K_1$ are those channels whose rows are equal to rows of $K_1$ (with possible reordering and duplicates), and channels of the form $(T_i K_1) R_i$ are those obtained by permuting or merging columns of $T_i K_1$.
Suppose the first row of $T_i K_1$ is either $(\tfrac12, 0, \tfrac12)$ or $(\tfrac13, \tfrac13, \tfrac13)$. Then, in order that $i \in I$, the third column of $T_i K_1 R_i$ must be either (1) equal to the sum of the first and third columns of $T_i K_1$, or (2) equal to the sum of all columns of $T_i K_1$. In either case, we have $(T_i K_1 R_i)(y_3 \mid x_2) \geq \tfrac23$ and $(T_i K_1 R_i)(y_3 \mid x_3) \geq \tfrac23$, so (A6) holds.
If the first row of $T_i K_1$ is $(0, \tfrac{1}{25}, \tfrac{24}{25})$, the proof is not as straightforward, and we divide it into two subcases. The first subcase is when the second row is either $(\tfrac13, \tfrac13, \tfrac13)$ or $(\tfrac12, 0, \tfrac12)$; then each column of $T_i K_1$ is mapped to the first or the third column of $T_i K_1 R_i$, and therefore $(T_i K_1 R_i)(y_1 \mid x_3) + (T_i K_1 R_i)(y_3 \mid x_3) = 1$. Then, (A6) follows from observing that $(T_i K_1 R_i)(y_3 \mid x_2) \geq \tfrac13$. The second subcase is when the second row is $(0, \tfrac{1}{25}, \tfrac{24}{25})$. In this case, we have $(T_i K_1 R_i)(y_3 \mid x_2) \geq \tfrac{24}{25}$, and, by considering each of the three possible choices for the third row of $T_i K_1$, it is easy to see that $(T_i K_1 R_i)(y_1 \mid x_3) + (T_i K_1 R_i)(y_3 \mid x_3) \geq \tfrac12$.
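Claim (A6) can also be checked exhaustively, since there are only finitely many deterministic pre- and post-processings of a $3 \times 3$ channel. A sketch under our encoding (T picks a row of K1 per input; R sends each column of T K1 to one output), assuming NumPy:
```python
import itertools
import numpy as np

K1 = np.array([[0, 1/25, 24/25], [1/2, 0, 1/2], [1/3, 1/3, 1/3]])

for rows in itertools.product(range(3), repeat=3):       # all deterministic T
    TK = K1[list(rows), :]
    for cols in itertools.product(range(3), repeat=3):   # all deterministic R
        R = np.zeros((3, 3))
        R[range(3), list(cols)] = 1
        W = TK @ R
        # restrict to channels satisfying (A5) with i in I
        if W[0, 1] == 0 and W[1, 1] == 0 and W[0, 0] <= 1/25:
            assert W[2, 0] + W[1, 2] + W[2, 2] >= 4/3 - 1e-12
```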
As $K_2(y_1 \mid x_3) + K_2(y_3 \mid x_2) + K_2(y_3 \mid x_3) = \tfrac76$, it follows from (A6) that
$$\tfrac76 = \sum_i g_i \big( (T_i K_1 R_i)(y_1 \mid x_3) + (T_i K_1 R_i)(y_3 \mid x_2) + (T_i K_1 R_i)(y_3 \mid x_3) \big) \geq \sum_{i \in I} g_i \big( (T_i K_1 R_i)(y_1 \mid x_3) + (T_i K_1 R_i)(y_3 \mid x_2) + (T_i K_1 R_i)(y_3 \mid x_3) \big) \geq \tfrac43 \sum_{i \in I} g_i.$$
Thus, it follows that $\sum_{i \in I} g_i \leq \tfrac78$, and therefore $\sum_{i \notin I} g_i \geq \tfrac18$. Since the possible values of $(T_i K_1 R_i)(y_1 \mid x_1)$ are sums of entries of a row of $K_1$, any such value exceeding $\tfrac{1}{25}$ is at least $\tfrac13$; that is,
$\min_{i \notin I} (T_i K_1 R_i)(y_1 \mid x_1) \geq \tfrac13$, and we have
$$\tfrac{1}{25} = K_2(y_1 \mid x_1) = \sum_{i} g_i\, (T_i K_1 R_i)(y_1 \mid x_1) \geq \sum_{i \notin I} g_i\, (T_i K_1 R_i)(y_1 \mid x_1) \geq \tfrac13 \sum_{i \notin I} g_i \geq \tfrac13 \times \tfrac18 = \tfrac{1}{24},$$
which is absurd.
For (9), let $K_1$, $K_2$ be the following channels:
$$K_1 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad K_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
Notice that $K_2$ is obtained from $K_1$ by a row permutation; hence, $K_1 \sqsubseteq_{\mathrm{sh}} K_2$. However, $K_1 \not\sqsubseteq_{\mathrm{mc}}^{H_1} K_2$ (take, for example, $p_X = (\tfrac12, \tfrac12, 0)$). Thus, (7) yields $K_1 \not\sqsubseteq_{\mathrm{ds}} K_2$.
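The counterexample prior can be checked directly: under $p_X = (\tfrac12, \tfrac12, 0)$, the posterior Shannon entropy is 1 bit for $K_1$ but 0 bits for $K_2$. A minimal sketch (our helper, assuming NumPy):
```python
import numpy as np

def posterior_shannon(K, p):
    joint = p[:, None] * K                       # p(x, y) = p(x) K(y|x)
    H = 0.0
    for y in range(K.shape[1]):
        py = joint[:, y].sum()
        if py > 0:
            post = joint[:, y] / py
            post = post[post > 0]
            H -= py * (post * np.log2(post)).sum()
    return H

K1 = np.array([[1., 0.], [1., 0.], [0., 1.]])
K2 = np.array([[1., 0.], [0., 1.], [1., 0.]])
p = np.array([.5, .5, 0.])
print(posterior_shannon(K1, p), posterior_shannon(K2, p))   # 1.0 0.0
```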

Appendix B.2. Proof of Proposition 9

(1) Follows from Lemma 1 and Shannon’s result in [1].
(2) In light of Lemma 1, we need only prove that $K_1 \sqsubseteq_{\mathrm{sh}} K_2 \Rightarrow K_1 \sqsubseteq_{\mathrm{c}}^{H_\infty} K_2$.
Firstly, we prove that, given channels $W_1$, $W_2$ and $W_3 = tW_1 + (1-t)W_2$ with $t \in [0,1]$, we have $C_{H_\infty}(W_3) \leq \max\big( C_{H_\infty}(W_1), C_{H_\infty}(W_2) \big)$. Keeping in mind the identity $C_{H_\infty}(W) = \log \sum_y \max_x W(x,y)$ ([43] Remark 3),
$$C_{H_\infty}(W_3) = \log \sum_y \max_x \big( tW_1(x,y) + (1-t)W_2(x,y) \big) \leq \log \Big( t \sum_y \max_x W_1(x,y) + (1-t) \sum_y \max_x W_2(x,y) \Big) \leq \max\big( C_{H_\infty}(W_1), C_{H_\infty}(W_2) \big),$$
where the last inequality follows from quasi-convexity of $\log$. Now, if $K_2 = \sum_i g_i\, T_i K_1 R_i$, then $C_{H_\infty}(K_2) = C_{H_\infty}\big( \sum_i g_i\, T_i K_1 R_i \big) \leq \max_i C_{H_\infty}(T_i K_1 R_i) \leq C_{H_\infty}(K_1)$, where the last inequality follows from ([33] Theorem 6).
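Both steps are easy to spot-check numerically on random channels. A sketch (our helpers, assuming NumPy); note that the pre/post-processing bound is checked here even for stochastic $T$, $R$, which subsumes the deterministic case:
```python
import numpy as np

def C_inf(W):
    return np.log2(W.max(axis=0).sum())          # C_{H_inf}(W) = log sum_y max_x W(x,y)

def random_channel(rng, n, m):
    W = rng.random((n, m))
    return W / W.sum(axis=1, keepdims=True)      # rows sum to 1

rng = np.random.default_rng(4)
for _ in range(10_000):
    W1, W2 = random_channel(rng, 3, 4), random_channel(rng, 3, 4)
    t = rng.random()
    assert C_inf(t*W1 + (1 - t)*W2) <= max(C_inf(W1), C_inf(W2)) + 1e-9
    T, R = random_channel(rng, 3, 3), random_channel(rng, 4, 4)
    assert C_inf(T @ W1 @ R) <= C_inf(W1) + 1e-9
```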
(3) This is readily seen from the fact that $\sqsubseteq_{\mathrm{c}}^{H_1}$ and $\sqsubseteq_{\mathrm{c}}^{H_\infty}$ are total preorders that do not coincide. To make the argument concrete, consider channels $K_1$, $K_2$ as follows:
$$K_1 = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 \\ 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \end{pmatrix} \qquad K_2 = \begin{pmatrix} \tfrac25 & \tfrac25 & \tfrac15 \\ \tfrac15 & \tfrac25 & \tfrac25 \\ \tfrac25 & \tfrac15 & \tfrac25 \end{pmatrix}.$$
Then, from ([53] Theorem 2.7.1), $C_{H_1}(K_1) > C_{H_1}(K_2)$, whereas, from ([43] Remark 3), $C_{H_\infty}(K_1) < C_{H_\infty}(K_2)$. Thus, (1) and (2) imply that $K_1 \not\sqsubseteq_{\mathrm{shs}} K_2$ and $K_2 \not\sqsubseteq_{\mathrm{shs}} K_1$.

References

1. Shannon, C.E. A note on a partial ordering for communication channels. Inf. Control 1958, 1, 390–397.
2. El Gamal, A.A. The capacity of a class of broadcast channels. IEEE Trans. Inf. Theory 1979, 25, 166–169.
3. Korner, J.; Marton, K. Comparison of two noisy channels. Coll. Math. Soc. J. Bolyai 1977, 16, 411–423.
4. Clark, D.; Hunt, S.; Malacaria, P. Quantitative Information Flow, Relations and Polymorphic Types. J. Log. Comput. 2005, 15, 181–199.
5. Smith, G. On the Foundations of Quantitative Information Flow. In Proceedings of the 12th International Conference on Foundations of Software Science and Computational Structures (FOSSACS), York, UK, 22–29 March 2009; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5504, pp. 288–302.
6. Cohen, J.; Kempermann, J.H.B.; Zbaganu, G. Comparisons of Stochastic Matrices with Applications in Information Theory, Statistics, Economics and Population; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1998.
7. Américo, A.; Malacaria, P.; Khouzani, A.M. Channel Ordering and Supermodularity. In Proceedings of the 2019 IEEE Information Theory Workshop (ITW), Visby, Sweden, 25–28 August 2019.
8. Cover, T.M. Broadcast channels. IEEE Trans. Inf. Theory 1972, 18, 2–14.
9. Pasareanu, C.S.; Phan, Q.; Malacaria, P. Multi-run Side-Channel Analysis Using Symbolic Execution and Max-SMT. In Proceedings of the IEEE 29th Computer Security Foundations Symposium, CSF 2016, Lisbon, Portugal, 27 June–1 July 2016; pp. 387–400.
10. Heusser, J.; Malacaria, P. Quantifying information leaks in software. In Proceedings of the Twenty-Sixth Annual Computer Security Applications Conference, ACSAC 2010, Austin, TX, USA, 6–10 December 2010; pp. 261–269.
11. Gong, X.; Kiyavash, N. Quantifying the information leakage in timing side channels in deterministic work-conserving schedulers. IEEE/ACM Trans. Netw. 2016, 24, 1841–1852.
12. Köpf, B.; Smith, G. Vulnerability Bounds and Leakage Resilience of Blinded Cryptography under Timing Attacks. In Proceedings of the 2010 23rd IEEE Computer Security Foundations Symposium, Edinburgh, UK, 17–19 July 2010; pp. 44–56.
13. Américo, A.; Khouzani, M.; Malacaria, P. Deterministic Channel Design for Minimum Leakage. In Proceedings of the IEEE 32nd Computer Security Foundations Symposium (CSF), Hoboken, NJ, USA, 25–28 June 2019; pp. 428–441.
14. Bergmans, P. Random coding theorem for broadcast channels with degraded components. IEEE Trans. Inf. Theory 1973, 19, 197–207.
15. Bergmans, P. A simple converse for broadcast channels with additive white Gaussian noise (Corresp.). IEEE Trans. Inf. Theory 1974, 20, 279–280.
16. Gallager, R.G. Capacity and coding for degraded broadcast channels. Probl. Peredachi Informatsii 1974, 10, 3–14.
17. Smith, G. Recent Developments in Quantitative Information Flow (Invited Tutorial). In Proceedings of the 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), Kyoto, Japan, 6–10 July 2015; IEEE Computer Society: Washington, DC, USA, 2015; pp. 23–31.
18. Malacaria, P. Algebraic foundations for quantitative information flow. Math. Struct. Comput. Sci. 2015, 25, 404–428.
19. Alvim, M.S.; Chatzikokolakis, K.; Palamidessi, C.; Smith, G. Measuring Information Leakage Using Generalized Gain Functions. In Proceedings of the IEEE 25th Computer Security Foundations Symposium (CSF), Cambridge, MA, USA, 25–27 June 2012; pp. 265–279.
20. McIver, A.; Morgan, C.; Smith, G.; Espinoza, B.; Meinicke, L. Abstract Channels and Their Robust Information-Leakage Ordering. In Proceedings of the International Conference on Principles of Security and Trust (POST), Grenoble, France, 5–13 April 2014; Springer: Berlin/Heidelberg, Germany, 2014; Volume 8414, pp. 83–102.
21. Blackwell, D. The comparison of experiments. In Second Berkeley Symposium on Mathematical Statistics and Probability; Neyman, J., Ed.; Univ. of California Press: Berkeley, CA, USA, 1951; pp. 93–102.
22. Bordenabe, N.E.; Smith, G. Correlated Secrets in Quantitative Information Flow. In Proceedings of the 2016 IEEE 29th Computer Security Foundations Symposium (CSF), Lisbon, Portugal, 27 June–1 July 2016; pp. 93–104.
23. Buscemi, F. Degradable channels, less noisy channels, and quantum statistical morphisms: An equivalence relation. Probl. Inf. Transm. 2016, 52, 201–213.
24. Cam, L.L. Sufficiency and Approximate Sufficiency. Ann. Math. Stat. 1964, 35, 1419–1455.
25. Raginsky, M. Shannon meets Blackwell and Le Cam: Channels, codes, and statistical experiments. In Proceedings of the 2011 IEEE International Symposium on Information Theory, St. Petersburg, Russia, 31 July–5 August 2011; pp. 1220–1224.
26. Zhang, Y.; Tepedelenlioğlu, C. Analytical and Numerical Characterizations of Shannon Ordering for Discrete Memoryless Channels. IEEE Trans. Inf. Theory 2014, 60, 72–83.
27. Nasser, R. Characterizations of Two Channel Orderings: Input-Degradedness and the Shannon Ordering. IEEE Trans. Inf. Theory 2018, 64, 6759–6770.
28. Khouzani, M.; Malacaria, P. Generalised Entropies and Metric-Invariant Optimal Countermeasures for Information Leakage under Symmetric Constraints. IEEE Trans. Inf. Theory 2018, 65, 888–901.
29. Khouzani, M.; Malacaria, P. Leakage-Minimal Design: Universality, Limitations, and Applications. In Proceedings of the IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, USA, 21–25 August 2017; pp. 305–317.
30. Khouzani, M.; Malacaria, P. Optimal Channel Design: A Game Theoretical Analysis. Entropy 2018, 20, 675.
31. Ah-Fat, P.; Huth, M. Optimal Accuracy-Privacy Trade-Off for Secure Computations. IEEE Trans. Inf. Theory 2018, 65, 3165–3182.
32. Köpf, B.; Basin, D.A. An information-theoretic model for adaptive side-channel attacks. In Proceedings of the 14th ACM Conference on Computer and Communications Security, Alexandria, VA, USA, 31 October–2 November 2007; pp. 286–296.
33. Espinoza, B.; Smith, G. Min-entropy as a resource. Inf. Comput. 2013, 226, 57–75.
34. Alvim, M.S.; Chatzikokolakis, K.; McIver, A.; Morgan, C.; Palamidessi, C.; Smith, G. Additive and Multiplicative Notions of Leakage, and Their Capacities. In Proceedings of the IEEE 27th Computer Security Foundations Symposium (CSF), Vienna, Austria, 19–22 July 2014; pp. 308–322.
35. Gervais, A.; Shokri, R.; Singla, A.K.; Capkun, S.; Lenders, V. Quantifying Web-Search Privacy. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014.
36. Beigi, G.; Guo, R.; Nou, A.; Zhang, Y.; Liu, H. Protecting User Privacy: An Approach for Untraceable Web Browsing History and Unambiguous User Profiles. arXiv 2018, arXiv:1811.09340.
37. Rebollo-Monedero, D.; Forne, J. Optimized Query Forgery for Private Information Retrieval. IEEE Trans. Inf. Theory 2010, 56, 4631–4642.
38. Alvim, M.S.; Chatzikokolakis, K.; McIver, A.; Morgan, C.; Palamidessi, C.; Smith, G. An axiomatization of information flow measures. Theor. Comput. Sci. 2019, 777, 32–54.
39. Arimoto, S. Information-theoretical considerations on estimation problems. Inf. Control 1971, 19, 181–194.
40. Salicru, M.; Menendez, M.; Morales, D.; Pardo, L. Asymptotic distribution of (h, Φ)-entropies. Commun. Stat. Theory Methods 1993, 22, 2015–2031.
41. Csiszár, I. Information-type measures of difference of probability distributions and indirect observation. Stud. Sci. Math. Hung. 1967, 2, 229–318.
42. Ho, S.W.; Verdú, S. Convexity/concavity of Rényi entropy and α-mutual information. In Proceedings of the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 745–749.
43. Arimoto, S. Information measures and capacity of order α for discrete memoryless channels. In Topics in Information Theory; North-Holland Publishing Company: Amsterdam, The Netherlands, 1977.
44. Massey, J.L. Guessing and entropy. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Trondheim, Norway, 27 June–1 July 1994; p. 204.
45. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
46. Sharma, B.; Mittal, D. New non-additive measures of entropy for discrete probability distributions. J. Math. Sci. 1975, 10, 28–40.
47. Hayashi, M. Exponential decreasing rate of leaked information in universal random privacy amplification. IEEE Trans. Inf. Theory 2011, 57, 3989–4001.
48. Dahl, G. Matrix majorization. Linear Algebra Appl. 1999, 288, 53–73.
49. Topkis, D.M. Supermodularity and Complementarity; Princeton University Press: Princeton, NJ, USA, 1998.
50. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications; Mathematics in Science and Engineering; Academic Press: Cambridge, MA, USA, 1979; Volume 143.
51. Khouzani, M.H.R.; Malacaria, P. Relative Perfect Secrecy: Universally Optimal Strategies and Channel Design. In Proceedings of the IEEE 29th Computer Security Foundations Symposium, CSF 2016, Lisbon, Portugal, 27 June–1 July 2016; pp. 61–76.
52. Iwamoto, M.; Shikata, J. Information Theoretic Security for Encryption Based on Conditional Rényi Entropies. In Information Theoretic Security; Padró, C., Ed.; Springer International Publishing: Cham, Switzerland, 2014; pp. 103–121.
53. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006.
Figure 1. The four implication graphs summarize the relations between the preorders. It is known that a preorder $\sqsubseteq_i$ implies a preorder $\sqsubseteq_j$ if and only if there is a path of solid arrows from $\sqsubseteq_i$ to $\sqsubseteq_j$. Preorders that are equivalent are grouped together, and the dotted arrows represent an implication whose validity is an open question. When no path is present, the implication is known not to hold.
Figure 2. The solution given by Algorithm 1 for $k = 3$.
Figure 3. Posterior Shannon and min-entropy for different anonymity solutions; $k$ is $\sqrt{n}$, where $n$ is the size of the input set. For the random solution, the values in the plot are the average of 1000 samples.
Table 1. Some examples of core-concave entropies, written as $H = \eta(F(p))$, together with their conditional forms $H(X|Y)$.
  • Shannon $H_1$: $\eta(r) = r$; $F(p) = -\sum_i p_i \log(p_i)$; $H(X|Y) = \sum_y p(y)\, H_1(X|y)$.
  • Min-entropy $H_\infty$: $\eta(r) = -\log(r)$; $F(p) = \max_i p_i$; $H(X|Y) = -\log \sum_y p(y) \max_x p_{X|y}(x)$.
  • Guessing [44] $H_G$: $\eta(r) = r$; $F(p) = \sum_i i\, p_{[i]}$; $H(X|Y) = \sum_y p(y)\, H_G(X|y)$.
  • Arimoto–Rényi [43] $H_\alpha$: $\eta(r) = \frac{\alpha}{1-\alpha}\log(r)$ and $F(p) = \|p\|_\alpha$ if $0 < \alpha < 1$; $\eta(r) = \frac{\alpha}{1-\alpha}\log(-r)$ and $F(p) = -\|p\|_\alpha$ if $\alpha > 1$; $H(X|Y) = \frac{\alpha}{1-\alpha} \log \sum_y p(y)\, \|p_{X|y}\|_\alpha$.
  • Hayashi–Rényi [47] $H_\alpha$: $\eta(r) = \frac{1}{1-\alpha}\log(r)$ and $F(p) = \|p\|_\alpha^\alpha$ if $0 < \alpha < 1$; $\eta(r) = \frac{1}{1-\alpha}\log(-r)$ and $F(p) = -\|p\|_\alpha^\alpha$ if $\alpha > 1$; $H(X|Y) = \frac{1}{1-\alpha} \log \sum_y p(y)\, \|p_{X|y}\|_\alpha^\alpha$.
  • Tsallis [45] $H_{(\alpha,\alpha)}$: $\eta(r) = \frac{1}{\alpha-1}(1 - r)$ and $F(p) = \|p\|_\alpha^\alpha$ if $0 < \alpha < 1$; $\eta(r) = \frac{1}{\alpha-1}(1 + r)$ and $F(p) = -\|p\|_\alpha^\alpha$ if $\alpha > 1$; $H(X|Y) = \frac{1}{\alpha-1}\big(1 - \sum_y p(y)\, \|p_{X|y}\|_\alpha^\alpha\big)$.
  • Sharma–Mittal [46] $H_{(\alpha,\beta)}$: $\eta(r) = \frac{1}{\beta-1}\big(1 - r^{\frac{1-\beta}{1-\alpha}}\big)$ and $F(p) = \|p\|_\alpha^\alpha$ if $0 < \alpha < 1$; $\eta(r) = \frac{1}{\beta-1}\big(1 - (-r)^{\frac{1-\beta}{1-\alpha}}\big)$ and $F(p) = -\|p\|_\alpha^\alpha$ if $\alpha > 1$; $H(X|Y) = \frac{1}{\beta-1}\Big(1 - \big(\sum_y p(y)\, \|p_{X|y}\|_\alpha^\alpha\big)^{\frac{1-\beta}{1-\alpha}}\Big)$.
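To illustrate how Table 1 decomposes each measure as $H = \eta(F(p))$, the sketch below (our code, assuming NumPy) evaluates the Shannon, min-entropy, guessing, and Arimoto–Rényi rows on a fixed distribution; as $\alpha \to 1$, the Arimoto–Rényi value approaches the Shannon value.
```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])

shannon  = -(p * np.log2(p)).sum()                              # eta(r) = r,      F = -sum p log p
min_ent  = -np.log2(p.max())                                    # eta(r) = -log r, F = max p
guessing = (np.arange(1, len(p) + 1) * np.sort(p)[::-1]).sum()  # eta(r) = r,      F = sum i p_[i]

def arimoto(p, a):                               # eta(r) = a/(1-a) log r, F = ||p||_a
    return (a / (1 - a)) * np.log2((p ** a).sum() ** (1 / a))

print(shannon, min_ent, guessing)
for a in (0.9, 0.99, 0.999):
    print(arimoto(p, a))                         # tends to the Shannon value
```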