Article

Guessing with a Bit of Help †

1 Institute for Data, Systems, and Society and Laboratory for Information & Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 69978, Israel
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in the proceedings of the IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018.
Entropy 2020, 22(1), 39; https://doi.org/10.3390/e22010039
Submission received: 29 August 2019 / Revised: 22 December 2019 / Accepted: 23 December 2019 / Published: 26 December 2019

Abstract

What is the value of just a few bits to a guesser? We study this problem in a setup where Alice wishes to guess an independent and identically distributed (i.i.d.) random vector and can procure a fixed number of k information bits from Bob, who has observed this vector through a memoryless channel. We are interested in the guessing ratio, which we define as the ratio of Alice’s guessing-moments with and without observing Bob’s bits. For the case of a uniform binary vector observed through a binary symmetric channel, we provide two upper bounds on the guessing ratio by analyzing the performance of the dictator (for general k ≥ 1) and majority functions (for k = 1). We further provide a lower bound via maximum entropy (for general k ≥ 1) and a lower bound based on Fourier-analytic/hypercontractivity arguments (for k = 1). We then extend our maximum entropy argument to give a lower bound on the guessing ratio for a general channel with a binary uniform input that is expressed using the strong data-processing inequality constant of the reverse channel. We compute this bound for the binary erasure channel and conjecture that greedy dictator functions achieve the optimal guessing ratio.

1. Introduction

In the classical guessing problem, Alice wishes to learn the value of a discrete random variable (r.v.) X as quickly as possible by sequentially asking yes/no questions of the form “Is X = x ?”, until she makes a correct guess. A guessing strategy corresponds to an ordering of the alphabet of X according to which the guesses are made and induces a random guessing time. It is well known and simple to verify that the guessing strategy which simultaneously minimizes all the positive moments of the guessing time is to order the alphabet according to a decreasing order of probability. Formally, for any s > 0 , the minimal sth-order guessing-time moment of X is
$$G_s(X) := \mathbb{E}\left[\mathrm{ORD}_X^s(X)\right],$$
where ORD_X(x) returns the index of the symbol x relative to the order induced by sorting the probabilities in descending order, with ties broken arbitrarily. For brevity, we refer to G_s(X) as the guessing-moment of X.
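To make the definition concrete, here is a minimal numerical sketch (ours, not part of the original paper) of the guessing-moment of a small distribution; the pmf values below are arbitrary illustrative numbers.

```python
import numpy as np

def guessing_moment(p, s):
    """G_s(X): sort the pmf in descending order and average (guess index)^s."""
    q = np.sort(np.asarray(p, dtype=float))[::-1]
    return float(np.sum(q * np.arange(1, len(q) + 1) ** s))

# Example: a biased 4-ary source. For s = 1 this is the expected number of guesses
# under the optimal strategy (guess the likeliest symbol first).
print(guessing_moment([0.5, 0.25, 0.15, 0.10], s=1))   # 1.85
```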
Several motivating problems for studying guesswork are fairness in betting games, the computational complexity of sequential decoding [1], the computational complexity of lossy source coding and database search algorithms (see the introduction of Reference [2] for a discussion), secrecy systems [3,4,5], and cryptanalysis (password cracking) [6,7]. The guessing problem was first introduced and studied in an information-theoretic framework by Massey [8], who drew a relation between the average guessing time of an r.v. and its entropy. It was later explored more systematically by Arikan [1], who also introduced the problem of guessing with side information. In this problem, Alice is in possession of another r.v. Y that is jointly distributed with X, and the optimal conditional guessing strategy is then to guess by decreasing order of conditional probabilities. Hence, the associated minimal conditional sth-order guessing-time moment of X given Y is
$$G_s(X|Y) := \mathbb{E}\left[\mathrm{ORD}_{X|Y}^s(X|Y)\right],$$
where ORD_{X|Y}(x|y) returns the index of x relative to the order induced by sorting the conditional probabilities of X given that Y = y in descending order. Arikan showed that, as intuition suggests, side information reduces the guessing-moments ([1], Corollary 1):
$$G_s(X|Y) \le G_s(X).$$
Furthermore, he showed that, if {(X_i, Y_i)}_{i=1}^n is an i.i.d. sequence, then ([1], Proposition 5)
$$\lim_{n\to\infty} \frac{1}{n}\log G_s^{1/s}(X^n|Y^n) = H_{\frac{1}{1+s}}(X_1|Y_1),$$
where H_α(X|Y) is the Arimoto-Rényi conditional entropy of order α. As was noted by Arikan a few years later [9], the guessing moments are related to the large deviations behavior of the random variable (1/n) log ORD_{X^n|Y^n}(X^n|Y^n). However, in Reference [9], he was only able to obtain right-tail large deviation bounds, since asymptotically tight bounds on G_s(X^n|Y^n) were only known for positive moments (s > 0). A large deviation principle for the normalized logarithm of the guessing time was later established in Reference [10] using substantial results from References [11,12]. Throughout the years, information-theoretic analysis of the guessing problem was extended in multiple directions, such as guessing until the distortion between the guess and the true value is below a certain threshold [2], guessing under source uncertainty [13], and improved bounds at finite blocklength [14,15,16], to name a few.
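As a numerical illustration of Arikan's asymptotic result (ours, not from the paper; the parameter values are arbitrary), the sketch below computes the exact conditional guessing-moment for a BSC with uniform input at a finite n and compares its normalized exponent with the Arimoto-Rényi conditional entropy of order 1/(1+s). The convergence is only polynomial in n, so the two numbers agree only roughly at moderate blocklengths.

```python
import numpy as np
from math import comb, log2

def H_alpha_cond(delta, alpha):
    # Arimoto-Renyi conditional entropy H_alpha(X|Y) of one BSC(delta) use, uniform input:
    # (alpha/(1-alpha)) * log2( sum_y [ sum_x P(x,y)^alpha ]^(1/alpha) )
    inner = ((1 - delta) / 2) ** alpha + (delta / 2) ** alpha
    return (alpha / (1 - alpha)) * log2(2 * inner ** (1 / alpha))

def guessing_exponent(n, delta, s):
    # (1/n) log2 G_s^{1/s}(X^n | Y^n): given y^n, guess by increasing Hamming distance
    G, start = 0.0, 0
    for d in range(n + 1):
        block = comb(n, d)
        idx = np.arange(start + 1, start + block + 1, dtype=float)
        G += (delta ** d) * ((1 - delta) ** (n - d)) * float(np.sum(idx ** s))
        start += block
    return log2(G) / (s * n)

s, delta = 2.0, 0.11
print(guessing_exponent(20, delta, s), H_alpha_cond(delta, 1 / (1 + s)))
```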
In the conditional setting described above, one may think of Y^n as side information observed by a “helper”, say Bob, who sends his observations to Alice. Nonetheless, as in other problems employing a helper (e.g., source coding [17,18]), it is more realistic to impose communication constraints and to assume that Bob can only send a compressed description of Y^n to Alice. This setting was recently addressed by Graczyk and Lapidoth [19,20], who considered the case where Bob encodes Y^n at a positive rate using nR bits before sending this description to Alice. They then characterized the best possible guessing-moments attained by Alice for general distributions as a function of the rate R. In this paper, we take this setting to its extreme and attempt to quantify the value of k bits in terms of reducing the guessing-moments by allowing Bob to use only a k-bit description of Y^n. The major difference from previous work is that, here, k is finite and does not increase with n, and for some of our results, we further concentrate on the extreme case of k = 1—a single bit of help. To that end, we define (Section 2) the guessing ratio, which is the (asymptotically) best possible ratio of the guessing-moments of X^n obtained with and without observing a function f(Y^n) ∈ {0,1}^k, i.e., the minimal possible ratio G_s(X^n | f(Y^n))/G_s(X^n) as a function of s > 0, in the limit of large n.
Sharply characterizing the guessing ratio appears to be a difficult problem in general. Here, we mostly focus on the special case where X^n is uniformly distributed over the Boolean cube {0,1}^n and Y^n is obtained by passing X^n through a memoryless binary symmetric channel (BSC) with crossover probability δ (Section 3). We derive two upper bounds and two lower bounds on the guessing ratio in this case. The upper bounds are derived by analyzing the ratio attained by two specific functions, k-Dictator, to wit f(Y^n) = Y^k, and Majority, to wit f(Y^n) = 𝟙(Σ_{i=1}^n Y_i > n/2), where 𝟙(·) is the indicator function, and for simplicity, we henceforth assume that n is odd when discussing majority functions. For k = 1, we demonstrate that neither of these functions is better than the other for all values of the moment order s. The first lower bound is based on relating the guessing-moment to entropy using maximum-entropy arguments (generalizing a result of Reference [8]), and the second one on Fourier-analytic techniques combined with a hypercontractivity argument [21]. Furthermore, for the restricted class of functions for which the constituent one-bit functions operate on disjoint sets of bits, a general method is proposed for transforming a lower bound valid for k = 1 to a lower bound valid for any k ≥ 1. Nonetheless, we remark that our bounds are valid for s > 0, and obtaining similar bounds for s < 0 in order to obtain a large deviation principle for the normalized logarithm of the guessing time remains an open problem. In Section 4, we briefly discuss the more general case where X^n is still uniform over the Boolean cube, but Y^n is obtained from X^n via a general binary-input, arbitrary-output channel. We generalize our entropy lower bound to this case using the strong data-processing inequality (SDPI) applied to the reverse channel (from Y to X). We then discuss the case of the binary erasure channel (BEC), for which we also provide an upper bound by analyzing the greedy dictator function, namely where Bob sends the first bit that has not been erased. We conjecture that this function minimizes the guessing-moments simultaneously at all erasure parameters and all moments s.
Related Work. As mentioned above, Graczyk and Lapidoth [19,20] considered the same guessing question if Bob can communicate with Alice at some positive rate R, i.e., can use k = n R bits to describe Y n . This setup facilitates the use of large-deviation-based information-theoretic techniques, which allowed the authors to characterize the optimal reduction in the guessing-moments as a function of R to the first order in the exponent. This type of argument cannot be applied in our setup of finite number of bits. Furthermore, as we shall see, in our setup, the exponential order of the guessing moment with help is equal to the one without it and the performance is therefore more finely characterized by bounding the ratio of the guessing-moments. For a single bit of help k = 1 , characterizing the guessing ratio in the case of the BSC with a uniform input can also be thought of as a guessing variant of the most informative Boolean function problem introduced by Kumar and Courtade [22]. There, the maximal reduction in the entropy of X n obtainable by observing a Boolean function f ( Y n ) is sought after. It was conjectured in Reference [22] that a dictator function, e.g., f ( y n ) = y 1 , is optimal simultaneously at all noise levels; see References [23,24,25,26] for some recent progress. As in the guessing case, allowing Bob to describe Y n using n R bits renders the problem amenable to an exact information-theoretic characterization [27]. In another related work [28], we have asked about the Boolean function Y n that maximizes the reduction in the sequential mean-squared prediction error of X n and showed that the majority function is optimal in the noiseless case. There is, however, no single function that is simultaneously optimal at all noise levels. Finally, in a recent line of works [29,30], the average guessing time using the help of a noisy version of f ( X n ) has been considered. The model in this paper is different since the noise is applied to the inputs of the function rather than to its output.

2. Problem Statement

Let X^n be an i.i.d. vector from a distribution P_X, which is transmitted over a memoryless channel of conditional distribution P_{Y|X}. A helper observes Y^n ∈ 𝒴^n at the output of the channel and can send k bits f(Y^n), f: 𝒴^n → {0,1}^k, to a guesser of X^n. Our goal is to characterize the best possible multiplicative reduction in guessing-moments offered by a function f, in the limit of large n. Precisely, we wish to characterize the guessing ratio, defined as
$$\gamma_{s,k}(P_X, P_{Y|X}) := \limsup_{n\to\infty}\; \min_{f:\, \mathcal{Y}^n \to \{0,1\}^k} \frac{G_s\big(X^n \mid f(Y^n)\big)}{G_s(X^n)}$$
for an arbitrary s > 0. In this paper, we are mostly interested in the case where P_X = (1/2, 1/2), i.e., X^n is uniformly distributed over {0,1}^n, and where the channel is a BSC with crossover probability δ ∈ [0, 1/2]. With a slight abuse of notation, we denote the guessing ratio in this case by γ_{s,k}(δ). Furthermore, some of the results will be restricted to the case of a single bit of help (k = 1), and in this case, we will further abbreviate the notation from γ_{s,1}(δ) to γ_s(δ). We note the following basic facts.
Proposition 1.
The following properties hold:
1. The minimum in Equation (5) is achieved by a sequence of deterministic functions.
2. γ_{s,k}(δ) is a non-decreasing function of δ ∈ [0, 1/2] which satisfies γ_{s,k}(0) = 2^{−sk} and γ_{s,k}(1/2) = 1. In addition, γ_{s,k}(0) is attained by any sequence of functions f_n such that f_n(Y^n) is a uniform Bernoulli vector, i.e., Pr(f_n(Y^n) = b^k) = 2^{−k} for all b^k ∈ {0,1}^k.
3. For a BSC P_{Y|X}, the limit-supremum in Equation (5) defining γ_{s,k}(δ) is a regular limit.
4. If k = 1 and X^n is a uniformly distributed vector, then the optimal guessing order given that f(Y^n) = 0 is reversed to the optimal guessing order when f(Y^n) = 1.
Proof. 
See Appendix A. □
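As a sanity check on the definition in Equation (5), the following brute-force sketch (ours, not from the paper) computes the finite-n ratio min_f G_s(X^n | f(Y^n))/G_s(X^n) for k = 1 by enumerating all Boolean functions of Y^n; the search is doubly exponential in n, so it is only feasible for very small n, and the parameter values are arbitrary.

```python
import itertools
import numpy as np

def guessing_moment_from_weights(w, s):
    # Sum of sorted weights times (guess index)^s; w need not be normalized to 1.
    q = np.sort(np.asarray(w, dtype=float))[::-1]
    return float(np.sum(q * np.arange(1, len(q) + 1) ** s))

def gamma_finite_n(n, delta, s):
    """Brute-force min over all f: {0,1}^n -> {0,1} of G_s(X^n | f(Y^n)) / G_s(X^n),
    for uniform X^n observed through a BSC(delta)."""
    pts = list(itertools.product([0, 1], repeat=n))
    N = len(pts)
    # P[ix, iy] = P(Y^n = y | X^n = x) for the memoryless BSC
    P = np.empty((N, N))
    for ix, x in enumerate(pts):
        for iy, y in enumerate(pts):
            d = sum(a != b for a, b in zip(x, y))
            P[ix, iy] = delta ** d * (1 - delta) ** (n - d)
    Gs_X = guessing_moment_from_weights(np.full(N, 1.0 / N), s)
    best = np.inf
    for mask in range(2 ** N):                       # all 2^(2^n) Boolean functions
        f = np.array([(mask >> iy) & 1 for iy in range(N)])
        G = 0.0
        for b in (0, 1):
            joint = P[:, f == b].sum(axis=1) / N     # P(X^n = x, f(Y^n) = b)
            G += guessing_moment_from_weights(joint, s)
        best = min(best, G)
    return best / Gs_X

print(gamma_finite_n(3, 0.1, 1))
```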

3. Guessing Ratio for a Binary Symmetric Channel

3.1. Main Results

We begin by presenting the bound on the guessing ratio γ s , k ( δ ) obtained by k-dictator functions and then proceed to the bound obtained by majority functions for a single bit of help, k = 1 . The proofs are given in the next two subsections.
Theorem 1.
Let $L_{k,w} := \sum_{v=0}^{w}\binom{k}{v}$ for w ∈ {0, 1, …, k}. The guessing ratio is upper bounded as
$$\gamma_{s,k}(\delta) \le (1-2\delta)\cdot 2^{-sk}\sum_{w=0}^{k-1} (1-\delta)^{k-1-w}\,\delta^{w}\, L_{k,w}^{s+1} \;+\; (2\delta)^{k},$$
and this upper bound is achieved by k-dictator functions, f(y^n) = y^k.
Specifically, for k = 1, Theorem 1 implies
$$\gamma_s(\delta) \le (1-2\delta)\cdot 2^{-s} + 2\delta.$$
Theorem 2.
Let $\beta := \frac{1-2\delta}{\sqrt{4\delta(1-\delta)}}$ and Z ∼ 𝒩(0,1), and denote by Q(·) the tail distribution function of the standard normal distribution. Then, the guessing ratio is upper bounded as
$$\gamma_s(\delta) \le 2\,(s+1)\cdot \mathbb{E}\Big[Q(\beta Z)\,\big(1 - Q(Z)\big)^{s}\Big],$$
and this upper bound is achieved by majority functions, f(y^n) = 𝟙(Σ_{i=1}^n y_i > n/2).
We remark that, if k = 1, the guessing ratio of functions similar to the dictator and majority functions, such as single-bit dictator on j > 1 inputs (f(y^n) = 1 if and only if y^j = 1^j) or unbalanced majority (f(y^n) = 𝟙(Σ_{i=1}^n y_i > t) for some t), may also be analyzed in a similar way. However, numerical computations indicate that they do not improve the bounds of Theorems 1 and 2, and thus, their analysis is omitted.
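For k = 1, the two upper bounds are easy to evaluate numerically. The sketch below (ours; it assumes SciPy for the Gaussian tail function Q and for numerical integration, and the parameter values are arbitrary) reproduces the qualitative behavior discussed in the sequel: dictator wins at small s and majority at large s.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def dictator_bound(delta, s):
    # Theorem 1 with k = 1: (1 - 2*delta) * 2^{-s} + 2*delta
    return (1 - 2 * delta) * 2 ** (-s) + 2 * delta

def majority_bound(delta, s):
    # Theorem 2: 2 (s+1) E[ Q(beta Z) (1 - Q(Z))^s ],  Z ~ N(0,1),  Q = Gaussian tail
    beta = (1 - 2 * delta) / np.sqrt(4 * delta * (1 - delta))
    integrand = lambda z: norm.sf(beta * z) * (1 - norm.sf(z)) ** s * norm.pdf(z)
    val, _ = quad(integrand, -np.inf, np.inf)
    return 2 * (s + 1) * val

for s in (1.0, 6.0):
    for delta in (0.1, 0.3):
        print(f"s={s}, delta={delta}: dictator={dictator_bound(delta, s):.4f}, "
              f"majority={majority_bound(delta, s):.4f}")
```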
We next present two lower bounds on the guessing ratio γ s , k ( δ ) . The first is based on maximum-entropy arguments, and the second is based on Fourier-analytic arguments.
Theorem 3.
The guessing ratio satisfies the following lower bound:
$$\gamma_{s,k}(\delta) \ge \frac{e^{-1}\, s^{s-1}\,(s+1)}{\Gamma^{s}(1/s)}\cdot 2^{-sk(1-2\delta)^2},$$
where $\Gamma(z) := \int_0^{\infty} t^{z-1} e^{-t}\, dt$ is Euler's Gamma function (defined for ℜ{z} > 0).
Remark 1.
When restricted to k = 1, the proof of Theorem 3 utilizes the bound H(X^n | f(Y^n)) ≥ n − (1−2δ)² (see Equation (63)). For balanced functions, this bound was improved in Reference [23] for (1/2)(1 − 1/√3) ≤ δ ≤ 1/2. Using this improved bound here leads to an immediate improvement in the bound of Theorem 3. Furthermore, it is known [24] that there exists δ_0 such that the most informative Boolean function conjecture holds for all δ_0 ≤ δ ≤ 1/2. For such crossover probabilities,
$$H(X^n \mid f(Y^n)) \ge n - 1 + h(\delta)$$
holds, and then, Theorem 3 may be improved to
$$\gamma_s(\delta) \ge \frac{e^{-1}\, s^{s-1}\,(s+1)}{\Gamma^{s}(1/s)}\cdot 2^{-s\,(1-h(\delta))}.$$
Our Fourier-based bound for k = 1 is as follows:
Theorem 4.
Let τ := 1 + (1−2δ)^{2(1−λ)}. The guessing ratio is lower bounded as
$$\gamma_s(\delta) \ge \max_{0\le\lambda\le 1}\; 1 - \frac{(s+1)\,(1-2\delta)^{\lambda}}{(\tau s + 1)^{1/\tau}}.$$
This bound can be weakened by the possibly suboptimal choice λ = 1 , which leads to a simpler yet explicit bound:
Corollary 1.
$$\gamma_s(\delta) \ge 1 - \frac{(s+1)\,(1-2\delta)}{\sqrt{1+2s}}.$$
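Both lower bounds are also easy to evaluate. The sketch below (ours, assuming SciPy's Gamma function; grid maximization over λ is a simple substitute for an analytic optimization) implements Theorem 3 and Theorem 4; for small δ, the Fourier bound can be vacuous (negative), consistent with the discussion of Figures 1 and 2 below.

```python
import numpy as np
from scipy.special import gamma as Gamma

def max_entropy_bound(delta, s, k=1):
    # Theorem 3: e^{-1} s^{s-1} (s+1) / Gamma(1/s)^s * 2^{-s k (1-2 delta)^2}
    psi = np.exp(-1.0) * s ** (s - 1) * (s + 1) / Gamma(1.0 / s) ** s
    return psi * 2.0 ** (-s * k * (1 - 2 * delta) ** 2)

def fourier_bound(delta, s, grid=2001):
    # Theorem 4, maximized over lambda in [0, 1] on a grid; lambda = 1 recovers Corollary 1
    rho = 1 - 2 * delta
    lam = np.linspace(0.0, 1.0, grid)
    tau = 1 + rho ** (2 * (1 - lam))
    vals = 1 - (s + 1) * rho ** lam / (tau * s + 1) ** (1 / tau)
    return float(vals.max())

for delta in (0.05, 0.2, 0.4):
    print(delta, max_entropy_bound(delta, s=1.0), fourier_bound(delta, s=1.0))
```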
The bound in Theorem 4 is only valid for the case k = 1. An interesting problem is to find a general way of “transforming” a lower bound which assumes k = 1 into a bound useful for k > 1. In principle, such a result could stem from the observation that a k-bit function provides k different conditional optimal guessing orders, one for each of its output bits. For a general function, however, distilling a useful bound from this observation seems challenging, since the relation between the optimal guessing order induced by each of the bits and the optimal guessing order induced by all k bits might be involved. Nonetheless, such a result is possible to obtain if each of the k single-bit functions operates on a different set of input bits. For this restricted set of functions, there is a simple bound which relates the optimal ordering given each of the bits and all the k bits together. It is reasonable to conjecture that this restricted sub-class is optimal or at least close to optimal, since it seems that more information is transferred to the guesser when the k functions operate on different sets of bits, which makes the k functions statistically independent.
Specifically, let us specify a k-bit function f: 𝒴^n → {0,1}^k by its k constituent one-bit functions f_j: 𝒴^n → {0,1}, j ∈ [k]. Let ℱ_k be the set of sequences of functions {f^(n)}, f^(n): 𝒴^n → {0,1}^k, such that each specific sequence of functions {f^(n)} satisfies the following property: There exists a sequence of partitions {{I_j^(n)}_{j∈[k]}}_{n=1}^∞ of [n], such that, for all n ≥ 1 and j ∈ [k], f_j^(n)(Y^n) only depends on {Y_i}_{i∈I_j^(n)}, and lim_{n→∞} |I_j^(n)| = ∞ for all j ∈ [k]. In particular, this implies that {f_j^(n)(Y^n)}_{j∈[k]} is mutually independent for all n ≥ 1. For example, when k = 2, f_1(x^n) = x_1, and f_2(x^n) = x_2, we can choose I_j^(n) to be the odd/even indices. For f_1 = Maj(y_1^{n/2}) and f_2 = Maj(y_{n/2+1}^n), the sets are the first and second halves of [n]. As in Equation (5), we may define the guessing ratio of this constrained set of functions as
$$\tilde{\gamma}_{s,k}(\delta) := \min_{\{f^{(n)}\}_{n=1}^{\infty} \in \mathcal{F}_k}\; \limsup_{n\to\infty} \frac{G_s\big(X^n \mid f^{(n)}(Y^n)\big)}{G_s(X^n)},$$
where, in general, $\tilde{\gamma}_{s,k}(\delta) \ge \gamma_{s,k}(\delta)$.
Proposition 2.
$$\tilde{\gamma}_{s,k}(\delta) \ge \frac{\tilde{\gamma}_{s,1}^{\,k}(\delta)}{(s+1)^{k-1}}.$$
We demonstrate our results for k = 1 in Figure 1 (resp. Figure 2), which display the bounds on γ_s(δ) for fixed values of s (resp. δ). The numerical results show that, for the upper bounds, when s ≤ 3.5, dictator dominates majority (for all values of δ), whereas for s ≥ 4.25, majority dominates dictator. For 3.5 ≤ s ≤ 4.25, there exists δ_s such that majority is better for δ ∈ (0, δ_s) and dictator is better for δ ∈ (δ_s, 1/2). Figure 2 demonstrates the switch from dictator to majority as s increases (depending on δ). As for the lower bounds, we first remark that the conjectured maximum-entropy bound (Equation (11)) is also plotted (see Remark 1). The numerical results show that the maximum-entropy bound is better for low values of δ, whereas the Fourier-analysis bound is better for high values of δ. As a function of s, the maximum-entropy bound (resp. Fourier-analysis bound) is better for high (resp. low) values of s. We also mention that, in these figures, the maximizing parameter in the Fourier-based bound (Theorem 4) is λ = 1 and the resulting bound is as in Equation (13). However, for values of s as low as 10, the maximizing λ may be far from 1, and in fact, it continuously and monotonically increases from 0 to 1 as δ increases from 0 to 1/2. Finally, Figure 3 demonstrates the behavior of the k-dictator and maximum-entropy bounds on γ_{s,k}(δ) as a function of k.

3.2. Proofs of the Upper Bounds on γ s , k ( δ )

Let a, b ∈ ℕ, a < b, be given. The following sum will be useful for the proofs in the rest of the paper:
$$K_s(a,b) := \frac{1}{b-a}\sum_{i=a+1}^{b} i^{s},$$
where we will abbreviate K_s(b) := K_s(0, b). For a pair of sequences {a_n}_{n=1}^∞, {b_n}_{n=1}^∞, we will let a_n ∼ b_n mean that lim_{n→∞} a_n/b_n = 1.
Lemma 1.
Let {a_n}_{n=1}^∞ and {b_n}_{n=1}^∞ be non-decreasing integer sequences such that a_n < b_n for all n and lim_{n→∞} (a_n + 1)/b_n = 0. Then,
$$K_s(a_n, b_n) \sim \frac{1}{s+1}\cdot\frac{b_n^{s+1} - a_n^{s+1}}{b_n - a_n}.$$
Specifically, G_s(X^n) = K_s(2^n) ∼ 2^{sn}/(s+1).
Proof. 
See Appendix A. □
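A quick numerical check of Lemma 1 (ours, with arbitrary s): the ratio K_s(2^n)/(2^{sn}/(s+1)) tends to 1 as n grows.

```python
import numpy as np

def K(s, a, b):
    # K_s(a, b) = (1/(b-a)) * sum_{i=a+1}^{b} i^s
    i = np.arange(a + 1, b + 1, dtype=float)
    return float(np.sum(i ** s)) / (b - a)

s = 2.0
for n in (6, 10, 14):
    print(n, K(s, 0, 2 ** n) / (2 ** (s * n) / (s + 1)))   # -> 1 as n grows
```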
We next prove Theorem 1.
Proof of Theorem 1.
Consider a k-dictator function which directly outputs k of the bits of y^n, say, without loss of generality (w.l.o.g.), f(y^n) = y^k. Let d_H(x^n, y^n) be the Hamming distance of x^n and y^n, and recall the assumption 0 < δ < 1/2. It is easily verified that the optimal guessing order of X^n given y^k has k + 1 parts, such that the wth part, w ∈ {0, 1, …, k}, is comprised of an arbitrary ordering of the $\binom{k}{w}\cdot 2^{n-k}$ vectors for which d_H(x^k, y^k) = w. From symmetry, G_s(X^n | f(Y^n)) = G_s(X^n | f(Y^n) = b^k) for any b^k ∈ {0,1}^k. Then, from Lemma 1,
$$G_s\big(X^n \mid f(Y^n) = b^k\big) = \sum_{w=0}^{k} \binom{k}{w}(1-\delta)^{k-w}\delta^{w}\cdot K_s\big(2^{n-k} L_{k,w-1},\, 2^{n-k} L_{k,w}\big)$$
$$\sim \sum_{w=0}^{k} \binom{k}{w}(1-\delta)^{k-w}\delta^{w}\cdot \frac{2^{s(n-k)}}{s+1}\cdot\frac{L_{k,w}^{s+1} - L_{k,w-1}^{s+1}}{\binom{k}{w}}$$
$$= \frac{2^{s(n-k)}}{s+1}\sum_{w=0}^{k} (1-\delta)^{k-w}\delta^{w}\cdot\big(L_{k,w}^{s+1} - L_{k,w-1}^{s+1}\big)$$
$$= \frac{2^{s(n-k)}}{s+1}\left[(1-2\delta)\sum_{w=0}^{k-1}(1-\delta)^{k-1-w}\delta^{w}\, L_{k,w}^{s+1} + \delta^{k}\, 2^{k(s+1)}\right],$$
where in the first equality, L_{k,−1} := 0, and the last equality is obtained by telescoping the sum. The result then follows from Equation (5) and Lemma 1. □
We next prove Theorem 2.
Proof of Theorem 2.
Recall that we assume for simplicity that n is odd. The analysis for an even n is not fundamentally different. To evaluate the guessing-moment, we first need to find the optimal guessing strategy. To this end, we let W_H(x^n) be the Hamming weight of x^n and note that the posterior probability is given by
$$\Pr\big(X^n = x^n \mid \mathrm{Maj}(Y^n) = 1\big) = \frac{\Pr\big(\mathrm{Maj}(Y^n) = 1 \mid X^n = x^n\big)\cdot\Pr(X^n = x^n)}{\Pr\big(\mathrm{Maj}(Y^n) = 1\big)}$$
$$= 2^{1-n}\cdot\Pr\Big(\sum_{i=1}^{n} Y_i > n/2 \,\Big|\, X^n = x^n\Big)$$
$$= 2^{1-n}\cdot\Pr\Big(\sum_{i=1}^{n} Y_i > n/2 \,\Big|\, W_H(X^n) = W_H(x^n)\Big)$$
$$=: 2^{1-n}\cdot r_n\big(W_H(x^n)\big),$$
where Equation (25) follows from symmetry. Evidently, r_n(w) is an increasing function of w ∈ {0, 1, …, n}. Indeed, let Bin(n, δ) be a binomial r.v. of n trials and success probability δ. Then, for any w ≤ n − 1, as δ ≤ 1/2,
$$r_n(w+1) = \Pr\big(\mathrm{Bin}(w+1, 1-\delta) + \mathrm{Bin}(n-w-1, \delta) > n/2\big)$$
$$= \Pr\big(\mathrm{Bin}(w, 1-\delta) + \mathrm{Bin}(1, 1-\delta) + \mathrm{Bin}(n-w-1, \delta) > n/2\big)$$
$$\ge \Pr\big(\mathrm{Bin}(w, 1-\delta) + \mathrm{Bin}(1, \delta) + \mathrm{Bin}(n-w-1, \delta) > n/2\big)$$
$$= \Pr\big(\mathrm{Bin}(w, 1-\delta) + \mathrm{Bin}(n-w, \delta) > n/2\big)$$
$$= r_n(w),$$
where, in each of the above probabilities, the summation is over an independent binomial r.v. Hence, we deduce that, whenever Maj(Y^n) = 1 (resp. Maj(Y^n) = 0), the optimal guessing strategy is by decreasing (resp. increasing) Hamming weight (with arbitrary order for inputs of equal Hamming weight).
We can now turn to evaluate the guessing-moment for the optimal strategy given the majority of Y n . Let M n , w : = v = 0 w n v for w { 0 , 1 , , n } . From symmetry,
G s ( X n Maj ( Y n ) ) = G s ( X n Maj ( Y n ) = 1 )
= w = 0 n n w 2 1 n r n ( w ) i = M n , w 1 + 1 M n , w i s
where M n , 1 : = 0 . Thus,
G s ( X n Maj ( Y n ) ) w = 0 n n w 2 1 n r n ( w ) M n , w 1 s
= 2 s n + 1 · E r n ( W ) M n , W 1 2 n s
= 2 s n + 1 · E r n ( W ) Pr W W 1 s ,
where W , W Bin ( n , 1 / 2 ) and is independent. For evaluating the asymptotic behavior (for large n) of this expression, we note that the Berry–Esseen central-limit theorem ([31], Chapter XVI.5, Theorem 2) leads to (see, e.g., Reference [28], proof of Lemma 15)
r n ( w ) = Q β · 2 n n 2 w + a δ n ,
for some universal constant a δ . Using the Berry–Esseen central-limit theorem again, we have that 2 n ( n 2 W ) d Z , where Z N ( 0 , 1 ) and d denote convergence in distribution. Thus for a given w,
Pr W w 1 = 1 Pr 2 n n 2 W 2 n n 2 w 1
= 1 Q 2 n n 2 w 1 a 1 / 2 n
= 1 Q 2 n n 2 w O 1 n ,
where the last equality follows from the fact that | Q ( t ) | 1 2 π for all t R . Using the Berry–Esseen theorem once again, we have that 2 n ( n 2 w ) d Z . Hence, Portmanteau’s lemma (e.g., Reference [31], Chapter VIII.1, Theorem 1) and the fact the Q ( t ) is continuous and bounded result in the following:
G s ( X n Maj ( Y n ) ) 2 s n + 1 · E Q β N · 1 Q ( N ) s + O 1 n s / 2 .
Similarly to Equation (34), the upper bound
G s ( X n Maj ( Y n ) ) w = 0 n n w 2 1 n r n ( w ) M w s ,
holds, and a similar analysis leads to an expression which asymptotically coincides with the right-hand side (r.h.s.) of Equation (41). The result then follows from Equation (5) and Lemma 1. □

3.3. Proofs of the Lower Bounds on γ s , k ( δ )

To prove Theorem 3, we first prove the following maximum entropy result. With a standard abuse of notation, we will write the guessing-moment and the entropy of a random variable as functions of its distribution.
Lemma 2.
The maximal entropy under a guessing-moment constraint satisfies
$$\max_{P:\, G_s(P) = g} H(P) = \log\!\left[e^{1/s}\, s^{(1-s)/s}\cdot G_s^{1/s}(P)\cdot\Gamma\!\left(\tfrac{1}{s}\right)\right] + o(1),$$
where o(1) vanishes as g → ∞.
Proof. 
To solve the maximum entropy problem ([32], Chapter 12) in Equation (43) (note that the support of P is only restricted to be countable), we first relax the constraint G s ( P ) = g to
i = 1 P ( i ) · i s = g ,
i.e., we omit the requirement that { P ( i ) } is a decreasing sequence. Assuming momentarily that the entropy is measured in nats, it is easily verified (e.g., using the theory of exponential families ([33], Chapter 3) or by Lagrange duality ([34], Chapter 5)) that the entropy maximizing distribution is
P λ ( i ) : = exp ( λ i s ) Z ( λ )
for i N + , where Z ( λ ) : = i = 1 exp ( λ i s ) is the partition function and λ > 0 is chosen such that i = 1 P λ ( i ) · i s = g . Evidently, P λ ( i ) is in decreasing order (and so is G s ( P λ ) = g ) and is therefore the solution to Equation (43). The resulting maximum entropy is then given in a parametric form as
H ( P λ ) = λ G s ( P λ ) + ln Z ( λ ) .
Evidently, if g = G s ( P λ ) , then λ 0 . In this case, we may approximate the limit of the partition function as λ 0 by a Riemann integral. Specifically, by the monotonicity of e λ i s in i N ,
Z ( λ ) = i = 1 e λ i s
= 1 2 i = exp | i | λ 1 / s s 1
1 2 exp | t | λ 1 / s s d t 1
= 1 s λ 1 / s · Γ 1 s 1 2 ,
where the last equality follows from the definition of the Gamma function (see Theorem 3) or from the identification of the integral as an unnormalized generalized Gaussian distribution of zero mean, scale parameter λ 1 / s , and shape parameter s [35]. Further, by the convexity of e λ t s in t R + , Jensen’s inequality implies that
e λ i s i 1 / 2 i + 1 / 2 exp λ | t | s d t
for every i 1 (the r.h.s. can be considered as averaging over a uniform random variable [ i 1 / 2 , i + 1 / 2 ] ) and so, similarly to Equation (50),
Z ( λ ) 1 2 exp | t | λ 1 / s s d t .
Therefore,
Z ( λ ) = ( 1 + a λ ) · 1 s λ 1 / s · Γ 1 s
where a λ 0 as λ 0 . In the same spirit,
G s ( P λ ) = i = 1 i s · exp ( λ i s ) Z ( λ )
= 0 t s exp | t | λ 1 / s s d t + b λ ( 1 + a λ ) 1 s λ 1 / s · Γ 1 s
= 1 s λ s + 1 s · Γ s + 1 s + b λ ( 1 + a λ ) 1 s λ 1 / s · Γ 1 s
= 1 s 2 λ s + 1 s · Γ 1 s + b λ ( 1 + a λ ) 1 s λ 1 / s · Γ 1 s
= 1 s λ ( 1 + c λ ) ,
where in Equation (56), b λ 0 as λ 0 ; in Equation (57), the identity Γ ( t + 1 ) = t Γ ( t ) for t R + was used; and in Equation (58), c λ 0 as λ 0 .
Returning to measure entropy in bits, we thus obtain that, for any distribution P,
H ( P ) log e 1 / s s s 1 / s · G s 1 / s ( P ) · Γ 1 s + o ( 1 ) ,
or, equivalently,
G s ( P ) Ψ s · 2 s H ( P ) · ( 1 + o ( 1 ) ) ,
where Ψ s : = e 1 · s s 1 Γ s ( 1 s ) and o ( 1 ) is a vanishing term as G s ( P ) . In the same spirit, Equation (60) holds whenever H ( P ) . □
Remark 2.
In Reference [8], the maximum-entropy problem was studied for s = 1 . In this case, the maximum-entropy distribution is readily identified as the geometric distribution. The proof above generalizes that result to any s > 0 .
Proof of Theorem 3.
Assume that f is taken from a sequence of functions which achieves the minimum in Equation (5). Using Lemma 2 when conditioning on f(Y^n) = b^k for each possible b^k, we get (see a rigorous justification of Equation (61) in Appendix A)
$$G_s\big(X^n \mid f(Y^n)\big) \ge \big(1+o(1)\big)\cdot\Psi_s\cdot\sum_{b^k\in\{0,1\}^k}\Pr\big(f(Y^n) = b^k\big)\cdot 2^{\,s H(X^n \mid f(Y^n)=b^k)}$$
$$\ge \big(1+o(1)\big)\cdot\Psi_s\cdot 2^{\,s H(X^n \mid f(Y^n))}$$
$$\ge \big(1+o(1)\big)\cdot\Psi_s\cdot 2^{\,s\left[n - k(1-2\delta)^2\right]},$$
where the o(1) term in Equation (61) vanishes as n → ∞ and Equation (62) follows from Jensen's inequality. For k = 1, the bound in Equation (63) is directly related to the Boolean function conjecture [22] and may be proved in several ways, e.g., using Mrs. Gerber's Lemma ([36], Theorem 1); see ([23], Section IV), References [27,37]. For general k ≥ 1, the bound H(X^n | f(Y^n)) ≥ n − k(1−2δ)² was established in Reference ([27], Corollary 1). □
Before presenting the proof of the Fourier-based bound, we briefly remind the reader of the basic definitions and results of Fourier analysis of Boolean functions [21], and to that end, it is convenient to replace the binary alphabet {0,1} by {−1,1}. An inner product between two real-valued functions on the Boolean cube f, g: {−1,1}^n → ℝ is defined as
$$\langle f, g\rangle := \mathbb{E}\big[f(X^n)\, g(X^n)\big],$$
where X^n ∈ {−1,1}^n is a uniform Bernoulli vector. A character associated with a set of coordinates S ⊆ [n] := {1, 2, …, n} is the Boolean function x^S := ∏_{i∈S} x_i, where by convention, x^∅ := 1. It can be shown ([21], Chapter 1) that the set of all characters forms an orthonormal basis with respect to the inner product (Equation (64)). Furthermore,
$$f(x^n) = \sum_{S\subseteq[n]} \hat{f}_S\cdot x^S,$$
where {f̂_S}_{S⊆[n]} are the Fourier coefficients of f, given by f̂_S = ⟨x^S, f⟩ = 𝔼(X^S · f(X^n)). Plancherel's identity then states that ⟨f, g⟩ = 𝔼(f(X^n) g(X^n)) = Σ_{S⊆[n]} f̂_S ĝ_S. The p-norm of a function f is defined as ‖f‖_p := [𝔼|f(X^n)|^p]^{1/p}.
The noise operator operating on a Boolean function f is defined as
$$T_\rho f(x^n) = \mathbb{E}\big(f(Y^n) \mid X^n = x^n\big),$$
where ρ := 1 − 2δ is the correlation parameter. The noise operator has a smoothing effect on the function which is captured by the so-called hypercontractivity theorems. Specifically, we shall use the following version.
Theorem 5
([21], p. 248). Let f: {−1,1}^n → ℝ and 0 ≤ ρ ≤ 1. Then, ‖T_ρ f‖_2 ≤ ‖f‖_{ρ²+1}.
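Before turning to the proof, here is a small numerical illustration (ours, on a tiny cube with arbitrary parameters): it computes Fourier coefficients of a random real-valued function by brute force, verifies Plancherel's identity, and checks the hypercontractive inequality of Theorem 5 for one value of ρ.

```python
import itertools
import numpy as np

n, rho = 4, 0.6
pts = np.array(list(itertools.product([-1, 1], repeat=n)))
rng = np.random.default_rng(0)
f = rng.standard_normal(len(pts))                  # a random real-valued function on {-1,1}^n

# Fourier coefficients: f_hat(S) = E[ x^S f(x) ], x uniform on {-1,1}^n
subsets = list(itertools.chain.from_iterable(
    itertools.combinations(range(n), r) for r in range(n + 1)))
chars = np.array([[np.prod(x[list(S)]) if S else 1.0 for S in subsets] for x in pts])
f_hat = chars.T @ f / len(pts)
assert np.isclose(np.sum(f_hat ** 2), np.mean(f ** 2))   # Plancherel's identity

# Noise operator: T_rho f(x) = sum_S rho^{|S|} f_hat(S) x^S
T_rho_f = chars @ (f_hat * np.array([rho ** len(S) for S in subsets]))
lhs = np.sqrt(np.mean(T_rho_f ** 2))               # ||T_rho f||_2
p = rho ** 2 + 1
rhs = np.mean(np.abs(f) ** p) ** (1 / p)           # ||f||_{rho^2 + 1}
print(lhs, rhs, bool(lhs <= rhs + 1e-12))          # Theorem 5: lhs <= rhs
```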
With the above, we can prove Theorem 4.
Proof of Theorem 4.
From Bayes law (recall that f ( x n ) { 1 , 1 } )
Pr ( X n = x n f ( Y n ) = b ) = 2 ( n + 1 ) · 1 + b T ρ f ( x n ) Pr ( f ( Y n ) = b ) ,
and from the law of total expectation
G s ( X n f ( Y n ) ) = Pr ( f ( Y n ) = 1 ) · G s ( X n f ( Y n ) = 1 ) + Pr ( f ( Y n ) = 1 ) · G s ( X n f ( Y n ) = 1 ) .
Let us denote f ^ ϕ = E f ( X n ) and g : = f f ^ ϕ and abbreviate ORD f ( x n ) : = ORD X n f ( Y n ) ( x n 1 ) . Then, the first addend on the r.h.s. of Equation (68) is given by
Pr ( f ( Y n ) = 1 ) · G s ( X n f ( Y n ) = 1 ) = 2 ( n + 1 ) x n 1 + f ^ ϕ + T ρ g ( x n ) · ORD T ρ g s ( x n )
= ( 1 + f ^ ϕ ) 2 · E ORD T ρ g s ( X n ) + 1 2 T ρ g , ORD T ρ g s
= ( 1 + f ^ ϕ ) 2 · K s ( 2 n ) + 1 2 T ρ g , ORD T ρ g s
= ( 1 + f ^ ϕ ) 2 · n · 2 s n s + 1 + 1 2 T ρ g , ORD T ρ g s ,
where, in the last equality, n 1 (Lemma 1). Let λ [ 0 , 1 ] , and denote ρ 1 : = ρ λ and ρ 2 = ρ 1 λ . Then, the inner-product term in Equation (72) is upper bounded as
T ρ g , ORD T ρ g s = T ρ 1 g , T ρ 2 ORD T ρ g s
T ρ 1 g 2 · T ρ 2 ORD T ρ g s 2
ρ 1 · 1 f ^ ϕ 2 · T ρ 2 ORD T ρ g s 2
ρ 1 · 1 f ^ ϕ 2 · ORD T ρ g s 1 + ρ 2 2
= ρ 1 · 1 f ^ ϕ 2 · K ( 1 + ρ 2 2 ) s ( 2 n ) 1 / ( 1 + ρ 2 2 )
= ρ 1 · 1 f ^ ϕ 2 · k n · 1 ( 1 + ρ 2 2 ) s + 1 1 / ( 1 + ρ 2 2 ) · 2 s n ,
where Equation (73) holds since T ρ is a self-adjoint operator and Equation (74) follows from the Cauchy–Schwarz inequality. To justify Equation (75), we note that
T ρ g 2 2 = T ρ g , T ρ g
= S [ n ] ρ 2 | S | g ^ S 2
= S [ n ] \ ϕ ρ 2 | S | f ^ S 2
ρ 2 · ( 1 f ^ ϕ 2 ) ,
where Equation (80) follows from Plancherel’s identity, Equation (81) is since g ^ S = f ^ S for all S ϕ and g ^ ϕ = 0 , and Equation (82) follows from S [ n ] f ^ S 2 = f 2 2 = E f 2 = 1 . Equation (76) follows from Theorem 5, and in Equation (78), k n 1 . The second addend on the r.h.s. of Equation (68) can be bounded in the same manner. Hence,
G s ( X n f ( Y n ) ) max 0 λ 1 2 s n · n · 1 s + 1 ρ λ · 1 f ^ ϕ 2 · k n · 1 ( 1 + ρ 2 ( 1 λ ) ) s + 1 1 / ( 1 + ρ 2 ( 1 λ ) )
max 0 λ 1 2 s n · n · 1 s + 1 ρ λ k n · 1 ( 1 + ρ 2 ( 1 λ ) ) s + 1 1 / ( 1 + ρ 2 ( 1 λ ) )
2 s n · max 0 λ 1 1 s + 1 ρ λ ( 1 + ρ 2 ( 1 λ ) ) s + 1 1 / ( 1 + ρ 2 ( 1 λ ) )
as n . □
We close this section with the following proof of Proposition 2:
Proof of Proposition 2.
Let I = ( i 1 , , i L ) be a vector of indices in [ n ] such that 1 i 1 < i 2 < < i L n , and let x n ( I ) = ( x i 1 , , x i L ) be the components of x n in those indices. Further, let { f ( n ) } n = 1 F k . Then, it holds that
Pr X n = x n , f ( n ) ( Y n ) = j = 1 k Pr X n ( I j ) = x n ( I j ) , f j ( n ) ( y n ) ,
as well as
ORD X n f ( n ) ( Y n ) ( x n b k ) j = 1 k ORD X n ( I j ) f j ( n ) ( Y n ) ( x n ( I j ) b j ) 1 .
Hence,
G s ( X n f ( n ) ( Y n ) ) j = 1 k [ G s ( n ) ( X n ( I j ) f j ( n ) ( Y n ) ) 1 ]
and the stated bound is deduced after taking limits and normalizing by G s ( X n ) 2 s n s + 1 . □

4. Guessing Ratio for a General Binary Input Channel

In this section, we consider the guessing ratio for general channels with a uniform binary input. The lower bound of Theorem 3 can be easily generalized to this case. To that end, consider the SDPI constant [38,39] of the reverse channel (P_Y, P_{X|Y}), given by
$$\eta(P_Y, P_{X|Y}) := \sup_{Q_Y:\, Q_Y \ne P_Y} \frac{D(Q_X\,\|\,P_X)}{D(Q_Y\,\|\,P_Y)},$$
where Q_X is the X-marginal of Q_Y P_{X|Y}. As was shown in Reference ([40], Theorem 2), the SDPI constant of (P_Y, P_{X|Y}) is also given by
$$\eta(P_Y, P_{X|Y}) = \sup_{P_{W|Y}:\, W - Y - X,\; I(W;Y) > 0} \frac{I(W;X)}{I(W;Y)}.$$
Theorem 6.
We have
$$\gamma_{s,k}(P_X, P_{Y|X}) \ge \frac{e^{-1}\, s^{s-1}\,(s+1)}{\Gamma^{s}(1/s)}\cdot 2^{-s\,k\,\eta(P_Y, P_{X|Y})}.$$
Proof. 
See Appendix A. □
Remark 3.
The bound for the BSC case (Theorem 3) is indeed a special case of Theorem 6, as the reverse BSC channel is also a BSC with uniform input and the same crossover probability. For BSCs, it is well known that the SDPI constant is (1−2δ)² ([38], Theorem 9).
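A crude numerical confirmation (ours, with an arbitrary δ) that the SDPI constant of a BSC(δ) with uniform input is (1−2δ)²: the supremum is approached as Q_Y → P_Y, so a fine grid near 1/2 gets close to it.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def sdpi_bsc(delta, grid=20001):
    # eta(P_Y, P_{X|Y}) for a BSC(delta) with uniform input, by a grid search over Q_Y
    best = 0.0
    for q in np.linspace(1e-6, 1 - 1e-6, grid):
        if np.isclose(q, 0.5):
            continue                                  # Q_Y = P_Y is excluded from the supremum
        QY = np.array([q, 1 - q])
        # push Q_Y through the reverse channel, itself a BSC(delta)
        QX = np.array([q * (1 - delta) + (1 - q) * delta,
                       q * delta + (1 - q) * (1 - delta)])
        best = max(best, kl(QX, [0.5, 0.5]) / kl(QY, [0.5, 0.5]))
    return best

delta = 0.2
print(sdpi_bsc(delta), (1 - 2 * delta) ** 2)          # both close to 0.36
```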
Next, we consider in more detail the case where the observation channel is a BEC. We restrict the discussion to the case of a single bit of help, k = 1 .

4.1. Binary Erasure Channel

Suppose that Y^n ∈ {0, 1, e}^n is obtained from X^n by erasing each bit independently with probability ϵ ∈ [0, 1]. As before, Bob observes the channel output Y^n and can send one bit f: {0, 1, e}^n → {0, 1} to Alice, who wishes to guess X^n. With a slight abuse of notation, the guessing ratio in Equation (5) will be denoted by γ_s(ϵ).
To compute the lower bound of Theorem 6, we need to find the SDPI constant associated with the reverse channel, which is easily verified to be
$$P_{X|Y=y}(x) = \begin{cases} \mathbb{1}(x = y), & y = 0 \text{ or } y = 1\\ \mathrm{Ber}(1/2), & y = e, \end{cases}$$
with an input distribution $P_Y = \left(\frac{1-\epsilon}{2},\, \epsilon,\, \frac{1-\epsilon}{2}\right)$. Letting Q_Y(y) = q_y for y ∈ {0, 1, e} yields Q_X(x) = q_x + q_e/2 for x ∈ {0, 1}. The computation of η(P_Y, P_{X|Y}) is now a simple three-dimensional constrained optimization problem. We plotted the resulting lower bound for s = 1 in Figure 4.
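This three-dimensional optimization can be approximated by a simple random search. The sketch below (ours; a rough numerical estimate rather than an exact solver, with arbitrary parameter values) estimates η(P_Y, P_{X|Y}) for the reverse BEC and plugs it into the Theorem 6 bound for s = 1, k = 1.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def eta_reverse_bec(eps, trials=50000, seed=0):
    # Estimate eta(P_Y, P_{X|Y}) for the BEC(eps) with uniform input via random Q_Y's.
    PY = np.array([(1 - eps) / 2, eps, (1 - eps) / 2])      # P_Y on {0, e, 1}
    PX = np.array([0.5, 0.5])
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        QY = rng.dirichlet(np.ones(3))
        d_y = kl(QY, PY)
        if d_y < 1e-9:
            continue
        QX = np.array([QY[0] + QY[1] / 2, QY[2] + QY[1] / 2])  # reverse-channel X-marginal
        best = max(best, kl(QX, PX) / d_y)
    return best

eps, s, k = 0.3, 1.0, 1
eta = eta_reverse_bec(eps)
# Theorem 6 with s = 1, k = 1: Gamma(1) = 1 and s^{s-1} = 1, so the bound is (2/e) * 2^{-eta}
lower_bound = np.exp(-1.0) * (s + 1) * 2.0 ** (-s * k * eta)
print(eta, lower_bound)
```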
Let us now turn to upper bounds and focus for simplicity on the average guessing time, i.e., the guessing-moment for s = 1. To begin, let S represent the set of indices of the symbols that were not erased, i.e., i ∈ S if and only if Y_i ≠ e. Any function f: {0, 1, e}^n → {0, 1} is then uniquely associated with a set of Boolean functions {f_S}_{S⊆[n]}, where f_S: {0,1}^{|S|} → {0,1} designates the operation of the function when S is the set of non-erased symbols. We also let Pr(S) = (1−ϵ)^{|S|}·ϵ^{|S^c|} be the probability that the non-erased symbols have index set S. Then, the joint probability distribution is given by
$$\Pr\big(X^n = x^n,\, f(Y^n) = 1\big) = \Pr(X^n = x^n)\cdot\Pr\big(f(Y^n) = 1 \mid X^n = x^n\big)$$
$$= 2^{-n}\cdot\sum_{S\subseteq[n]}\Pr(S)\cdot\Pr\big(f(Y^n) = 1 \mid X^n = x^n, S\big)$$
$$= 2^{-n}\cdot\sum_{S\subseteq[n]}\Pr(S)\cdot f_S(x^n),$$
and, similarly,
$$\Pr\big(X^n = x^n,\, f(Y^n) = 0\big) = 2^{-n}\cdot\sum_{S\subseteq[n]}\Pr(S)\cdot\big(1 - f_S(x^n)\big)$$
$$= 2^{-n} - 2^{-n}\cdot\sum_{S\subseteq[n]}\Pr(S)\cdot f_S(x^n).$$
In accordance with Proposition 1, the optimal guessing order given that f ( Y n ) = 0 is reversed to the optimal guessing order when f ( Y n ) = 1 . It is also apparent that the posterior probability is determined by a mixture of 2 n different Boolean functions { f S } S [ n ] . This may be contrasted with the BSC case, in which the posterior is determined by a single Boolean function (though with noisy input).
A seemingly natural choice is a greedy dictator function, for which f(Y^n) sends the first non-erased bit. Concretely, letting
$$k(y^n) := \begin{cases} n+1, & y^n = e^n\\ \min\{i:\, y_i \ne e\}, & \text{otherwise}, \end{cases}$$
the greedy dictator function is defined by
$$\text{G-Dict}(y^n) := \begin{cases} \mathrm{Ber}(1/2), & y^n = e^n\\ y_{k(y^n)}, & \text{otherwise}, \end{cases}$$
where Ber(α) is a Bernoulli r.v. of success probability α. From an analysis of the posterior probability, it is evident that, conditioned on f(Y^n) = 0, an optimal guessing order must satisfy that x^n is guessed before z^n whenever
$$\sum_{i=1}^{n}\epsilon^{i-1}\cdot x_i \le \sum_{i=1}^{n}\epsilon^{i-1}\cdot z_i$$
(see Appendix A for a proof of Equation (100)). This rule can be loosely thought of as comparing the “base 1/ϵ expansion” of x^n and z^n. Furthermore, when ϵ is close to 1, then the optimal guessing order tends toward a minimum Hamming weight rule (or maximum Hamming weight in case f = 1).
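A tiny illustration (ours) of this ordering rule for the greedy dictator: sorting {0,1}^n by Σ_i ε^{i−1} x_i yields the lexicographic order when ε < 1/2 and essentially a minimum-weight-first order when ε is close to 1.

```python
import itertools
import numpy as np

def guess_order(n, eps):
    # Guessing order given G-Dict(Y^n) = 0: ascending sum_i eps^{i-1} x_i (ties arbitrary).
    weights = eps ** np.arange(n)
    pts = list(itertools.product([0, 1], repeat=n))
    return sorted(pts, key=lambda x: float(np.dot(weights, x)))

print(guess_order(3, 0.3)[:4])    # (0,0,0), (0,0,1), (0,1,0), (0,1,1): lexicographic
print(guess_order(3, 0.95)[:4])   # (0,0,0), (0,0,1), (0,1,0), (1,0,0): low weight first
```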
The greedy dictator function is “locally optimal” when ϵ ∈ [0, 1/2], in the following sense:
Proposition 3.
If ϵ ∈ [0, 1/2], then an optimal guessing order conditioned on G-Dict(Y^n) = 0 (resp. G-Dict(Y^n) = 1) is lexicographic (resp. reverse lexicographic). Also, given a lexicographic (resp. reverse lexicographic) order when the received bit is 0 (resp. 1), the optimal function f is a greedy dictator.
Proof. 
See Appendix A. □
The guessing ratio of the greedy dictator function can be evaluated for s = 1 , and the analysis leads to the following upper bound:
Theorem 7.
For s = 1, the guessing ratio is upper bounded as
$$\gamma_1(\epsilon) \le \frac{1}{2-\epsilon},$$
and the r.h.s. is achieved with equality by the greedy dictator function in Equation (99) for ϵ ∈ [0, 1/2].
Proof. 
See Appendix A. □
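A Monte Carlo check of Theorem 7 (ours, with arbitrary parameters): simulate the BEC, let Bob send the greedy dictator bit, and let Alice guess in lexicographic (bit 0) or reverse-lexicographic (bit 1) order, which is optimal for ε ≤ 1/2 by Proposition 3; the empirical ratio approximately matches 1/(2−ε).

```python
import numpy as np

def greedy_dictator_ratio(n, eps, trials=100000, seed=1):
    """Monte Carlo estimate of G_1(X^n | G-Dict(Y^n)) / G_1(X^n) for the BEC(eps)."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(trials, n))
    erased = rng.random((trials, n)) < eps
    ranks = np.empty(trials)
    for t in range(trials):
        visible = np.flatnonzero(~erased[t])
        b = X[t, visible[0]] if visible.size else rng.integers(0, 2)
        lex = int("".join(map(str, X[t])), 2)           # lexicographic index, 0 .. 2^n - 1
        # Alice guesses lexicographically if b = 0, reverse-lexicographically if b = 1
        ranks[t] = lex + 1 if b == 0 else 2 ** n - lex
    return ranks.mean() / ((2 ** n + 1) / 2)            # G_1(X^n) = K_1(2^n) = (2^n + 1)/2

for eps in (0.1, 0.3, 0.5):
    print(eps, round(greedy_dictator_ratio(12, eps), 3), round(1 / (2 - eps), 3))
```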
The upper bound of Theorem 7 is plotted in Figure 4. Based on Proposition 3 and numerical computations for moderate values of n, we conjecture:
Conjecture 1.
Greedy dictator functions attain γ s ( ϵ ) for the BEC.
Supporting evidence for this conjecture includes the local optimality property stated in Proposition 3 (although there are other locally optimal choices), as well as the following heuristic argument: Intuitively, Bob should reveal as much as possible regarding the bits he has seen and as little as possible regarding the erasure pattern. So, it seems reasonable to find a smallest possible set of balanced functions from which to choose all the functions f_S, so that they coincide as much as possible. Greedy dictator is a greedy solution to this problem: it uses the function x_1 for half of the erasure patterns, which is the maximum possible. Then, it uses the function x_2 for half of the remaining patterns, and so on. Indeed, we were not able to find a better function than G-Dict for small values of n.
However, applying standard techniques in an attempt to prove Conjecture 1 has not been fruitful. One possible technique is induction. For example, assume that the optimal functions for dimension n − 1 are f_S^(n−1). Then, it might be perceived that there exists a bit, say x_1, such that the optimal functions for dimension n satisfy f_S^(n) = f_S^(n−1) if x_1 is erased; in that case, it remains only to determine f_S^(n) when x_1 is not erased. However, observing Equation (95), it is apparent that the optimal choice of f_S^(n) should satisfy two contradicting goals—on the one hand, to match the order induced by
$$\sum_{S\subseteq[n]:\, 1\notin S}\Pr(S)\cdot f_S(x^n)$$
and, on the other hand, to minimize the average guessing time of
$$\sum_{S\subseteq[n]:\, 1\in S}\Pr(S)\cdot f_S(x^n).$$
It is easy to see that taking a greedy approach toward satisfying the second goal would result in f_S^(n)(x^n) = x_1 if 1 ∈ S, and performing the recursion steps would indeed lead to a greedy dictator function. Interestingly, taking a greedy approach toward satisfying the first goal would also lead to a greedy dictator function, but one which operates on a cyclic permutation of the inputs (specifically, Equation (99) applied to (y_2, …, y_n, y_1)). Nonetheless, it is not clear that choosing {f_S^(n)}_{S: 1∈S} with some loss in the average guessing time induced by Equation (103) could not lead to a gain in the first goal (matching the order of Equation (102)), which outweighs that loss.
Another possible technique is majorization. It is known that, if one probability distribution majorizes another, then all the nonnegative guessing-moments of the first are no greater than the corresponding moments of the second ([29], Proposition 1). (The proof in Reference [29] is only for s = 1, but it is easily extended to the general s > 0 case.) Hence, one approach toward identifying the optimal function could be to try to find a function whose induced posterior distributions majorize the corresponding posteriors induced by any other function with the same bias (it is of course not clear that such a function even exists). This approach unfortunately fails for the greedy dictator. For example, the posterior distributions induced by setting f_S to be majority functions are not always majorized by those induced by the greedy dictator (although they seem to be “almost” majorized), even though the average guessing time of the greedy dictator is lower (this happens, e.g., for n = 5 and ϵ = 0.4). In fact, the guessing moments of the greedy dictator seem to be better than those of majority irrespective of the value of s.

Author Contributions

Conceptualization, N.W. and O.S.; Investigation, N.W. and O.S.; Methodology, N.W. and O.S.; Writing—original draft, N.W. and O.S.; Writing—review and editing, N.W. and O.S. Both authors contributed equally to the research work and to the writing process of the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an ERC grant no. 639573. The research of N. Weinberger was partially supported by the MIT-Technion fellowship and the Viterbi scholarship, Technion.

Acknowledgments

We are very grateful to Amos Lapidoth and Robert Graczyk for discussing their recent work on guessing with a helper [19,20] during the second author’s visit to ETH, which provided the impetus for this work. We also thank the anonymous reviewer for helping us clarify the connection between the guessing moments and large deviation principle of the normalized logarithm of the guessing time.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BEC	binary erasure channel
BSC	binary symmetric channel
i.i.d.	independent and identically distributed
r.h.s.	right-hand side
r.v.	random variable
SDPI	strong data-processing inequality
w.l.o.g.	without loss of generality

Appendix A. Miscellaneous Proofs

Proof of Proposition 1.
The claim that random functions do not improve beyond deterministic ones follows directly from that property that conditioning reduces guessing-moment ([1], Corollary 1). Monotonicity follows from the fact that Bob can always simulate a noisier channel. Now, if δ = 1 / 2 , then X n and Y n are independent and G s ( X n f ( Y n ) ) = G s ( X n ) 2 s n s + 1 for any f (Lemma 1). For δ = 0 , let
γ s , k ( n ) ( δ ) : = min f : { 0 , 1 } n { 0 , 1 } k G s ( X n f ( Y n ) ) G s ( X n ) ,
and let { f n , k * } n = 1 be a sequence of functions such that f n , k * achieves γ s , k ( n ) ( δ ) . We show that f n , k * must satisfy Pr [ f ( Y n ) = b k ] = 2 k for all b k { 0 , 1 } k . If we denote B k = f n , k * ( Y n ) , then this is equivalent to showing that Pr [ B l = 1 B 1 = b 1 , , B l 1 = b l 1 , B l = b l , B k = b k ] = 1 / 2 for all l [ k ] and ( b 1 , , b l 1 , b l , , b k ) { 0 , 1 } k 1 . Assume towards contradiction that the optimal function does not satisfy this property for, say, l = k . Let us denote Pr ( f n , k * ( Y n ) = b k ) : = q ( b k ) and assume w.l.o.g. that q ( b k 1 , 0 ) > q ( b k 1 , 1 ) for all b k 1 { 0 , 1 } k 1 (for notational simplicity). Further, let q ¯ ( b k 1 ) : = 1 2 [ q ( b k 1 , 0 ) + q ( b k 1 , 1 ) ] . Then,
G s ( X n f n , k * ( Y n ) ) = b k 1 { 0 , 1 } k 1 q ( b k 1 , 0 ) · G s ( X n f n , k * ( Y n ) = ( b k 1 , 0 ) ) + q ( b k 1 , 1 ) · G s ( X n f n , k * ( Y n ) = ( b k 1 , 1 ) ) = b k 1 { 0 , 1 } k 1 q ( b k 1 , 0 ) · K s q ( b k 1 , 0 ) · 2 n + q ( b k 1 , 1 ) · K s q ( b k 1 , 1 ) · 2 n = 2 n b k 1 { 0 , 1 } k 1 i = 1 q ( b k 1 , 0 ) · 2 n i s + i = 1 q ( b k 1 , 1 ) · 2 n i s = 2 n b k 1 { 0 , 1 } k 1 i = 1 q ¯ ( b k 1 ) · 2 n i s + i = q ¯ ( b k 1 ) + 1 q ( b k 1 , 0 ) · 2 n i s + i = 1 q ¯ ( b k 1 ) · 2 n i s i = q ( b k 1 , 1 ) q ¯ ( b k 1 ) · 2 n i s > 2 ( n 1 ) b k 1 { 0 , 1 } k 1 i = 1 q ¯ ( b k 1 ) · 2 n i s .
As equality can be achieved if we modify f n , k * to satisfy q ( b k 1 , 0 ) = q ( b k 1 , 1 ) for all b k 1 { 0 , 1 } k 1 , this contradicts the assumed optimality of f n , k * . The minimal G s ( X n f ( Y n ) ) is thus obtained by any function for which f ( Y n ) { 0 , 1 } k is a uniform Bernoulli vector and equals to K s ( 2 n k ) 2 s ( n k ) s + 1 (Lemma 1).
To prove that the limit in Equation (5) exists, we note that
G s ( X n + 1 ) = 2 ( n + 1 ) i = 1 2 n + 1 i s
= 2 ( n + 1 ) i = 1 2 n ( 2 i 1 ) s + ( 2 i ) s
2 s · 2 n i = 1 2 n ( i 1 ) s
= n · 2 s · 2 n i = 1 2 n i s
= n · 2 s · G s ( X n ) ,
where
n : = i = 1 2 n ( i 1 ) s i = 1 2 n i s .
As before, let { f n , k * } n = 1 be a sequence of functions such that f n , k * achieves γ s , k ( n ) ( δ ) . Denote the order induced by the posterior Pr ( X n = x n f n , k * ( Y n ) = b k ) as ORD b k , n , n , b k { 0 , 1 } k and the order induced by Pr ( X n + 1 = x n + 1 f n * ( Y n ) = b k ) as ORD b k , n , n + 1 . As before (when breaking ties arbitrarily)
ORD b k , n , n + 1 ( x n , 0 ) = 2 ORD b k , n , n ( x n )
and
ORD b k , n , n + 1 ( x n , 1 ) = 2 ORD b k , n , n ( x n ) 1 2 ORD b k , n , n ( x n ) .
Thus,
G s ( X n + 1 f n + 1 , k * ( Y n + 1 ) )
G s ( X n + 1 f n , k * ( Y n ) )
= b k { 0 , 1 } k Pr ( f n , k * ( Y n + 1 ) = b k ) · G s ( X n + 1 f n , k * ( Y n ) = b k ) b k { 0 , 1 } k x n + 1 Pr ( X n + 1 = x n + 1 , f n , k * ( Y n ) = b k ) · ORD b k , n , n + 1 s ( x n + 1 )
2 s · b k { 0 , 1 } k x n Pr ( X n = x n , f n , k * ( Y n ) = b k ) · ORD b k , n , n s ( x n )
= 2 s · G s ( X n f n , k * ( Y n ) ) .
Hence,
γ s , k ( n + 1 ) ( δ ) n 1 · γ s , k ( n ) ( δ ) .
To continue, we further analyze n . The summation in the numerator of Equation (A7) may be started from from i = 2 , and so Equations (A31) and (A33) (proof of Lemma 1 below) imply that
1 n
1 s + 1 · 2 n ( s + 1 ) 1 2 n 1 1 s + 1 · ( 2 n + 1 ) s + 1 1 2 n
2 n ( s + 1 ) 1 ( 2 n + 1 ) s + 1
= 2 n 2 n + 1 s + 1 1 2 n ( s + 1 )
= 1 + 1 2 n ( s + 1 ) 1 2 n ( s + 1 )
= 1 ( s + 1 ) 2 n + O 1 2 2 n 1 2 n ( s + 1 )
= 1 ( s + 1 ) 2 n + O 1 2 n · min { 1 + s , 2 } .
Thus, there exists c , C > 0 such that
log n = 1 n 1 = n = 1 log n 1
n = 1 log 1 c 2 n
C + n = 1 c 2 n + O 1 2 2 n
< ,
and consequently,
d n : = j = n j 1 1
as n . Hence, Equation (A14) implies that
e n : = d n · γ s ( n ) ( δ )
is a non-increasing sequence which is bounded below by 0 and, thus, has a limit. Since d n 1 as n , γ s ( n ) ( δ ) also has a limit.
We finally show the reverse ordering property for k = 1 . The guessing order given that f ( Y n ) = 1 is determined by ordering
Pr ( X n = x n f ( Y n ) = 1 ) = Pr ( X n = x n ) · Pr ( f ( Y n ) = 1 X n = x n ) Pr ( f ( Y n ) = 1 ) ,
or equivalently, by ordering Pr ( f ( Y n ) = 1 X n = x n ) . It then follows that the order, given that f ( Y n ) = 0 , is reversed compared to the order given that f ( Y n ) = 1 since
Pr ( f ( Y n ) = 0 X n = x n ) + Pr ( f ( Y n ) = 1 X n = x n ) = 1 .
Proof of Lemma 1.
The monotonicity of i s and standard bounds on sums using integrals lead to the bounds
K s ( a , b ) a + 1 b + 1 t s b a · d t
= 1 s + 1 · ( b + 1 ) s + 1 ( a + 1 ) s + 1 b a
and
K s ( a , b ) a b t s b a · d t
= 1 s + 1 · b s + 1 a s + 1 b a .
The ratio between the upper and lower bound is
κ s ( a , b ) : = ( b + 1 ) s + 1 ( a + 1 ) s + 1 b s + 1 a s + 1
which satisfies κ s ( a n , b n ) 1 given the premise of the lemma. □
Proof of Equation (61).
Denote by f n * a function which achieves the minimal guessing ratio in Equation (5). Then, it holds that G s ( X n f * ( Y n ) = b k ) is a monotonic non-increasing function of n. To see this, suppose that f n + 1 * is an optimal function for n + 1 . This function f n + 1 * can be used for guessing X n on the basis of k bit of help computed from Y n as follows: Given Y n , the helper randomly generates Y n + 1 P Y | X ( · | 0 ) , computes b k = f n + 1 * ( Y n + 1 ) , and send these bits to the guesser. The guesser of X n then uses the bits b k to guess X n , and the resulting conditional guessing moment is G s ( X n + 1 f n + 1 * ( Y n + 1 ) = b k , X n + 1 = 0 ) , which is less than G s ( X n + 1 f n + 1 * ( Y n + 1 ) = b k ) since conditioning reduces guessing moments. Thus, the optimal function f n * can only achieve lower guessing moments, which implies the desired monotonicity property. For brevity, we henceforth simply write the optimal function as f (with dimension and optimality being implicit).
Define the set
B k : = { b k { 0 , 1 } k : sup n G s ( X n f ( Y n ) = b k ) = } ,
to wit, the set of k-tuples such that the conditional guessing moment grows without bound when conditioned on that k-tuple. By the law of total expectation
G s ( X n f ( Y n ) ) = b k B k Pr ( f ( Y n ) = b k ) · G s ( X n f ( Y n ) = b k )
= + b k { 0 , 1 } k \ B k Pr ( f ( Y n ) = b k ) · G s ( X n f ( Y n ) = b k )
= : G n ( 1 ) + G n ( 2 ) .
So, since G s ( X n f ( Y n ) ) grows without bound as a function of n, it must hold that B k is not empty and that there exists n such that G s ( X n f ( Y n ) ) = n G n ( 1 ) , where n 1 as n . Let η > 0 be given. The monotonicity property previously established and Equation (60) imply that there exists n 0 ( η ) such that for all n n 0 ( η ) both
G s ( X n f ( Y n ) = b k ) ( 1 η ) · Ψ s · 2 s H ( X n f ( Y n ) = b k )
and
Ψ s · 2 s H ( X n f ( Y n ) = b k ) ( 1 η ) · G s ( X n f ( Y n ) = b k )
hold for any b k B k . Thus, also
G s ( X n f ( Y n ) ) n ( 1 η ) b k B k Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k )
and
b k B k Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k ) b k B k Pr ( f ( Y n ) = b k ) ( 1 η ) · G s ( X n f ( Y n ) = b k )
hold, and the last equation implies that the term on its left-hand side is unbounded. Moreover, Equation (60) and the sentence that follows it both imply that, if G s ( X n f ( Y n ) = b k ) is bounded, then H ( X n f ( Y n ) = b k ) is bounded too. Thus, there exists k n which satisfies k n 1 as n such that
b k B k Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k ) = k n · b k { 0 , 1 } n Pr ( f ( Y n ) = b k ) · Ψ s · 2 s H ( X n f ( Y n ) = b k ) .
Combining Equation (A40) with the last equation and noting that η > 0 is arbitrary completes the proof. □
Proof of Theorem 6.
The proof follows the same lines as the proof of Theorem 3 up to Equation (62), yielding
G s ( X n f ( Y n ) ) k n · Ψ s · 2 s n I ( X n ; f ( Y n ) ) .
Now, let W ( n ) be such that X n Y n W ( n ) forms a Markov chain. Then,
sup f : Y n { 0 , 1 } I ( X n ; f ( Y n ) ) I ( Y n ; f ( Y n ) ) sup P W ( n ) | Y n I ( X n ; W ( n ) ) I ( Y n ; W ( n ) )
= η ( P Y n , P X n | Y n )
= η ( P Y , P X | Y ) ,
where Equation (A46) follows since the SDPI constant tensorizes (see Reference [40] for an argument obtained by relating the SDPI constant to the hypercontractivity parameter or its extended version, Reference ([40], p. 5), for a direct proof). Thus, for all f,
I ( X n ; f ( Y n ) ) η ( P Y , P X | Y ) · I ( Y n ; f ( Y n ) )
η ( P Y , P X | Y ) · H ( f ( Y n ) )
η ( P Y , P X | Y ) · k .
Inserting Equation (A49) into Equation (A43) yields
G s ( X n f ( Y n ) ) k n · Ψ s · 2 s n k · η ( P Y , P X | Y ) ,
and substituting this in the definition of the guessing ratio of Equation (5) completes the proof. □
Proof of Equation (100).
Let us evaluate the posterior probability conditioned on G-Dict ( Y n ) = 0 . Since G-Dict is balanced, Bayes law implies that
Pr ( X n = x n G-Dict ( Y n ) = 0 )
= 2 ( n 1 ) · Pr ( G-Dict ( Y n ) = 0 X n = x n )
= 2 ( n 1 ) · i = 1 n + 1 Pr k ( y n ) = i X n = x n · Pr ( G-Dict ( Y n ) = 0 X n = x n , k ( y n ) = i )
= 2 ( n 1 ) · i = 1 n ( 1 ϵ ) ϵ i 1 · 𝟙 x i = 0 + 1 2 ϵ n .
This immediately leads to the guessing rule in Equation (100). From Proposition 1, the guessing rule for G-Dict ( Y n ) = 1 is on reverse order. □
Proof of Proposition 3.
We denote the lexicographic order by ORD lex . Assume that G-Dict ( Y n ) = 0 and that ORD lex ( x n ) ORD lex ( z n ) . Then, there exists j [ n ] such that x j 1 = z j 1 (where x 0 is the empty string) and x j = 0 < z j = 1 . Then,
Pr ( X n = x n G-Dict ( Y n ) = 0 ) Pr ( X n = z n G-Dict ( Y n ) = 0 )
= ϵ j 1 + i = j + 1 n ϵ i 1 · z i x i
ϵ j 1 i = j + 1 n ϵ i 1
= ϵ j 1 1 ϵ 1 2 ϵ + ϵ n j + 1
0 .
This proves the first statement of the proposition. Now, let ORD 0 ( ORD 1 ) be the guessing order given that the received bit is 0 (resp. 1), and let { f S } be the Boolean functions (which are not necessarily optimal). Then, from Equations (97) and (95)
G 1 ( X n f ( Y n ) )
= x n Pr ( X n = x n , f ( Y n ) = 0 ) · ORD 0 ( x n ) + Pr ( X n = x n , f ( Y n ) = 1 ) · ORD 1 ( x n )
= 2 n · S [ n ] Pr ( S ) x n 1 f S ( x n ) · ORD 0 ( x n ) + f S ( x n ) · ORD 1 ( x n )
= 2 n · S [ n ] Pr ( S ) x S 1 f S ( x n ) · PORD 0 ( x S | | S ) + f S ( x n ) · PORD 1 ( x S | | S )
2 n · S [ n ] Pr ( S ) x n min PORD 0 ( x S | | S ) , PORD 1 ( x S | | S ) ,
where for b { 0 , 1 } , the projected orders are defined as
PORD b ( x S | | S ) : = x ( S c ) ORD b ( x n ) .
It is easy to verify that, if ORD 0 ( ORD 1 ) is the lexicographic (resp. revered lexicographic) order, then the greedy dictator achieves Equation (A61) with equality due to the following simple property: If ORD lex ( x n ) < ORD lex ( z n ) , then
x ( S c ) ORD lex ( x n ) x ( S c ) ORD lex ( z n )
for all S [ n ] . This can be proved by induction over n. For n = 1 , the claim is easily asserted. Suppose it holds for n 1 , let us verify it for n. If 1 S , then whenever ORD lex ( x n ) < ORD lex ( z n )
x ( S c ) ORD lex ( x n ) = x ( S c ) ORD lex ( x 1 , x 2 n )
= x 1 · 2 n 1 + x ( S c ) ORD lex ( x 2 n )
z 1 · 2 n 1 + z ( S c ) ORD lex ( z 2 n )
= z ( S c ) ORD lex ( z n )
where the inequality follows from the induction assumption and since x 1 z 1 . If 1 S then, similarly,
x ( S c ) ORD lex ( x n ) = x ( S c \ { 1 } ) 2 n 1 + 2 · ORD lex ( x 2 n )
z ( S c \ { 1 } ) 2 n 1 + 2 · ORD lex ( z 2 n )
= z ( S c ) ORD lex ( z n ) .
Proof of Theorem 7.
We denote the lexicographic order by ORD lex . Then,
G 1 ( X n G-Dict ( Y n ) ) = G 1 ( X n G-Dict ( Y n ) = 0 )
x n Pr ( X n = x n G-Dict ( Y n ) = 0 ) · ORD lex ( x n )
= 2 ( n 1 ) · x n i = 1 n ( 1 ϵ ) ϵ i 1 · 𝟙 x i = 0 · ORD lex ( x n ) + ϵ n K 1 ( 2 n )
= 2 ( n 1 ) · ( 1 ϵ ) i = 1 n ϵ i 1 · x n 𝟙 x i = 0 · ORD lex ( x n ) + ϵ n K 1 ( 2 n )
= ( 1 ϵ ) J n + ϵ n K 1 ( 2 n ) ,
where J 1 : = 1 2 and for n 2
J n : = 2 ( n 1 ) · i = 1 n ϵ i 1 · x n 𝟙 x i = 0 · ORD lex ( x n )
= 2 ( n 1 ) x n 𝟙 x i = 0 · ORD lex ( x n ) + 2 ( n 1 ) i = 2 n ϵ i 1 · x n 𝟙 x i = 0 · ORD lex ( x n ) = K 1 ( 2 n 1 )
= + 2 ( n 1 ) i = 2 n ϵ i 1 · x 2 n 𝟙 x 1 = 0 , x i = 0 · ORD lex ( 0 , x 2 n ) + 𝟙 x 1 = 1 , x i = 0 · ORD lex ( 1 , x 2 n ) = K 1 ( 2 n 1 ) + 2 ( n 1 ) ϵ i = 1 n 1 ϵ i 1 · x n 1 𝟙 x i = 0 ORD lex ( x n 1 )
= + 2 ( n 1 ) ϵ i = 1 n 1 ϵ i 1 · x n 1 𝟙 x i = 0 2 n 1 + ORD lex ( x n 1 )
= K 1 ( 2 n 1 ) + ϵ J n 1 + i = 1 n 1 ϵ i · x n 1 𝟙 x i = 0
= K 1 ( 2 n 1 ) + ϵ J n 1 + 2 n 2 · ϵ ϵ n 1 ϵ .
So,
J n = K 1 ( 2 n 1 ) + ϵ K 1 ( 2 n 2 ) + ϵ J n 2 + 2 n 3 · ϵ ϵ n 1 1 ϵ + 2 n 2 · ϵ ϵ n 1 ϵ
= K 1 ( 2 n 1 ) + ϵ K 1 ( 2 n 2 ) + ϵ 2 J n 2 + 2 n 3 · ϵ 2 ϵ n 1 ϵ + 2 n 2 · ϵ ϵ n 1 ϵ
= i = 1 n ϵ i 1 K 1 ( 2 n i ) + 1 1 ϵ i = 1 n 2 i 2 · ( ϵ n i + 1 ϵ n ) .
Hence,
$G_1(X^n \mid \text{G-Dict}(Y^n)) \le (1-\epsilon) \sum_{i=1}^{n} \epsilon^{i-1} K_1(2^{n-i}) + \sum_{i=1}^{n} 2^{i-2} \cdot \left( \epsilon^{n-i+1} - \epsilon^{n} \right) + \epsilon^{n} K_1(2^n).$
Noting that $K_1(M) = \frac{M+1}{2}$, we get
$G_1(X^n \mid \text{G-Dict}(Y^n))$
$\le 2^{n-1}\, \frac{1-\epsilon}{\epsilon} \sum_{i=1}^{n} \left( \frac{\epsilon}{2} \right)^{i} + \frac{1-\epsilon^{n}}{2} + \frac{\epsilon^{n+1}}{4} \sum_{i=1}^{n} \left( \frac{2}{\epsilon} \right)^{i} - \frac{(2^{n}-1)\,\epsilon^{n}}{2} + 2^{n-1} \epsilon^{n} + \frac{\epsilon^{n}}{2}$
$= \frac{1}{2-\epsilon} \left( 2^{n-1} - \frac{\epsilon^{n}}{2} \right) + \frac{1+\epsilon^{n}}{2}$
$\le \frac{2^{n-1}}{2-\epsilon} + 1.$ □
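As a final consistency check of the algebra above, the bound before substituting $K_1$, its simplified form, and the final relaxation can be compared numerically. The sketch below simply evaluates the displayed expressions; the function names are ours.

```python
def K1(M):
    return (M + 1) / 2

def bound_before_substitution(n, eps):
    """(1-eps) sum_i eps^{i-1} K1(2^{n-i}) + sum_i 2^{i-2}(eps^{n-i+1} - eps^n) + eps^n K1(2^n)."""
    a = (1 - eps) * sum(eps ** (i - 1) * K1(2 ** (n - i)) for i in range(1, n + 1))
    b = sum(2 ** (i - 2) * (eps ** (n - i + 1) - eps ** n) for i in range(1, n + 1))
    return a + b + eps ** n * K1(2 ** n)

def bound_simplified(n, eps):
    """1/(2-eps) * (2^{n-1} - eps^n/2) + (1 + eps^n)/2."""
    return (2 ** (n - 1) - eps ** n / 2) / (2 - eps) + (1 + eps ** n) / 2

for n in range(1, 12):
    for eps in (0.0, 0.1, 0.5, 0.9):
        lhs = bound_before_substitution(n, eps)
        mid = bound_simplified(n, eps)
        assert abs(lhs - mid) < 1e-9                 # the substitution/simplification step
        assert mid <= 2 ** (n - 1) / (2 - eps) + 1   # the final relaxation
        # normalized by K1(2^n), the bound approaches 1/(2 - eps) as n grows
print("final chain verified; bound/K1(2^n) at n=11, eps=0.5:",
      round(bound_simplified(11, 0.5) / K1(2 ** 11), 4), "vs", round(1 / 1.5, 4))
```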

References

  1. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105.
  2. Arikan, E.; Merhav, N. Guessing subject to distortion. IEEE Trans. Inf. Theory 1998, 44, 1041–1056.
  3. Merhav, N.; Arikan, E. The Shannon cipher system with a guessing wiretapper. IEEE Trans. Inf. Theory 1999, 45, 1860–1866.
  4. Hayashi, Y.; Yamamoto, H. Coding theorems for the Shannon cipher system with a guessing wiretapper and correlated source outputs. IEEE Trans. Inf. Theory 2008, 54, 2808–2817.
  5. Hanawal, M.K.; Sundaresan, R. The Shannon cipher system with a guessing wiretapper: General sources. IEEE Trans. Inf. Theory 2011, 57, 2503–2516.
  6. Christiansen, M.M.; Duffy, K.R.; du Pin Calmon, F.; Médard, M. Multi-user guesswork and brute force security. IEEE Trans. Inf. Theory 2015, 61, 6876–6886.
  7. Yona, Y.; Diggavi, S. The effect of bias on the guesswork of hash functions. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2248–2252.
  8. Massey, J.L. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
  9. Arikan, E. Large deviations of probability rank. In Proceedings of the 2000 IEEE International Symposium on Information Theory, Washington, DC, USA, 25–30 June 2000; p. 27.
  10. Christiansen, M.M.; Duffy, K.R. Guesswork, large deviations, and Shannon entropy. IEEE Trans. Inf. Theory 2012, 59, 796–802.
  11. Pfister, C.E.; Sullivan, W.G. Rényi entropy, guesswork moments, and large deviations. IEEE Trans. Inf. Theory 2004, 50, 2794–2800.
  12. Hanawal, M.K.; Sundaresan, R. Guessing revisited: A large deviations approach. IEEE Trans. Inf. Theory 2011, 57, 70–78.
  13. Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287.
  14. Boztaş, S. Comments on "An inequality on guessing and its application to sequential decoding". IEEE Trans. Inf. Theory 1997, 43, 2062–2063.
  15. Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346.
  16. Sason, I. Tight bounds on the Rényi entropy via majorization with applications to guessing and compression. Entropy 2018, 20, 896.
  17. Wyner, A. A theorem on the entropy of certain binary sequences and applications—II. IEEE Trans. Inf. Theory 1973, 19, 772–777.
  18. Ahlswede, R.; Körner, J. Source coding with side information and a converse for degraded broadcast channels. IEEE Trans. Inf. Theory 1975, 21, 629–637.
  19. Graczyk, R.; Lapidoth, A. Variations on the guessing problem. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 231–235.
  20. Graczyk, R. Guessing with a Helper. Master's Thesis, ETH Zurich, Zürich, Switzerland, 2017.
  21. O'Donnell, R. Analysis of Boolean Functions; Cambridge University Press: Cambridge, UK, 2014.
  22. Courtade, T.A.; Kumar, G.R. Which Boolean functions maximize mutual information on noisy inputs? IEEE Trans. Inf. Theory 2014, 60, 4515–4525.
  23. Ordentlich, O.; Shayevitz, O.; Weinstein, O. An improved upper bound for the most informative Boolean function conjecture. In Proceedings of the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, 10–15 July 2016; pp. 500–504.
  24. Samorodnitsky, A. On the entropy of a noisy function. IEEE Trans. Inf. Theory 2016, 62, 5446–5464.
  25. Kindler, G.; O'Donnell, R.; Witmer, D. Continuous Analogues of the Most Informative Function Problem. Available online: http://arxiv.org/pdf/1506.03167.pdf (accessed on 26 December 2015).
  26. Li, J.; Médard, M. Boolean functions: Noise stability, non-interactive correlation, and mutual information. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 266–270.
  27. Chandar, V.; Tchamkerten, A. Most informative quantization functions. Presented at the 2014 Information Theory and Applications Workshop, San Diego, CA, USA, 9–14 February 2014.
  28. Weinberger, N.; Shayevitz, O. On the optimal Boolean function for prediction under quadratic loss. IEEE Trans. Inf. Theory 2017, 63, 4202–4217.
  29. Burin, A.; Shayevitz, O. Reducing guesswork via an unreliable oracle. IEEE Trans. Inf. Theory 2018, 64, 6941–6953.
  30. Ardimanov, N.; Shayevitz, O.; Tamo, I. Minimum guesswork with an unreliable oracle. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 986–990. Extended version available online: http://arxiv.org/pdf/1811.08528.pdf (accessed on 26 December 2018).
  31. Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: New York, NY, USA, 1971; Volume 2.
  32. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
  33. Wainwright, M.J.; Jordan, M.I. Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 2008, 1, 1–305.
  34. Boyd, S.P.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  35. Nadarajah, S. A generalized normal distribution. J. Appl. Stat. 2005, 32, 685–694.
  36. Wyner, A.; Ziv, J. A theorem on the entropy of certain binary sequences and applications—I. IEEE Trans. Inf. Theory 1973, 19, 769–772.
  37. Erkip, E.; Cover, T.M. The efficiency of investment information. IEEE Trans. Inf. Theory 1998, 44, 1026–1040.
  38. Ahlswede, R.; Gács, P. Spreading of sets in product spaces and hypercontraction of the Markov operator. Ann. Probab. 1976, 925–939.
  39. Raginsky, M. Strong data processing inequalities and Φ-Sobolev inequalities for discrete channels. IEEE Trans. Inf. Theory 2016, 62, 3355–3389.
  40. Anantharam, V.; Gohari, A.; Kamath, S.; Nair, C. On hypercontractivity and a data processing inequality. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 3022–3026.
Figure 1. Bounds on $\gamma_s(\delta)$ for $s = 1$ (left) and $s = 5$ (right) as a function of $\delta \in [0, 1/2]$.
Figure 2. Bounds on $\gamma_s(\delta)$ for $\delta = 0.1$ (left) and $\delta = 0.4$ (right) as a function of $s \in [1, 10]$.
Figure 3. Bounds on $\gamma_{s,k}(\delta)$ for $\delta = 0.1$ and $s = 1$ as a function of $k$.
Figure 4. Bounds on $\gamma_s(\delta)$ for $s = 1$ as a function of $\epsilon \in [0, 1]$.
