*Article* **Tsirelson's Bound Prohibits Communication through a Disconnected Channel**

**Avishy Carmi 1,2,\* and Daniel Moskovich 1,2,\***


Received: 4 December 2017; Accepted: 24 February 2018; Published: 27 February 2018

**Abstract:** Why does nature only allow nonlocal correlations up to Tsirelson's bound and not beyond? We construct a channel whose input is statistically independent of its output, but through which communication is nevertheless possible if and only if Tsirelson's bound is violated. This provides a statistical justification for Tsirelson's bound on nonlocal correlations in a bipartite setting.

**Keywords:** nonlocality; Bell inequality; Tsirelson's bound; no-signaling; information causality; Fisher information

#### **1. Introduction**

Some of the predictions made by quantum mechanics appear to be at odds with common sense. Yet quantum mechanics remains the most precisely tested and successful quantitative theory of nature. It is therefore believed that even if quantum mechanics is someday replaced, any successor will have to inherit at least some of its "preposterous" but highly predictive principles. Perhaps the most counter-intuitive quantum mechanical feature is *nonlocality* [1]: the correlations exhibited by remote parties may exceed those allowed by any local realistic model.

The mystery of nonlocality is not only why nature is as nonlocal as it is, but why nature is not *more* nonlocal than it is. There are alternative *Non-Signaling* theories which permit nonlocality beyond the quantum limit [2,3]; why doesn't nature choose one of these theories over quantum mechanics? In Section 1.1 we review several previously proposed explanations. This paper presents another explanation, from statistics.

In this paper we construct a protocol (a repeated oblivious transfer) which sends messages through a disconnected channel. We show that Alice can communicate nontrivial information to Bob via this protocol if and only if the maximal quantum mechanical violation of the Bell–CHSH inequality [1,4], *Tsirelson's bound* [5], is exceeded. We thus provide a statistical explanation of this bound that is independent of the mathematical formalism of quantum mechanics.

We briefly recall the setting for the Bell–CHSH experiment; Section 2 provides a more detailed account. A famous application of nonlocality is to construct a 1-2 *oblivious transfer protocol* between two distant agents (A)lice and (B)ob. Alice and Bob each hold a box. Alice's box might, for example, contain one half of a singlet state of spin-½ particles, with Bob's box containing the other half [1,4]. In addition, Alice possesses a pair of bits *x*<sub>0</sub> and *x*<sub>1</sub>, each of which is a zero or a one. Using boolean algebra and her box (the protocol will be described later), Alice encodes her pair of bits into a single bit *x*<sup>(1)</sup> which she sends across a classical channel to Bob. Bob wants to know the value of either *x*<sub>0</sub> or *x*<sub>1</sub>, but Alice doesn't know which of these Bob wants to know. Bob uses the received bit *x*<sup>(1)</sup>, his box, and some boolean algebra to construct an estimate *y*<sub>*i*</sub> for his desired bit *x*<sub>*i*</sub>. See Figure 2 later on.

What is the probability that Bob correctly estimates the bit he wishes to know? He has two possible sources of knowledge—the bit *x*(1) he received from Alice, and some mysterious "nonlocal" correlation between his box and Alice's. The strength of such a nonlocal coordination between two systems is captured by a parameter *c* ∈ [−1, 1] called the *Bell–CHSH correlator*. Bob's probability of guessing the value of Alice's bit correctly is (1 + |*c*|)/2. The *Bell–CHSH inequality* states that |*c*| ≤ 1/2 in a world governed by classical (non-quantum) mechanics [1,4]. *Nonlocality* is the state of affairs in which the Bell–CHSH inequality is violated. To the best of our knowledge, real world physics is nonlocal. Over the years, the violation of the Bell–CHSH inequality has been measured in increasingly accurate and loophole-free experiments, culminating in celebrated loophole-free verifications [6–8].

Thus, we know that |*c*| can exceed 1/2. How large can |*c*| be? Tsirelson's bound tells us that |*c*| cannot exceed 1/√2 in a world described by quantum mechanics [5]. This quantum bound on nonlocality:

$$|c| \le \frac{1}{\sqrt{2}}\,, \tag{1}$$

has been tested experimentally, with the current state of the art being an experiment which achieved a value of *c* only 0.00084 ± 0.00051 away from Tsirelson's bound [9]. Such experimental evidence supports the contention that Tsirelson's bound indeed holds true in the real world. Tsirelson's result as presented in the original paper is a specifically quantum mechanical fact, following from the Hilbert-space mathematical formalism for quantum mechanics, for which there has been no good conceptual physical explanation. How fundamental is Tsirelson's bound? Must this inequality also hold for any future theory which might someday supersede quantum mechanics [10]? We are led to the following question: *Can we identify a plausible physical principle, independent of quantum mechanics (or independent of functional analysis), which is necessary and sufficient to guarantee that* |*c*| ≤ 1/√2*?*

#### *1.1. Existing Principles*

For the last two decades, people have searched for physical principles that bound nonlocality. It was initially expected that the physical principle of relativistic causality (no-signaling) itself restricts the strength of nonlocality [11–13]. But then it was discovered that no-signaling theories may exist for which |*c*| > 1/√2. This led to the device-independent formalism of *No-Signaling (NS)–boxes* [2,14] (see also [3]). In particular, maximum violation of the Bell–CHSH inequality is achieved by *Popescu–Rohrlich (PR)–boxes* which are consistent with relativistic causality.

So relativistic causality doesn't limit nonlocality after all. Why then does nature not permit (1) to be violated (as far as we know)? Several suggestions have been made. Superquantum correlations lead to violations of the Heisenberg uncertainty principle [15,16], which is another seemingly purely quantum result. PR–boxes would allow distributed computation to be performed with only one bit of communication [17], which looks unlikely but doesn't violate any known physical law. Similarly, in stronger-than-quantum nonlocal theories some computations exceed reasonable performance limits [18]. The principle of *Information Causality* [19] shows that no sensible measure of mutual information exists between pairs of systems in superquantum nonlocal theories. Our approach is most directly comparable with Information Causality, the conceptual difference being that we use the variance of an efficient estimator, and therefore Fisher information, whereas Information Causality uses mutual information (Shannon information). The relationship between our approach and theirs is the topic of Section 6. Finally, it was shown that superquantum nonlocality does not permit local (non-nonlocal) physics to emerge in the limit of infinitely many microscopic systems [20,21].

#### *1.2. Tsirelson's Bound from a Statistical No-Signaling Condition*

Here we show that Tsirelson's bound follows from the following principle applied to a certain limiting Bell–CHSH setting:

*Statistical No-Signaling*: It is impossible to communicate a nontrivial message through a channel whose output is independent of its input.

*Entropy* **2018**, *20*, 151

Our strategy is to construct a channel whose input is a Bernoulli random variable *X* of mean *θ* and whose output is another Bernoulli random variable *Y* (Section 3.2). The construction of our channel is not new; it is a reinterpretation of the well-known van Dam protocol [17]. Through the channel, Alice sends 2<sup>*n*</sup> samples 𝒜 ≝ {*x*<sub>0</sub>, *x*<sub>1</sub>, ..., *x*<sub>2<sup>*n*</sup>−1</sub>} from *X*, and at the other end Bob receives a set of values ℬ ≝ {*y*<sub>0</sub>, *y*<sub>1</sub>, ..., *y*<sub>*m*−1</sub>}.

We imagine *θ* ∈ [−1, 1] as encoding a message, perhaps in the digits of its binary expansion. Bob's task is to estimate *θ*. The following theorem states that he can do so if and only if Tsirelson's bound fails.

#### **Theorem 1.1.**

*1. The channel from X to Y we construct is described by the conditional probability p*(*Y* = *x* | *X* = *x*) = (1 + *c*<sup>*n*</sup>)/2*, where c is the Bell–CHSH correlator. Its output satisfies:*

$$p(Y = 1 \mid \theta) = \frac{1}{2} + \frac{c^n \theta}{2}\,.$$

*In the n* → ∞ *limit the channel disconnects, i.e. p*(*Y* | *X*) = *p*(*Y*) *(we can arrange that* |*c*| < 1*).*

*2. The unbiased estimator:*

$$\hat{\theta} \stackrel{\text{def}}{=} \frac{1}{2^n c^n} \sum_{i=0}^{2^n - 1} y_i$$

*for θ has variance:*

$$\text{Var}\left[\hat{\theta} \mid \theta\right] = \frac{1 - c^{2n}\theta^2}{\left(2c^2\right)^n} \xrightarrow{n \to \infty} \begin{cases} 0, & 2c^2 > 1 \text{ (signaling)}\\ 1, & 2c^2 = 1 \text{ (randomness)}\\ \infty, & 2c^2 < 1 \text{ (no-signaling)} \end{cases}$$

*3. The estimator θ̂ is* efficient*, i.e. it has the minimal variance of any unbiased estimator of θ constructed from Bob's set of samples* ℬ*, for all n* ∈ ℕ*.*

The theorem is visually summarized by Figure 1.
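The three regimes in the second part of Theorem 1.1 can be tabulated numerically. The following is a minimal sketch that only evaluates the variance formula stated in the theorem; the function name `estimator_variance` and the sample values of *c*, *n* and *θ* are illustrative choices, not part of the construction.

```python
import math


def estimator_variance(c: float, n: int, theta: float) -> float:
    """Evaluate (1 - c^(2n) * theta^2) / (2 c^2)^n from Theorem 1.1."""
    return (1.0 - c ** (2 * n) * theta ** 2) / (2.0 * c * c) ** n


if __name__ == "__main__":
    theta = 0.3  # arbitrary illustrative message parameter
    cases = [("2c^2 < 1", 0.6), ("2c^2 = 1", 1 / math.sqrt(2)), ("2c^2 > 1", 0.8)]
    for label, c in cases:
        row = [estimator_variance(c, n, theta) for n in (1, 10, 50, 200)]
        # The variance grows without bound, tends to 1, or tends to 0.
        print(label, ["%.3g" % v for v in row])
```

As *n* grows, the first row diverges, the second approaches 1, and the third approaches 0, matching the no-signaling, randomness and signaling cases respectively.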

The theorem shows that failure of Tsirelson's bound leads to failure of the following consequence of Statistical No-Signaling:

*Consequence of Statistical No-Signaling*: In the above notation, if *X* and *Y* are independent, then no estimator constructed from ℬ has both mean *θ* and variance 0.

Section 5 shows that a violation of Uffink's inequality [22], a generalization of Tsirelson's bound, also leads to the failure of the same consequence of Statistical No-Signaling. Uffink's inequality is also known to be recovered by Information Causality [23].

Theorem 1.1 is formulated as an asymptotic construction, but in practice a finite number of samples suffices because for any experimental setup there exists a nonzero minimal possible environmental noise level *ε* > 0. By Theorem 1.1, *p*(*Y* = 1 | *θ*) is physically indistinguishable from 1/2 when the absolute value of *c*<sup>*n*</sup>*θ*/2 is less than *ε*. Since |*θ*| ≤ 1, we need *n* ≥ ln(2*ε*)/ln |*c*| trials. As an example, for a photon pair where *ε* is greater than or equal to the reduced Planck constant *ħ*, we find that *n* ≥ 244 suffices to make *p*(*Y* = 1 | *θ*) physically indistinguishable from 1/2 when |*c*| ≤ 1/√2. Thus, if we can still distinguish *p*(*Y* = 1 | *θ*) from 1/2 for *n* = 244, we know that Tsirelson's bound has been violated, and if not then it holds.

**Figure 1.** The Statistical No-Signaling condition. The van Dam protocol defines an underlying channel which becomes disconnected in the *n* → ∞ limit. The upper illustration shows this channel and the Fisher information (one over the variance) of the maximum likelihood estimators for *θ* at its input and at its output. When the number of nonlocal resources increases unboundedly, the two ends of the channel become disconnected as illustrated by a vanishing bottleneck in the lower illustration. Statistical No-Signaling dictates that in this case no information can pass through. This occurs if and only if 2*c*<sup>2</sup> ≤ 1. The case of 2*c*<sup>2</sup> > 1 leads to a physically unreasonable limit where Bob can fully read off the value of Alice's *θ* through a disconnected channel.

#### *1.3. Organization of This Paper*

Section 2 recalls the bipartite Bell experiment and exhibits the Bell–CHSH correlator *c* as the correlator of a certain noisy symmetric channel. Section 3 presents the van Dam protocol as an extension of the Bell–CHSH setup, and explains how it defines a noisy symmetric channel with correlator *c*<sup>*n*</sup>. Section 4 computes the mean and variance of an estimator *θ̂* for *θ*, and proves that *θ̂* is an efficient estimator. Section 5 extends Theorem 1.1 to recover Uffink's inequality [22,23] for anisotropic correlators from Statistical No-Signaling. Finally, Section 6 discusses the relationship of Statistical No-Signaling with Information Causality.

#### **2. The Bipartite Bell Experiment as a Noisy Symmetric Channel**

In this section we recall the definition of the Bell–CHSH correlator *c* and we formulate the Bell–CHSH inequality, establishing notation. We then exhibit *c* as the correlator of a symmetric binary channel.

#### *2.1. The Bell–CHSH Inequality*

Let us recall the classical bipartite Bell experiment [1]. Alice and Bob each hold one half of an EPR pair (a pair of particles with certain properties summarized below) such as a singlet state of spin-½ particles. They each possess two different measuring instruments. Alice measures her particle using one of her instruments, and Bob measures his particle using one of his. We write *i* for the index of the instrument used by Alice, and *a* for its reading. Similarly, we let *j* and *b* denote the index of the instrument chosen by Bob and its reading, respectively. In the language of probability, *a* and *b* are ±1–valued Bernoulli random variables. The choices of measuring instrument, *i* and *j*, may be either parameters or 0/1–valued Bernoulli random variables.

Repeating the experiment for many different EPR pairs, Alice and Bob may compute the two-point correlator *E*[*ab* | *i*, *j*] of their readings *a* and *b* for any given pair of indices *i* and *j*, where *E*[·] is the statistical expectation operator. We now define the *Bell–CHSH correlator c* by the formula:

$$c \stackrel{\text{def}}{=} \frac{1}{4} \left\{ E\left[ ab \mid 0, 0 \right] + E\left[ ab \mid 0, 1 \right] + E\left[ ab \mid 1, 0 \right] - E\left[ ab \mid 1, 1 \right] \right\} \tag{2}$$

In a theory in which both Alice and Bob's choices, and the readings of their measuring devices, are *local*, the Bell–CHSH inequality [4] holds:

$$|c| \le \frac{1}{2} \,. \tag{3}$$

Operationally speaking, locality means that Alice's readings may only be affected by her own choices (and perhaps by other variables hidden locally at her site), and similarly for Bob's readings. Quantum mechanically, however, Alice and Bob may violate (3). Correlators violating (3) are said to be *nonlocal*.
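Both bounds on *c* can be checked numerically. The sketch below enumerates the 16 deterministic local assignments to confirm |*c*| ≤ 1/2 (mixtures of them cannot do better, since *c* is linear in the correlators), and then evaluates *c* for a singlet state. The singlet correlator −cos(*α*<sub>*i*</sub> − *β*<sub>*j*</sub>) and the specific measurement angles are standard textbook assumptions, not taken from this paper; all function names are illustrative.

```python
import itertools
import math


def bell_chsh(E):
    """Bell-CHSH correlator of Eq. (2); E maps (i, j) to E[ab | i, j]."""
    return (E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]) / 4


def max_local_deterministic():
    """Largest |c| over all deterministic local assignments a_i, b_j = +-1."""
    best = 0.0
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        E = {(i, j): a[i] * b[j] for i in (0, 1) for j in (0, 1)}
        best = max(best, abs(bell_chsh(E)))
    return best


def singlet_correlators(alice_angles, bob_angles):
    """Assumed quantum prediction E[ab | i, j] = -cos(alpha_i - beta_j)."""
    return {(i, j): -math.cos(alice_angles[i] - bob_angles[j])
            for i in (0, 1) for j in (0, 1)}


if __name__ == "__main__":
    print("local maximum of |c|:", max_local_deterministic())
    E = singlet_correlators((0.0, math.pi / 2), (math.pi / 4, -math.pi / 4))
    print("singlet |c|:", abs(bell_chsh(E)), "vs 1/sqrt(2):", 1 / math.sqrt(2))
```

With these angles the singlet state violates (3) and reaches |*c*| = 1/√2.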

#### *2.2. The Bell–CHSH Correlator c as a Channel Correlator*

Non-signaling (NS)–boxes provide an abstraction and an extension of the Bell–CHSH experiment [2,14]. This time, Alice and Bob each own a box. Such a box may be thought of as a complete laboratory containing two measuring devices. Each participant inserts their choice of measuring device into their box. The box output is the respective reading of the chosen measuring device.

Alice and Bob share a pair of NS–boxes whose 0/1–valued inputs are *i* and *j* and whose ±1–valued outputs are Bernoulli random variables *a* and *b*. We will show that the Bell–CHSH correlator (2) represents the correlator of a symmetric binary channel whose input is the Bernoulli random variable *X* ≝ (−1)<sup>*ij*</sup> and whose output is the Bernoulli random variable *Y* ≝ *a* · *b*.

Let *x* ∈ {−1, 1}. Define the *channel correlators* *c*<sub>*x*</sub> as follows:

$$c_x \stackrel{\text{def}}{=} E\left[XY \mid X=x\right] = p(Y=x \mid X=x) - p(Y \neq x \mid X=x) = 2p(Y=x \mid X=x) - 1 \ . \tag{4}$$

With respect to a particular choice of measuring devices *i* and *j* and for *x* = (−1)<sup>*ij*</sup>, (4) becomes:

$$c_x(i,j) = E\left[a \cdot b \cdot (-1)^{ij} \mid i, j\right] = 2p(a \cdot b = (-1)^{ij} \mid i, j) - 1 \; . \tag{5}$$

Assume the underlying channel is symmetric and therefore that *c*<sub>*x*</sub>(*i*, *j*) is fixed for all *i*, *j*. By (5) the Bell–CHSH correlator (2) may be written as:

$$c = \frac{1}{4} \left( c_1(0,0) + c_1(0,1) + c_1(1,0) + c_{-1}(1,1) \right) = c_x(i,j) = 2p(a \cdot b = (-1)^{ij} \mid i,j) - 1 \tag{6}$$

which is our promised interpretation of the Bell–CHSH correlator as a correlator of a noisy symmetric binary channel.
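The identity (6) can be checked on an explicit example. The sketch below assumes an isotropic NS-box with joint distribution *p*(*a*, *b* | *i*, *j*) = (1 + *c* · *ab* · (−1)<sup>*ij*</sup>)/4, which is one convenient symmetric choice and not the only box consistent with the text; the function names are illustrative.

```python
def isotropic_box(c):
    """Assumed table p(a, b | i, j) = (1 + c * a * b * (-1)**(i*j)) / 4."""
    return {(a, b, i, j): (1 + c * a * b * (-1) ** (i * j)) / 4
            for a in (-1, 1) for b in (-1, 1) for i in (0, 1) for j in (0, 1)}


def two_point(p, i, j):
    """Two-point correlator E[ab | i, j]."""
    return sum(a * b * p[a, b, i, j] for a in (-1, 1) for b in (-1, 1))


def chsh_from_eq2(p):
    """Bell-CHSH correlator computed from its definition, Eq. (2)."""
    return (two_point(p, 0, 0) + two_point(p, 0, 1)
            + two_point(p, 1, 0) - two_point(p, 1, 1)) / 4


def chsh_from_eq6(p, i, j):
    """Channel form 2 p(a*b = (-1)^(ij) | i, j) - 1 from Eqs. (5) and (6)."""
    target = (-1) ** (i * j)
    hit = sum(p[a, b, i, j] for a in (-1, 1) for b in (-1, 1) if a * b == target)
    return 2 * hit - 1


if __name__ == "__main__":
    box = isotropic_box(0.6)
    print(chsh_from_eq2(box), [chsh_from_eq6(box, i, j) for i in (0, 1) for j in (0, 1)])
```

For this box the two computations agree for every choice of *i* and *j*, and setting *c* = 1 gives a PR-box.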

#### **3. The Van Dam Protocol as a Noisy Symmetric Channel**

In this section we recall the construction of the van Dam protocol [17,19]. We then reinterpret this protocol as underlying a noisy symmetric binary channel, as a special case of the construction of Section 2. We compute its correlator, and establish the effect of noise on its classical component.

#### *3.1. The Van Dam Protocol*

The van Dam protocol realizes an *oblivious transfer protocol* by means of a classical channel and a collection of NS-boxes. Each of Alice's boxes has a corresponding box on Bob's side, and different pairs of boxes are statistically independent. Suppose that Alice has in her possession the bits *x*<sub>0</sub>, ..., *x*<sub>*m*−1</sub> where *m* = 2<sup>*n*</sup>, *n* ≥ 1. Bob wishes to know the value of one of her bits. He specifies the bit whose value he wishes to know via its binary address *j* = *j*<sub>*n*−1</sub> *j*<sub>*n*−2</sub> ··· *j*<sub>0</sub>. For example, if *n* = 2 then Bob may specify which of the bits *x*<sub>0</sub> to *x*<sub>3</sub> he wants by specifying a binary address, 00, 01, 10, or 11. Alice's bits and Bob's addresses are encoded into the inputs of 2<sup>*n*</sup> − 1 NS-boxes following a particular protocol which is described next.

Alice uses outputs of boxes and choices of measuring device to determine choices of measuring device for other boxes. Such a procedure is called *wiring*. The wiring of boxes on Alice's side admits a recursive description which we now give. Let *a*<sup>*k*,*l*</sup><sub>*i*</sub> denote the output of Alice's *l*th box on the *k*th level for the input *i*. We follow the convention that box outputs for the van Dam protocol are 0/1–valued (rather than ±1–valued) random variables. Let also:

$$f^{k,l}\left(q\_1, q\_2\right) \stackrel{\text{def}}{=} q\_1 \oplus a\_{q\_1 \oplus q\_2}^{k,l} \; . \tag{7}$$

Suppose that Alice wishes to encode *m* = 4 bits with her boxes. To do so, she first picks two boxes and computes:

$$x_1^{(1)} \stackrel{\text{def}}{=} f^{1,1}\left(x_0, x_1\right), \quad x_2^{(1)} \stackrel{\text{def}}{=} f^{1,2}\left(x_2, x_3\right) \ . \tag{8}$$

This forms the first level in her construction. The second level then follows:

$$\mathbf{x}^{(2)} \stackrel{\text{def}}{=} f^{2,1}\left(\mathbf{x}\_1^{(1)}, \mathbf{x}\_2^{(1)}\right) \; . \tag{9}$$

In this example there are only two levels and so *x*<sup>(2)</sup> is the bit which Alice transmits to Bob through the classical channel. In the general case where *m* = 2<sup>*n*</sup> there are *n* levels, and thus *x*<sup>(*n*)</sup> is the bit Bob receives from Alice.

Unbeknownst to Alice, Bob now decides which bit *x*<sub>*j*</sub> he would like to know the value of. He takes its binary address *j* = *j*<sub>*n*−1</sub> *j*<sub>*n*−2</sub> ··· *j*<sub>0</sub>, and inserts *j*<sub>*k*−1</sub> into all of his boxes whose counterparts are on the *k*th level on Alice's side. He then uses the values *b*<sup>*k*,*l*</sup><sub>*j*<sub>*k*−1</sub></sub> that he obtains, together with the bit *x*<sup>(*n*)</sup> he received from Alice, to construct the decoding function:

$$y_{j} \stackrel{\text{def}}{=} x^{(n)} \oplus b_{j_0}^{1,l_1} \oplus b_{j_1}^{2,l_2} \oplus \cdots \oplus b_{j_{n-1}}^{n,l_n} \,. \tag{10}$$

The values *l*<sub>1</sub>, ..., *l*<sub>*n*</sub> (which boxes Bob uses) are determined by the binary address *j* = *j*<sub>*n*−1</sub> *j*<sub>*n*−2</sub> ··· *j*<sub>0</sub> via the recursive formula *l*<sub>*h*−1</sub> = 2*l*<sub>*h*</sub> − 1 + *j*<sub>*h*−1</sub> for *h* = *n*, *n* − 1, ..., 2, starting from *l*<sub>*n*</sub> = 1.

The van Dam protocol we have described above is summarized in Figure 2.
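The wiring and decoding steps can be made concrete with a small classical simulation. The sketch below is illustrative only: it fakes each NS-box pair by computing Bob's output from both inputs in one place (which no pair of separated physical devices could do), and it models an imperfect box by flipping Bob's output with probability (1 − *c*)/2. The function name `van_dam_round`, the zero-based box indexing and the noise model are assumptions of this sketch.

```python
import random


def van_dam_round(bits, j, c=1.0, rng=random):
    """Run one round of the protocol and return Bob's estimate y_j of bits[j].

    bits: Alice's 2**n input bits (0/1); j: Bob's address; c: box correlator,
    so each box pair satisfies a XOR b == i AND j with probability (1 + c) / 2.
    """
    n = len(bits).bit_length() - 1
    if n < 1 or len(bits) != 1 << n:
        raise ValueError("need 2**n bits with n >= 1")
    level = list(bits)  # values entering the current level of Alice's tree
    bob_xor = 0         # running XOR of the outputs of the boxes Bob uses
    for k in range(1, n + 1):
        next_level = []
        for l in range(len(level) // 2):
            q1, q2 = level[2 * l], level[2 * l + 1]
            i = q1 ^ q2                # Alice's box input
            a = rng.getrandbits(1)     # Alice's box output: a uniform bit
            next_level.append(q1 ^ a)  # Eq. (7): f(q1, q2) = q1 XOR a
            if l == (j >> k):          # the single box Bob needs on level k
                j_in = (j >> (k - 1)) & 1  # address digit j_{k-1}
                error = 0 if rng.random() < (1 + c) / 2 else 1
                bob_xor ^= a ^ (i & j_in) ^ error  # Bob's box output b
        level = next_level
    return level[0] ^ bob_xor  # Eq. (10): x^(n) XOR all of Bob's outputs


if __name__ == "__main__":
    bits = [1, 0, 0, 1, 1, 1, 0, 0]
    # With perfect (PR) boxes, c = 1, Bob recovers whichever bit he asks for.
    print([van_dam_round(bits, j) for j in range(len(bits))])
```

With *c* = 1 every address is decoded correctly; with |*c*| < 1 the decoded bit is correct only when an even number of the *n* boxes on Bob's path misfire.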

The probability that Bob will decode the correct value of the bit he desires is governed by the NS–box correlator *c*. In general, decoding any bit out of 2<sup>*n*</sup> possible bits involves using *n* pairs of NS-boxes. Noting that an even number of errors (events with *a* ⊕ *b* ≠ *ij*) will cancel out in such a construction, we obtain the following expression [19]:

$$c^{n} = 2p(y_{j} = x_{j} \mid x_{j}) - 1 \; . \tag{11}$$

For example, for *n* = 2:

$$\begin{aligned}
&p(a_{i_1} \oplus b_{j_1} \oplus a_{i_2} \oplus b_{j_2} = i_1 j_1 \oplus i_2 j_2 \mid i_{1,2}, j_{1,2}, i_1 j_1 \oplus i_2 j_2) \\
&\quad = p(a_{i_1} \oplus b_{j_1} = i_1 j_1 \mid i_1, j_1)\, p(a_{i_2} \oplus b_{j_2} = i_2 j_2 \mid i_2, j_2) \\
&\qquad + p(a_{i_1} \oplus b_{j_1} \neq i_1 j_1 \mid i_1, j_1)\, p(a_{i_2} \oplus b_{j_2} \neq i_2 j_2 \mid i_2, j_2) \\
&\quad = \frac{1}{2}(1+c)\cdot\frac{1}{2}(1+c) + \frac{1}{2}(1-c)\cdot\frac{1}{2}(1-c) = \frac{1}{2}(1+c^2) \,.
\end{aligned} \tag{12}$$
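The same parity argument extends to any *n*. As a sketch, assuming (as in the text) that the *n* box pairs Bob uses are independent and each errs with probability (1 − *c*)/2, the probability of an even number of errors can be summed directly and compared with (1 + *c*<sup>*n*</sup>)/2 from (11). The function name is illustrative.

```python
from math import comb, isclose


def p_even_errors(c: float, n: int) -> float:
    """Probability that an even number of n independent boxes err."""
    p_ok, p_err = (1 + c) / 2, (1 - c) / 2
    return sum(comb(n, k) * p_err ** k * p_ok ** (n - k)
               for k in range(0, n + 1, 2))


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        c = 0.7
        # The two columns should agree up to floating-point rounding.
        print(n, p_even_errors(c, n), (1 + c ** n) / 2)
```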

**Figure 2.** Distributed oblivious transfer (van Dam) protocol [17]. Its basic building block is on the left, where Alice inserts *x*<sub>0</sub> ⊕ *x*<sub>1</sub> into her box, receives *a*, and sends *x*<sup>(1)</sup> = *x*<sub>0</sub> ⊕ *a* to Bob. Bob decides that he wants to know the value of *x*<sub>*j*</sub>, and he feeds *j* into his box, which outputs *b*. Bob's estimate of *x*<sub>*j*</sub> is then *x*<sup>(1)</sup> ⊕ *b*. When there are multiple boxes, Alice concatenates them (the process is called *wiring*). For example, with seven boxes, Alice begins with a collection of bits *x*<sub>0</sub>, *x*<sub>1</sub>, ..., *x*<sub>7</sub>, and she inputs *x*<sub>2*i*</sub> ⊕ *x*<sub>2*i*+1</sub> into box *i*, where *i* = 0, 1, 2, 3, receiving *a*<sub>0</sub>, *a*<sub>1</sub>, *a*<sub>2</sub>, *a*<sub>3</sub> correspondingly. The bits fed into the next level of boxes become *x*<sup>(1)</sup><sub>*i*</sub> ≝ *x*<sub>2*i*</sub> ⊕ *a*<sub>*i*</sub> with *i* = 0, 1, 2, 3. The final output *x*<sup>(3)</sup> is sent to Bob. Bob encodes the address of the bit he wants as the binary number *j*<sub>3</sub> *j*<sub>2</sub> *j*<sub>1</sub>. For example, if he wants *x*<sub>2</sub>, then he sets *j*<sub>3</sub> = 0, *j*<sub>2</sub> = 1, and *j*<sub>1</sub> = 0 because 010 is 2 in binary. This binary encoding describes a path in his binary tree from a root to a branch, where 0 means 'go left' and 1 means 'go right'. Bob inserts *j*<sub>3</sub> into the lowermost box to obtain *b*<sub>6</sub>. Setting *k* ≝ 5 − (1 − *j*<sub>3</sub>), he then inserts *j*<sub>2</sub> into box *k* to obtain *b*<sub>*k*</sub>. Finally, setting *l* ≝ *k* − (3 − *j*<sub>3</sub>) − (1 − *j*<sub>2</sub>), Bob inserts *j*<sub>1</sub> into box *l* to obtain *b*<sub>*l*</sub>. His final estimate for *x*<sub>*j*</sub> is *y*<sub>*j*</sub> = *x*<sup>(3)</sup> ⊕ *b*<sub>6</sub> ⊕ *b*<sub>*k*</sub> ⊕ *b*<sub>*l*</sub>.

#### *3.2. Van Dam Protocol as a Symmetric Channel*

This section describes the modification of the van Dam protocol that we use.

Alice has in her possession an information source that is a ±1-valued Bernoulli random variable *X* whose mean is *θ*. Alice takes *m* iid samples, *x̃*<sub>0</sub>, ..., *x̃*<sub>*m*−1</sub>, from *X* and converts them into 0/1-valued bits, *x*<sub>0</sub>, *x*<sub>1</sub>, ..., *x*<sub>*m*−1</sub>, by mapping −1 to 0 and 1 to 1. Alice and Bob repeat the van Dam protocol *m* times, once for each of Alice's samples. Each time, Bob uses the protocol to estimate Alice's bit, first *x*<sub>0</sub>, then *x*<sub>1</sub>, and so on until *x*<sub>*m*−1</sub>.

As in (12), the van Dam protocol has a *memoryless* property:

$$p(y\_i = \mathbf{x}\_i \mid \mathbf{x}\_0, \mathbf{x}\_1, \dots, \mathbf{x}\_{m-1}) = p(y\_i = \mathbf{x}\_i \mid \mathbf{x}\_i) \tag{13}$$

From this it follows that if Alice's inputs *x*<sub>0</sub>, *x*<sub>1</sub>, ..., *x*<sub>*m*−1</sub> are iid then Bob's outputs *y*<sub>0</sub>, *y*<sub>1</sub>, ..., *y*<sub>*m*−1</sub> are also iid. Therefore the set of *ỹ*<sub>*i*</sub> ≝ (−1)<sup>*y*<sub>*i*</sub></sup> determines a Bernoulli random variable *Y*. In this way, the van Dam protocol may be viewed as a symmetric binary channel whose input is *X* and whose output is *Y*. By (11) the channel correlator is:

$$E\left[XY \mid X = \tilde{x}_{i}\right] = 2p(Y = \tilde{x}_{i} \mid X = \tilde{x}_{i}) - 1 = 2p(y_{i} = x_{i} \mid x_{i}) - 1 = c^{n} \tag{14}$$

We generalize slightly, for the purpose of treating the |*c*| = 1 case in the next section. Suppose that Alice's bits are contaminated with noise and therefore might be flipped once injected into her boxes. Let [1 − (*c*′)<sup>*n*</sup>]/2 be the probability that the bit *x*<sub>*i*</sub> is flipped, where |*c*′| ≤ 1, and write *X*′ for the resulting noisy version of *X*. In this case the corresponding channel correlator (14) is *E*[*XY* | *X* = *x̃*<sub>*i*</sub>] = (*cc*′)<sup>*n*</sup>, which follows from (4) and:

$$\begin{aligned}
p(Y = \tilde{x}_{i} \mid X = \tilde{x}_{i}) &= p(Y = \tilde{x}_{i} \mid X' = \tilde{x}_{i})\, p(X' = \tilde{x}_{i} \mid X = \tilde{x}_{i}) \\
&\quad + p(Y = \tilde{x}_{i} \mid X' \neq \tilde{x}_{i})\, p(X' \neq \tilde{x}_{i} \mid X = \tilde{x}_{i}) = \frac{1}{2} \left[1 + (cc')^{n}\right] \,,
\end{aligned} \tag{15}$$

where *p*(*Y* = *x̃*<sub>*i*</sub> | *X*′ = *x̃*<sub>*i*</sub>) = [1 + *c*<sup>*n*</sup>]/2 underlies the channel defined by the ordinary van Dam protocol, and *p*(*X*′ ≠ *x̃*<sub>*i*</sub> | *X* = *x̃*<sub>*i*</sub>) = [1 − (*c*′)<sup>*n*</sup>]/2 is the probability of *x*<sub>*i*</sub> having been flipped.
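The multiplication of correlators in (15) is the usual rule for cascading two binary symmetric channels. A minimal sketch, assuming the input noise and the van Dam channel act independently, represents each stage as a 2×2 transition matrix; the helper names are illustrative.

```python
def bsc(corr):
    """Transition matrix of a binary symmetric channel with correlator corr."""
    keep, flip = (1 + corr) / 2, (1 - corr) / 2
    return [[keep, flip], [flip, keep]]


def compose(first, second):
    """Channel obtained by feeding the output of `first` into `second`."""
    return [[sum(first[r][k] * second[k][s] for k in range(2)) for s in range(2)]
            for r in range(2)]


def correlator(channel):
    """Recover the correlator 2 p(output = input) - 1 as in Eq. (4)."""
    return 2 * channel[0][0] - 1


if __name__ == "__main__":
    c, c_prime, n = 0.9, 0.8, 3
    noisy_input = bsc(c_prime ** n)  # flips a bit with probability [1 - c'^n] / 2
    van_dam = bsc(c ** n)            # ordinary van Dam channel, Eq. (14)
    print(correlator(compose(noisy_input, van_dam)), (c * c_prime) ** n)
```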

#### *3.3. The Van Dam Channel Disconnects in the n* → ∞ *Limit*

If |*c*| < 1 or |*c*′| < 1 then it follows that:

$$E[XY] = 2p(Y = i \mid X = i) - 1 = (cc')^n \xrightarrow{n \to \infty} 0 \ . \tag{16}$$

Therefore, in the *n* → ∞ limit:

$$p(Y = i \mid X = i) = 1/2 \; . \tag{17}$$

But also:

$$p(Y=i) = p(Y=i \mid X=i)p(X=i) + p(Y=i \mid X\neq i)p(X\neq i) = \frac{1}{2}(p(X=i) + p(X\neq i)) = \frac{1}{2} \tag{18}$$

Combining (17) with (18) gives:

$$p(Y \mid X) \stackrel{n \to \infty}{\longrightarrow} p(Y)\,. \tag{19}$$

Thus *X* and *Y* are statistically independent in the *n* → ∞ limit, proving the first part of Theorem 1.1.

#### **4. Bob's Estimator**

#### *4.1. Bob's Estimator*

In Section 3 we used the van Dam protocol to construct a symmetric channel whose input is a ±1–valued Bernoulli random variable *X* and whose output is another ±1–valued Bernoulli random variable *Y*. The channel correlator is *c*<sup>*n*</sup>.

Alice sends *m* iid random samples 𝒳 ≝ {*X*<sub>1</sub>, ..., *X*<sub>*m*</sub>} through the channel. Denote the set of respective outputs 𝒴 ≝ {*Y*<sub>1</sub>, ..., *Y*<sub>*m*</sub>}. Assume a prior distribution for *X* given by:

$$p(X = 1 \mid \theta) = \frac{1}{2}(1 + \theta) \ , \tag{20}$$

with parameter *θ* ∈ [−1, 1], so that *X* has mean *θ*.

Bob attempts to estimate *θ* using the estimator:

$$\hat{\theta} \stackrel{\text{def}}{=} \frac{1}{2^n c^n} \sum_{i=0}^{2^n - 1} Y_i \ . \tag{21}$$

We will show that Bob's estimator is unbiased, *E*[*θ̂* | *θ*] = *θ*. Note that

$$E\left[Y_{i}\mid\theta\right] = p(Y = 1 \mid \theta) - p(Y = -1 \mid \theta) \tag{22}$$

and

$$p(Y=1 \mid \theta) = p(Y=1 \mid X=1)\,p(X=1 \mid \theta) + p(Y=1 \mid X=-1)\,p(X=-1 \mid \theta) = \frac{1+c^n\theta}{2} \ . \tag{23}$$

From (22) and (23) together, we deduce:

$$E\left[Y_i \mid \theta\right] = c^n \theta \ , \tag{24}$$

and therefore *E*[*θ̂* | *θ*] = *θ*.

As for variance, by (24):

$$\text{Var}\left[Y_{i} \mid \theta\right] = E\left[Y_{i}^{2} \mid \theta\right] - E\left[Y_{i} \mid \theta\right]^{2} = 1 - c^{2n}\theta^{2} \; . \tag{25}$$

Therefore, since the 2<sup>*n*</sup> outputs *Y*<sub>*i*</sub> are iid:

$$\text{Var}\left[\hat{\theta} \mid \theta\right] = \frac{1 - c^{2n}\theta^2}{(2c^2)^n} \; . \tag{26}$$

We have proved the second part of Theorem 1.1.
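The mean and variance of Bob's estimator can also be confirmed by exact enumeration for small *n*. The sketch below assumes the output law *p*(*Y* = 1 | *θ*) = (1 + *c*<sup>*n*</sup>*θ*)/2 from Theorem 1.1 and uses the fact that the number of +1 outcomes among 2<sup>*n*</sup> iid outputs is binomial; the function name and parameter values are illustrative.

```python
from math import comb


def estimator_moments(c: float, n: int, theta: float):
    """Exact mean and variance of the estimator in Eq. (21) for 2**n iid outputs."""
    m = 2 ** n
    p_plus = (1 + c ** n * theta) / 2  # assumed law p(Y = 1 | theta)
    mean = second = 0.0
    for k in range(m + 1):  # k = number of outputs equal to +1
        weight = comb(m, k) * p_plus ** k * (1 - p_plus) ** (m - k)
        estimate = (2 * k - m) / (m * c ** n)  # (sum of Y_i) / (2^n c^n)
        mean += weight * estimate
        second += weight * estimate ** 2
    return mean, second - mean ** 2


if __name__ == "__main__":
    c, n, theta = 0.75, 4, 0.4
    mean, var = estimator_moments(c, n, theta)
    # Compare with theta and with the closed form of Eq. (26).
    print(mean, theta)
    print(var, (1 - c ** (2 * n) * theta ** 2) / (2 * c * c) ** n)
```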

#### *4.2. Bob's Estimator* ˆ *θ is Efficient*

We prove efficiency of *θ̂* by calculating the Fisher information about *θ* contained in Bob's set of samples ℬ. The Cramér–Rao Theorem tells us that one over this Fisher information is a lower bound for the variance of an unbiased estimator for *θ* constructed from ℬ. By showing that *θ̂* saturates this bound, we will have proven that it is efficient. In the derivation that follows, we assume that |*c*| < 1 by replacing *c* by *cc*′ if necessary.

We compute the Fisher information. The *likelihood* of *θ* given the set ℬ is given by the expression:

$$p(\mathcal{B} \mid \theta) = \left[ p(Y = -1 \mid \theta) \right]^{\sum_{i=1}^{2^{n}} \mathbf{1}_{\{Y_i = -1\}}} \left[ p(Y = 1 \mid \theta) \right]^{\sum_{i=1}^{2^{n}} \mathbf{1}_{\{Y_i = 1\}}} \tag{27}$$

where the *indicator* random variable of a random event *A* is given as:

$$\mathbf{1}_A \stackrel{\text{def}}{=} \begin{cases} 1, & A \text{ occurred;}\\ 0, & \text{otherwise.} \end{cases} \tag{28}$$

According to (27) the log-likelihood is given by the expression:

$$\mathcal{L}(\boldsymbol{\theta}) \stackrel{\text{def}}{=} \log p(\mathcal{B} \mid \boldsymbol{\theta}) = \left[ \sum\_{i=1}^{2^n} \mathbf{1}\_{\{Y\_i = -1\}} \right] \log p(Y = -1 \mid \boldsymbol{\theta}) + \left[ \sum\_{i=1}^{2^n} \mathbf{1}\_{\{Y\_i = 1\}} \right] \log p(Y = 1 \mid \boldsymbol{\theta}) \tag{29}$$

The *Fisher information* about *θ* contained in the set ℬ is defined as:

$$\mathcal{I}_{\mathcal{B}}(\theta) \stackrel{\text{def}}{=} E\left[ \left( \frac{\partial \mathcal{L}(\theta)}{\partial \theta} \right)^{2} \right] = -E\left[ \frac{\partial^{2} \mathcal{L}(\theta)}{\partial \theta^{2}} \right] \tag{30}$$

Note that:

$$E\left[\sum\_{i=1}^{2^n} \mathbf{1}\_{\{Y\_i = s\}}\right] = \sum\_{i=1}^{2^n} E\left[\mathbf{1}\_{\{Y\_i = s\}}\right] = 2^n p(Y = s \mid \theta), \quad s = -1, 1 \quad . \tag{31}$$

Using this, (30) reads:

$$\mathcal{I}\_{\mathcal{B}}(\theta) = \frac{(2c^2)^n}{1 - c^{2n}\theta^2} \ . \tag{32}$$

Indeed the Fisher information about *θ* in ℬ as given by Equation (32) equals one over the variance of *θ̂* as given by Equation (26). Thus, by the Cramér–Rao Theorem, *θ̂* is an efficient estimator for *θ*. Parenthetically, note that the minimum of ℐ<sub>ℬ</sub>(*θ*) is obtained for *θ* = 0, in which case *p*(*X* | *θ*) = 1/2 and ℐ<sub>ℬ</sub>(0) = (2*c*<sup>2</sup>)<sup>*n*</sup>. We have proved the final part of Theorem 1.1.
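The step from (30) to (32) can be cross-checked numerically. For a two-outcome variable the per-sample Fisher information is the sum over outcomes of (∂*p*/∂*θ*)²/*p*, and for iid samples it adds up. The sketch below assumes the output law *p*(*Y* = 1 | *θ*) = (1 + *c*<sup>*n*</sup>*θ*)/2 from Theorem 1.1; the function name is illustrative.

```python
def fisher_information(c: float, n: int, theta: float) -> float:
    """Fisher information about theta carried by 2**n iid channel outputs."""
    slope = c ** n / 2                 # d p(Y = 1 | theta) / d theta
    p_plus = (1 + c ** n * theta) / 2  # assumed law p(Y = 1 | theta)
    p_minus = 1 - p_plus               # its slope is -slope
    per_sample = slope ** 2 / p_plus + slope ** 2 / p_minus
    return 2 ** n * per_sample


if __name__ == "__main__":
    c, n, theta = 0.75, 6, 0.4
    closed_form = (2 * c * c) ** n / (1 - c ** (2 * n) * theta ** 2)  # Eq. (32)
    variance = (1 - c ** (2 * n) * theta ** 2) / (2 * c * c) ** n     # Eq. (26)
    print(fisher_information(c, n, theta), closed_form, 1 / variance)
```

The three printed numbers coincide up to rounding, which is the saturation of the Cramér–Rao bound used in the text.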

#### **5. Uffink's Inequality from Statistical No-Signaling**

The basic protocol in Section 3 assumes all box correlators are identical in absolute value. When this assumption is relaxed, Statistical No-Signaling leads to Uffink's inequality, which is a necessary condition for quantum mechanical Bell–CHSH correlators [22,23]. Our approach is based on evaluating the total Fisher information ℐ<sub>ℬ</sub>(*θ*) gained by Bob in 2<sup>*n*</sup> trials of the experiment.

Suppose that the mean of Alice's bits, *x*<sub>*i*</sub>, is *θ*′ for even *i*, and *θ* otherwise. Consider now a pair of NS-boxes with correlators *c*(*i*, *j*) ≝ *E*[*ab* | *i*, *j*]. The channel underlying the van Dam protocol in this case is described by

$$p(y_j = x_j \mid x_0, x_1) = p(a \oplus b = ij \mid j, i = x_0 \oplus x_1) = \left[1 + c(x_0 \oplus x_1, j)\right] / 2,\tag{33}$$

where *y*<sub>*j*</sub> is Bob's guess of Alice's bit *x*<sub>*j*</sub>. It now follows that

$$\begin{aligned}
p(y_j = 1 \mid \theta', \theta) &= p(y_j = 1 \mid x_j = 1, x_{1-j} = 1)\,p(x_j = 1)\,p(x_{1-j} = 1) + p(y_j = 1 \mid x_j = 0, x_{1-j} = 0)\,p(x_j = 0)\,p(x_{1-j} = 0) \\
&\quad + p(y_j = 1 \mid x_j = 1, x_{1-j} = 0)\,p(x_j = 1)\,p(x_{1-j} = 0) + p(y_j = 1 \mid x_j = 0, x_{1-j} = 1)\,p(x_j = 0)\,p(x_{1-j} = 1) \\
&= \frac{1}{2}\left[1 + \frac{1}{2}\left(c(0,j) + (-1)^j c(1,j)\right)\theta' + \frac{1}{2}\left(c(0,j) - (-1)^j c(1,j)\right)\theta\right] .
\end{aligned} \tag{34}$$

For simplicity, assume that *θ*′ = 0. It can now be verified that for an *n*-level construction in the van Dam protocol

$$p(y\_{j\_1,\ldots,j\_n} = 1 \mid \theta) = \frac{1}{2} \left[ 1 + c\_{j\_1} c\_{j\_2} \cdots c\_{j\_n} \theta \right] \, , \tag{35}$$

where *c<sub>j</sub>* ≝ (*c*(0, *j*) − (−1)<sup>*j*</sup> *c*(1, *j*))/2. According to (32), the Fisher information about *θ* contained in *y*<sub>*j*<sub>1</sub>,...,*j*<sub>*n*</sub></sub> is

$$\mathcal{I}\_{j\_1,\ldots,j\_n}(\theta) = \frac{\left(c\_{j\_1} \cdots c\_{j\_n}\right)^2}{1 - \left(c\_{j\_1} \cdots c\_{j\_n}\right)^2 \theta^2}. \tag{36}$$
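As a sanity check on (36), the following minimal sketch computes the Fisher information of a single binary outcome directly from its definition and compares it with the closed form. It assumes only the probability model of (35), with the product of correlators abbreviated to a single number `k`. The function names and the numerical values are illustrative, and the linear estimator (2*y* − 1)/*k* is a hypothetical choice made for this sketch, not necessarily the estimator of Equation (26).

```python
import math


def success_prob(k: float, theta: float) -> float:
    """p(y = 1 | theta) = (1 + k*theta)/2, the form of (35) with k = c_{j1}...c_{jn}."""
    return 0.5 * (1.0 + k * theta)


def fisher_from_definition(k: float, theta: float) -> float:
    """E[(d/dtheta log p(y | theta))^2] for one binary outcome y in {0, 1}."""
    p1 = success_prob(k, theta)
    p0 = 1.0 - p1
    # d p1 / d theta = k/2 and d p0 / d theta = -k/2.
    score1 = (k / 2.0) / p1
    score0 = (-k / 2.0) / p0
    return p1 * score1 ** 2 + p0 * score0 ** 2


def fisher_closed_form(k: float, theta: float) -> float:
    """Right-hand side of (36): k^2 / (1 - k^2 theta^2)."""
    return k ** 2 / (1.0 - k ** 2 * theta ** 2)


def linear_estimator_variance(k: float, theta: float) -> float:
    """Variance of the (hypothetical) unbiased estimator (2y - 1)/k from one outcome."""
    p1 = success_prob(k, theta)
    # Var(2y - 1) = 4 p1 (1 - p1) for a Bernoulli outcome y.
    return 4.0 * p1 * (1.0 - p1) / k ** 2


if __name__ == "__main__":
    k = 0.6 * 0.5 * 0.8  # illustrative product of three correlators
    theta = 0.3          # illustrative parameter value
    print(fisher_from_definition(k, theta), fisher_closed_form(k, theta))
    print(linear_estimator_variance(k, theta) * fisher_closed_form(k, theta))
```

Under these assumptions the two Fisher information values agree, and the variance of the linear estimator is the reciprocal of the Fisher information, which is the single-outcome analogue of the efficiency statement obtained from the Cramér–Rao Theorem.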

Assuming |*c*(*i*, *j*)| < 1, Bob's total amount of information about *θ* in 2<sup>*n*</sup> trials is

$$\mathcal{I}\_B(\theta) = \sum\_{j\_1=0,1} \cdots \sum\_{j\_n=0,1} \mathcal{I}\_{j\_1,\ldots,j\_n}(\theta) \approx \sum\_{j\_1=0,1} \cdots \sum\_{j\_n=0,1} \left(c\_{j\_1} \cdots c\_{j\_n}\right)^2 = \left[c\_0^2 + c\_1^2\right]^n,\tag{37}$$

for large *n*. As before, the underlying channel asymptotically disconnects for *c*<sub>*j*<sub>1</sub></sub> ⋯ *c*<sub>*j*<sub>*n*</sub></sub> → 0 in the *n* → ∞ limit. Statistical No-Signaling dictates that in this case the variance of Bob's estimator satisfies lim<sub>*n*→∞</sub> Var[*θ̂* | *θ*] = lim<sub>*n*→∞</sub> ℐ<sub>B</sub>(*θ*)<sup>−1</sup> ≥ 1, which holds if and only if Uffink's inequality holds [22],

$$c\_0^2 + c\_1^2 = \frac{1}{4} \left[ c(0,0) - c(1,0) \right]^2 + \frac{1}{4} \left[ c(0,1) + c(1,1) \right]^2 \le 1. \tag{38}$$
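The factorization in (37) and the threshold in (38) can be checked numerically. The sketch below sums the squared products over all 2<sup>*n*</sup> index strings by brute force and compares the result with the closed form. The two sets of correlator values are illustrative assumptions chosen for this sketch (one saturating the left-hand side of (38), one exceeding it) and are not taken from the paper. The function names are likewise hypothetical.

```python
import itertools
import math


def combined_correlators(c: dict) -> list:
    """c_j = (c(0, j) - (-1)^j c(1, j)) / 2 for j = 0, 1."""
    return [(c[(0, j)] - (-1) ** j * c[(1, j)]) / 2.0 for j in (0, 1)]


def uffink_lhs(c: dict) -> float:
    """Left-hand side of (38): c_0^2 + c_1^2."""
    c0, c1 = combined_correlators(c)
    return c0 ** 2 + c1 ** 2


def total_information_bruteforce(c: dict, n: int) -> float:
    """Sum of (c_{j1} ... c_{jn})^2 over all 2^n index strings, as in (37)."""
    cj = combined_correlators(c)
    total = 0.0
    for js in itertools.product((0, 1), repeat=n):
        total += math.prod(cj[j] for j in js) ** 2
    return total


# Illustrative correlators keyed by (i, j); both sets are assumptions.
r = 1.0 / math.sqrt(2.0)
saturating = {(0, 0): r, (1, 0): -r, (0, 1): r, (1, 1): r}
violating = {(0, 0): 0.9, (1, 0): -0.9, (0, 1): 0.9, (1, 1): 0.9}

if __name__ == "__main__":
    for name, c in (("saturating", saturating), ("violating", violating)):
        for n in (1, 4, 8):
            brute = total_information_bruteforce(c, n)
            closed = uffink_lhs(c) ** n
            print(name, n, brute, closed)
```

With the saturating values the total stays at 1 for every *n*, whereas with the violating values it grows geometrically in *n*, so the reciprocal of the total information falls below 1.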

#### **6. Relation to Information Causality**

Among previous non-quantum justifications of Tsirelson's bound, Information Causality (IC) is perhaps the closest to Statistical No-Signaling [19]. IC is also stated as a limit on communication: *Information gain that Bob can reach about a previously unknown to him data set of Alice, by using all his local resources and m classical bits communicated by Alice, is at most m bits.*

IC is formally a restriction on the classical channel capacity. Detecting a violation of this principle therefore requires nonlocal resources, which the authors of [19] bring in by applying IC to the van Dam protocol, the same communication protocol used in this paper.

The Information Causality quantity *I* is defined as the Shannon mutual information of Alice's input and Bob's output given the value of the single bit transmitted in the van Dam protocol. IC holds if *I* ≤ 1 and is violated if *I* > 1. At the end of the supplementary section of [19], the following expression for the IC quantity is obtained:

$$I \ge \frac{1}{2\ln(2)} \left(c\_1^2 + c\_{-1}^2\right)^n,\tag{39}$$

where *c<sub>i</sub>* ≝ *E*[*XY* | *X* = *ĩ*] as in (4). In the symmetric setting, *c*<sub>1</sub> = *c*<sub>−1</sub> = *c*, and for *θ* = 0, Equations (39) and (32) combine to yield:

$$I \ge \frac{2^n c^{2n}}{2\ln(2)} = \frac{\left[1 - c^{2n} \theta^2\right] \mathcal{I}\_B(\theta)}{2\ln(2)}.\tag{40}$$

In particular, in the *n* → ∞ limit, if 2*c*<sup>2</sup> > 1 then ℐ<sub>B</sub>(*θ*) → ∞, implying that *I* → ∞. Thus, violation of Statistical No-Signaling implies violation of IC. Conversely, as (39) is an inequality, it is unknown whether Tsirelson's bound being satisfied implies *I* ≤ 1 (IC for the van Dam protocol), although, by our main theorem, it does imply ℐ<sub>B</sub>(*θ*) ≤ 1 (Statistical No-Signaling for the van Dam protocol).
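The asymmetry between the two directions can be illustrated by evaluating the lower bound in (40) directly. The sketch below assumes the symmetric setting with a single correlator value *c*. The function names and the sample values of *c* are illustrative choices, not taken from the paper or from [19].

```python
import math


def ic_lower_bound(c: float, n: int) -> float:
    """Lower bound on the IC quantity I from (40): (2 c^2)^n / (2 ln 2)."""
    return (2.0 * c ** 2) ** n / (2.0 * math.log(2.0))


def first_level_exceeding_one(c: float, max_n: int = 10_000):
    """Smallest nesting depth n at which the bound exceeds 1, or None if none is found."""
    for n in range(1, max_n + 1):
        if ic_lower_bound(c, n) > 1.0:
            return n
    return None


if __name__ == "__main__":
    tsirelson = 1.0 / math.sqrt(2.0)
    for c in (0.5, tsirelson, 0.75, 0.9):
        print(c, ic_lower_bound(c, 1), first_level_exceeding_one(c))
```

For 2*c*<sup>2</sup> > 1 the bound eventually exceeds 1, which certifies a violation of IC at a finite depth *n*. At Tsirelson's bound the lower bound stays near 1/(2 ln 2), which is below 1, so the bound alone cannot decide whether *I* ≤ 1 holds there.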

#### **7. Conclusions**

We have formulated a *Statistical No-Signaling* principle which dictates that no information can pass through a disconnected channel. A violation of Tsirelson's bound, *i.e.*, a value of |*c*| greater than 1/√2, allows us to violate Statistical No-Signaling by constructing a disconnected channel through which Bob can construct an unbiased estimator with variance 0 for Alice's parameter *θ*. Conversely, when Tsirelson's bound holds, so does Statistical No-Signaling for this channel. Our construction thus provides a purely statistical justification for Tsirelson's bound, independent of quantum mechanics.

**Acknowledgments:** The authors thank Daniel Rohrlich for useful discussions. Avishy Carmi acknowledges support from Israel Science Foundation Grant No. 1723/16.

**Author Contributions:** Avishy Carmi and Daniel Moskovich have both written the text and worked out the mathematical proofs in this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Entropy* Editorial Office E-mail: entropy@mdpi.com www.mdpi.com/journal/entropy
