A Maximum Entropy Approach to the Realizability of Spin Correlation Matrices

Dai Pra, Paolo; Pavon, Michele; Sahasrabudhe, Neeraja

doi:10.3390/e15062448

by Paolo Dai Pra, Michele Pavon * and Neeraja Sahasrabudhe

Department of Mathematics, University of Padova, via Trieste 63, 35121 Padova, Italy

* Author to whom correspondence should be addressed.

Entropy 2013, 15(6), 2448-2463; https://doi.org/10.3390/e15062448

Submission received: 26 February 2013 / Revised: 10 June 2013 / Accepted: 15 June 2013 / Published: 21 June 2013


Abstract: Deriving the form of the optimal solution of a maximum entropy problem, we obtain an infinite family of linear inequalities characterizing the polytope of spin correlation matrices. For $n \leq 6$, the facet description of such a polytope is provided through a minimal system of Bell-type inequalities.

Keywords: correlation matrix; spin system; maximum entropy; Bell’s inequalities; moment problem

1. Introduction

Moment problems are fairly common in many areas of applied mathematics, statistics and probability, economics, engineering, physics and operations research. Historically, moment problems came into focus with Stieltjes in 1894 [1], in the context of studying the analytic behavior of continued fractions. The term “moment” was borrowed from mechanics: the moments could represent the total mass of an unknown mass density, the torque necessary to support the mass on a beam, etc. Over time, however, moment problems took the shape of an important field in their own right. A deep connection with convex geometry was discovered by Krein in the mid 1930s and developed by the Russian school; see, e.g., [2,3]. Another fundamental connection with the work of Caratheodory, Toeplitz, Schur, Nevanlinna and Pick on analytic interpolation was investigated in the first half of the twentieth century [4]. This led to important developments in operator theory; see, e.g., [5,6]. In more recent times, a rather impressive application and generalization of this mathematics has been developed by the Byrnes-Georgiou-Lindquist school for signal and control engineering applications. In their approach, one seeks a multivariate, stationary stochastic process as the input of a bank of rational filters whose output covariance has been estimated. This turns into a Nevanlinna-Pick interpolation problem with a bounded degree [7,8]. The latter can be viewed as a generalized moment problem (namely, a moment problem with complexity constraints), which is advantageously cast in the frame of various convex optimization problems, often featuring entropic-type criteria. An example is provided by the covariance extension problem and its generalization; see [9,10,11,12,13,14]. These problems pose a number of theoretical and computational challenges, especially in the multivariable framework, for which we also refer the reader to [15,16,17,18,19,20,21,22]. 
Besides signal processing, significant applications of this theory are found in modeling and identification [23,24,25], $H_{\infty}$ robust control [26,27], and biomedical engineering [28].

A general moment problem can be stated as follows. Suppose we are given a measurable space, $(Ω, A)$, a family, $F$, of measurable functions from Ω to $R$ and a corresponding family of real numbers, $\{c_{f} : f \in F\}$. One wants to determine whether there exists a probability, P, on $(Ω, A)$, such that:

E_{P} (f) : = \int f d P = c_{f}

for every $f \in F$ and, if so, characterize all probabilities having this property.

Among the various instances of this inverse problem, the covariance realization/completion problem has raised wide interest, in part because of its important applications in mathematical statistics [29] and in theoretical engineering [30]. It is well known that an

n \times n

matrix is the covariance matrix of some

R^{n}

-valued random vector if and only if it belongs to the convex cone of symmetric and positive semidefinite matrices. A relevant problem for applications considers the situation in which only some entries of the covariance matrix are given, for example, those that have been estimated from the data. In this context, one aims at characterizing all possible completions of the partially given covariance matrix or completions that possess certain desirable properties; see, e.g., [29,31,32,33,34,35,36,37] and references therein. Another more theoretical problem investigates the geometry of correlation matrices, namely, covariances of standardized random variables. Clearly, correlation matrices form a compact, convex subset of the vector space of symmetric matrices of dimension, n, since the latter is determined by a family of linear inequalities, namely, the positivity constraints. A natural question is to determine the extreme points of this convex set. This problem was solved by Parthasarathy in [38], who could parametrize the uncountable family of extremals, which turn out to be all singular.

The geometry of correlation matrices may change dramatically if one adds constraints on the values of the random vector realizing the covariance matrix. In this paper, we consider the case in which the components of the random vector are required to be

{- 1, 1}

-valued. We call spin systems random vectors of this type or, by abuse of language, their distributions. Although spin systems have been extensively studied in statistical mechanics and probability, various questions are still open concerning their covariance matrices. In [39], J. C. Gupta proved that covariance matrices of a system of n spins form a polytope and exhibited its

2^{n - 1}

extremals. Apparently, his result is contained in some form in Pitowsky’s previous work [40] and in even earlier, but not easily accessible, work by Assouad (1980) and Assouad and Deza (1982); cf. [41] (Section 5.3), for more information.

A more delicate problem is to characterize covariance matrices of spin systems by a system of linear inequalities. This problem was tackled in [40] (see, however [41] (p.54) for a thorough description of preceding contributions on “correlation polytopes” coming from such different fields as mathematical physics, quantum logic and analysis). There, the dual description, in the sense of linear programming, was considered, and the high complexity of the problem of determining the extremals of this dual, i.e., the facets of the polytope, was discussed. Moreover, in [42], it has been shown that this dual is generated by the Bell’s inequalities (see [43] for an overview of the role played by these inequalities in quantum mechanics) for

n = 3, 4

, but not for

n \geq 5

. Finally, we mention the paper [44], where the problem of realizability of correlations has been extended to the more general setting of random points in a topological space.

The aim of this paper is two-fold.

We derive an infinite family of linear inequalities characterizing covariances of spin systems, via the solution of a maximum entropy problem. Besides its intrinsic interest, this method has the advantage of describing, in terms of certain Lagrangian multipliers, an explicit probability realizing the covariances, whenever they are realizable. The search for the Lagrange multipliers is an interesting computational problem, which will be addressed in a forthcoming paper.
Via a computer-aided proof, we determine the facets of the polytope of covariance matrices of spin systems for $n \leq 6$ . In particular, we show that for these values of n, Bell’s inequalities are actually facets of the polytope, but generate the whole polytope only for $n = 3, 4$ . For $n = 5$ and 6, the remaining facets are given by suitable generalizations of Bell’s inequalities. Although the problem is computationally feasible also for some larger values of n, the number of extremal inequalities increases dramatically, and we have not been able to describe them synthetically. We mention the fact that the case $n = 3$ is peculiar, since it is the only case in which the polytope is a simplex. A more detailed description of this case is contained in the note [45]. Our work here inevitably overlaps with some previous research on linear descriptions of polytopes in combinatorial geometry, such as [46]; see also (Section 30.6 in [41]) and, in particular, the footnote on p. 503 of the latter reference (the book [41] by M. Deza and M. Laurent is a general, comprehensive reference on discrete geometry). We remark that our arguments go through even when the covariance matrix is only partially given, a case important for applications, but typically not considered in the discrete geometry literature.

Summing up, we obtain necessary and sufficient conditions for the existence of a covariance completion, as well as a “canonical” (maximum entropy) probability realizing the given covariances.

2. Spin Systems and Spin Correlation Matrices

Let us define a spin system first. Let $Ω_{n} = {\{- 1, 1\}}^{n}$ be the space of length-n sequences, which are denoted by $σ = (σ_{1}, σ_{2}, \dots, σ_{n})$, where $σ_{i} \in \{- 1, 1\}$. Define the spin random variables, $ξ_{i} : Ω_{n} \to \{- 1, 1\}$, for $1 \leq i \leq n$ as $ξ_{i} (σ) = σ_{i}$. For a probability, P, on $Ω_{n}$, we denote by $E_{P}$ the expectation with respect to P. The finite probability space $(Ω_{n}, P)$ is called the spin system. As in [39], for simplicity, we only consider symmetric probabilities, i.e., those for which $E_{P} (ξ_{i}) = 0$ for all $1 \leq i \leq n$. Note that, in this case, the covariance matrix, $C = {(E_{P} (ξ_{i} ξ_{j}))}_{i, j = 1}^{n}$, has all diagonal elements equal to 1. We will refer to this matrix as the spin-correlation matrix associated to P.

Suppose that we are given the spin-spin correlations, $c_{i j}$, and look for a probability, P, on $Ω_{n}$, such that $c_{i j} = E_{P} (ξ_{i} ξ_{j})$ for every $1 \leq i, j \leq n$ or for all pairs $(i, j)$ for which $c_{i j}$ is given. We consider the following questions:

Under what conditions does a distribution with those correlations exist?
If one such distribution exists, that is, if the given correlations are realizable, then how does one characterize the maximum entropy probability measure?

The spin correlation/covariance matrices form a convex polytope whose description in terms of vertices is known. Let us denote the convex polytope of the spin correlation matrices of the order, n, by

{Cov}_{n}

. J. C. Gupta proved that

{Cov}_{n}

is the convex hull of

2^{n - 1}

matrices (it turns out that these matrices are exactly the extremal vertices of this polytope) that can be found explicitly.

Theorem 1

(J. C. Gupta, 1999): The class of realizable correlation matrices of n spin variables is given by:

{Cov}_{n} = \mathrm{ConvexHull} \{Σ^{S} : S \in \mathcal{S}\}

where $Σ^{S}$ and $\mathcal{S}$ are defined as follows:

\mathcal{S} = \{S \subset \{1, 2, \dots, n\} : 1 \in S\}

and

Σ^{S} = ((c_{i j}^{S}))

where $c_{i i}^{S} = 1$ for all i and $c_{i j}^{S} = {(- 1)}^{| S \cap \{i, j\} |}$ for $i \neq j$.

These $Σ^{S}$ are rank-1 matrices obtained by considering probability measures on $Ω_{n}$ supported at two points (a configuration, σ, and its opposite, $- σ$), such that each of these two configurations has probability $1 / 2$. The proof of the above theorem can be found in [39].
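The extremal matrices of Theorem 1 are easy to enumerate. The following is a minimal sketch in plain Python, not code from the paper; the function name `extremal_matrices` is hypothetical. It builds each $Σ^{S}$ from the configuration with $σ_{i} = - 1$ for $i \in S$ and $σ_{i} = 1$ otherwise, so that $σ_{i} σ_{j} = {(- 1)}^{| S \cap \{i, j\} |}$.

```python
from itertools import combinations

def extremal_matrices(n):
    """Return the matrices Sigma^S of Theorem 1, one per subset S of {1..n} containing 1."""
    others = range(2, n + 1)
    matrices = []
    for r in range(n):  # r = number of elements of S other than 1
        for rest in combinations(others, r):
            S = {1, *rest}
            # sigma_i = -1 on S and +1 elsewhere gives sigma_i * sigma_j = (-1)**|S ∩ {i, j}|
            sigma = [-1 if i in S else 1 for i in range(1, n + 1)]
            matrices.append(tuple(tuple(a * b for b in sigma) for a in sigma))
    return matrices

mats = extremal_matrices(4)
print(len(mats), len(set(mats)))  # 2**(n-1) matrices, pairwise distinct
```

Each matrix is the outer product of a sign vector with itself, which makes the rank-1 property explicit.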

It is interesting to note that the description of extremals of spin correlation matrices is rather simple when compared with the description of extremals of the convex set of correlation matrices in general (see [38]). As mentioned in the introduction, it is more difficult to obtain the dual representation in terms of linear inequalities. One simple observation is that every spin correlation matrix, $C = (c_{i j})$, must satisfy the following Bell’s inequalities: for every $ε \in Ω_{n}$ and $1 \leq i < j < k \leq n$:

\begin{matrix} 1 + ε_{i} ε_{j} c_{i j} + ε_{j} ε_{k} c_{j k} + ε_{k} ε_{i} c_{k i} \geq 0 \end{matrix}

(1)

The necessity of these inequalities is easy to show:

\begin{matrix} ε_{i} ε_{j} c_{i j} + ε_{j} ε_{k} c_{j k} + ε_{k} ε_{i} c_{k i} & = E_{P} [ε_{i} ε_{j} ξ_{i} ξ_{j} + ε_{j} ε_{k} ξ_{j} ξ_{k} + ε_{k} ε_{i} ξ_{k} ξ_{i}] \\ & = \frac{1}{2} E_{P} [{(ε_{i} ξ_{i} + ε_{j} ξ_{j} + ε_{k} ξ_{k})}^{2}] - \frac{3}{2} \geq - 1 \end{matrix}

where, in the last step, we have observed that ${(σ_{i} + σ_{j} + σ_{k})}^{2} \geq 1$ for every $σ \in Ω_{n}$, since a sum of three odd integers is odd, and applied this to the configuration with components $ε_{i} σ_{i}$. One immediate consequence is that not all positive semi-definite matrices with diagonal elements equal to 1 are spin correlation matrices.

Example 1

Consider a symmetric matrix:

C = \begin{pmatrix} 1 & c_{1} & c_{2} \\ c_{1} & 1 & c_{1} \\ c_{2} & c_{1} & 1 \end{pmatrix}

with $- 1 \leq c_{1}, c_{2} \leq 1$. Then, the condition for positive semi-definiteness is:

1 - 2 c_{1}^{2} + c_{2} \geq 0

and the Bell’s inequalities reduce to:

1 \pm 2 c_{1} + c_{2} \geq 0

For instance, for $c_{1} \in (- \frac{1}{\sqrt{2}}, - \frac{1}{2}) \cup (\frac{1}{2}, \frac{1}{\sqrt{2}})$ and $c_{2} = 0$, we get a matrix, C, that is symmetric and positive semi-definite (hence, a correlation matrix), but it does not satisfy the Bell’s inequalities, so it cannot be a spin correlation matrix.
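Example 1 can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: the helper names `example_matrix`, `is_psd` and `satisfies_bell` are hypothetical, positive semi-definiteness of a symmetric $3 \times 3$ matrix is tested through its principal minors, and the values $c_{1} = 0.6$ and $c_{1} = 0.4$ are arbitrary choices inside and outside the interval $(\frac{1}{2}, \frac{1}{\sqrt{2}})$.

```python
from itertools import combinations, product

def example_matrix(c1, c2):
    """The symmetric matrix of Example 1."""
    return [[1, c1, c2], [c1, 1, c1], [c2, c1, 1]]

def is_psd(C, tol=1e-12):
    """PSD test for a symmetric 3x3 matrix: every principal minor is nonnegative."""
    (a, b, c), (_, d, e), (_, _, f) = C
    det = a * (d * f - e * e) - b * (b * f - c * e) + c * (b * e - c * d)
    minors = [a, d, f, a * d - b * b, a * f - c * c, d * f - e * e, det]
    return all(m >= -tol for m in minors)

def satisfies_bell(C, tol=1e-12):
    """Check Equation (1) for every triple i < j < k and every choice of signs."""
    for i, j, k in combinations(range(len(C)), 3):
        for ei, ej, ek in product((-1, 1), repeat=3):
            if 1 + ei * ej * C[i][j] + ej * ek * C[j][k] + ek * ei * C[k][i] < -tol:
                return False
    return True

bad = example_matrix(0.6, 0.0)   # 1/2 < c1 < 1/sqrt(2)
good = example_matrix(0.4, 0.0)  # |c1| <= 1/2
print(is_psd(bad), satisfies_bell(bad))    # True False
print(is_psd(good), satisfies_bell(good))  # True True
```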

3. The Dual Representation for $n \leq 6$

We have seen that the spin correlation matrices form a convex polytope with extreme points given by Theorem 1. Every convex polytope has two representations: one as the convex hull of finitely many extreme points (known as the V-representation) and another in terms of the inequalities defining the faces of the polytope (known as the H-representation). These inequalities provide necessary and sufficient conditions for a point to lie inside the convex hull. Thus, finding necessary and sufficient conditions for a matrix, M, to lie in

{Cov}_{n}

is equivalent to finding the H-representation of

{Cov}_{n}

. The problem of obtaining the H-representation from the V-representation is called the facet enumeration problem, while the dual one is called the vertex enumeration or the convex hull problem. These are well known problems in the theory of linear programming.

The program cdd+ (respectively, cdd) is a C++ (respectively, ANSI C) program that performs both tasks. Given the inequalities defining the facets of the polytope, it returns the set of vertices and extreme rays, and vice versa [47]. It is a computer implementation of the double description method (see, for instance, [48]). The program works in exact integer arithmetic; in particular, when the input data are integers, no rounding takes place.

We executed the cdd+ program to find the necessary and sufficient conditions for $3 \leq n \leq 6$. We know the extremals in each case from Theorem 1. We summarize below the results obtained. We remark that the facets of this correlation polytope have been previously computed for $n \leq 8$ (Section 30.6 in [41]). Our point here is to connect these facets with Bell’s inequalities and their generalizations (see Section 4).

3.1. Cases $n = 3, 4$

These are the simplest cases, already covered in [42]. The program returns exactly the Bell’s inequalities in Equation (1). In particular, the following nontrivial facts follow:

Bell’s inequalities imply positivity of the matrix;
Bell’s inequalities correspond to the facets of the polytope of spin correlation matrices in dimension three and four; in particular, they provide the “minimal” description in terms of linear inequalities.

3.2. Case $n = 5$

The polytope of spin-correlation matrices has 56 facets. Forty of these are given by the Bell’s inequalities, corresponding to $\binom{5}{3} = 10$ choices of three indexes and $2^{2}$ choices of $ε \in {\{- 1, 1\}}^{3}$ (modulo change of sign).

The remaining 16 facets correspond to the following inequalities: for every $ε \in {\{- 1, 1\}}^{5}$, modulo change of sign:

2 + \sum_{1 \leq i < j \leq 5} ε_{i} ε_{j} c_{i j} \geq 0

(2)

3.3. Case $n = 6$

There are 368 facets. We can group the corresponding inequalities into three groups.

We have the $\binom{6}{3} 2^{2} = 80$ Bell’s inequalities.
For $T \subseteq \{1, 2, 3, 4, 5, 6\}$ with $| T | = 5$ and $ε \in {\{- 1, 1\}}^{T}$ , we consider the inequality analogous to Equation (2):

$2 + \sum_{i < j; i, j \in T} ε_{i} ε_{j} c_{i j} \geq 0$

There are $\binom{6}{5} 2^{4} = 96$ inequalities of this sort.
For $T \subseteq \{1, 2, 3, 4, 5, 6\}$ with $| T | = 5$ and $ε \in {\{- 1, 1\}}^{6}$ , letting $j_{T}$ be the only element of $T^{c}$ , consider the inequalities:

$4 + \sum_{i < j; i, j \in T} ε_{i} ε_{j} c_{i j} + 2 \sum_{i \in T} ε_{i} ε_{j_{T}} c_{i j_{T}} \geq 0$

There are $\binom{6}{5} 2^{5} = 192$ such inequalities, again counting ε modulo change of sign.

We will see in the next section that inequalities of the types above hold for spin-correlation matrices also in higher dimensions, where, however, facets of different types appear.
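The counts above, and the validity of the three families on every extremal matrix, can be verified by brute force. This sketch does not reproduce the cdd+ computation and does not prove that the inequalities are facets; it only checks that each inequality holds at all vertices and that the numbers of distinct inequalities are 56 and 368. The function names are hypothetical, and each inequality is stored as a constant plus a coefficient vector over the pairs $i < j$.

```python
from itertools import combinations, product

def pairs(n):
    return list(combinations(range(n), 2))

def odd_subset_family(n, size):
    """Inequalities (size-1)/2 + sum_{i<j in T} e_i e_j c_ij >= 0 over all |T| = size (odd)."""
    family = set()
    for T in combinations(range(n), size):
        for eps in product((-1, 1), repeat=size):
            e = dict(zip(T, eps))
            coeffs = tuple(e[i] * e[j] if i in e and j in e else 0 for i, j in pairs(n))
            family.add(((size - 1) // 2, coeffs))  # the set removes the sign-flip duplicates
    return family

def type_three_family(n=6):
    """Inequalities 4 + sum_{i<j in T} e_i e_j c_ij + 2 sum_{i in T} e_i e_jT c_{i jT} >= 0."""
    family = set()
    for jT in range(n):
        for eps in product((-1, 1), repeat=n):
            coeffs = tuple(eps[i] * eps[j] * (2 if jT in (i, j) else 1) for i, j in pairs(n))
            family.add((4, coeffs))
    return family

def holds_on_vertices(family, n):
    """True if every inequality holds at every extremal matrix c_ij = sigma_i sigma_j."""
    for sigma in product((-1, 1), repeat=n):
        c = [sigma[i] * sigma[j] for i, j in pairs(n)]
        if any(const + sum(a * b for a, b in zip(coeffs, c)) < 0 for const, coeffs in family):
            return False
    return True

five = odd_subset_family(5, 3) | odd_subset_family(5, 5)
six = odd_subset_family(6, 3) | odd_subset_family(6, 5) | type_three_family(6)
print(len(five), holds_on_vertices(five, 5))  # 56 True
print(len(six), holds_on_vertices(six, 6))    # 368 True
```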

4. Maximum Entropy Measure for Spin Systems

4.1. Maximum Entropy Method

Our aim now is to find an explicit measure that realizes the given covariances. One of the most natural and popular approaches to this kind of problem is the maximum entropy method. The rationale underlying this approach has been discussed over the years by various “deep thinkers”, such as Jaynes [49,50,51] (physics), Dempster [29] (statistics) and Csiszár [52] (information theory). We refer the reader to these references for full motivation of this approach.

We want to find a probability measure that realizes the given covariances and that also maximizes the entropy of the system. In other words, we want to solve the following optimization problem:

Maximize the entropy:

\begin{matrix} S (P) : = - \sum_{σ} P (σ) ln P (σ) \end{matrix}

over

\begin{matrix} P : Ω_{n} \to [0, \infty) \end{matrix}

subject to:

\begin{matrix} \sum_{σ} σ_{h} σ_{k} P (σ) & = c_{h k} \\ \sum_{σ} P (σ) & = 1 \end{matrix}

Consider the Lagrangian function:

\begin{matrix} L (P) = S (P) + \sum_{h, k} λ_{h k} (c_{h k} - \sum_{σ} σ_{h} σ_{k} P (σ)) + μ (\sum_{σ} P (σ) - 1) \end{matrix}

Notice that $L (P)$ coincides with $S (P)$ on the set of P satisfying the constraints:

\begin{matrix} M = \{P (σ) | P (σ) \geq 0, \sum_{σ} P (σ) = 1, \sum_{σ} σ_{h} σ_{k} P (σ) = c_{h k}\} \end{matrix}

Here, $μ \in R$ and the $n \times n$ matrix $Λ = (λ_{h k})$ are the Lagrange multipliers. $L (P)$ is a strictly concave function of P on the convex cone, $P$, of positive measures on $Ω_{n}$. Thus, if we can find an internal point, $P^{*} \in P$, such that:

L^{'} (P^{*}; δ P) = \lim_{ϵ \to 0} \frac{L (P^{*} + ϵ δ P) - L (P^{*})}{ϵ} = 0

for all $δ P : Ω_{n} \to R$, then, necessarily, $P^{*}$ is the unique maximum point for $L$ over $P$. Since:

\begin{matrix} L^{'} (P^{*}; δ P) = \sum_{σ} [- log P^{*} (σ) - \sum_{h, k} λ_{h k} σ_{h} σ_{k} + μ - 1] δ P (σ) \end{matrix}

we get the optimality condition:

- log P^{*} (σ) - \sum_{h, k} λ_{h k} σ_{h} σ_{k} + μ - 1 \equiv 0

Namely,

P^{*}

has the form:

\begin{matrix} P^{*} (σ) = \frac{1}{Z} exp \{- \sum_{h, k} λ_{h k} σ_{h} σ_{k}\}, σ \in Ω_{n} \end{matrix}

(3)

where $Z = \exp \{1 - μ\}$. Such a $P^{*}$ is, in fact, an internal point of $P$. Note that any probability of this form is such that $P^{*} (σ) = P^{*} (- σ)$. In particular, this implies that each spin has a mean of zero with respect to $P^{*}$.

Also note that this last formula simply specifies a class of probability measures on

Ω_{n}

, parametrized by the matrix of Lagrange multipliers,

Λ = (λ_{i j})

. It remains to establish whether the given correlations are realized by any such probability and, if so, to determine the corresponding values of the multipliers. To this aim, we consider the so-called dual functional. Let us denote by

P_{Λ}^{*}

the probability in Equation (3). Then, the dual functional,

J

, which is a real valued function of the Lagrange multipliers, is defined by:

J (Λ) : = L (P_{Λ}^{*})

(4)

Observing that, since $P_{Λ}^{*}$ is normalized, the term multiplying μ vanishes, so that:

L (P_{Λ}^{*}) - S (P_{Λ}^{*}) = \sum_{h, k} λ_{h k} c_{h k} - \sum_{σ} \sum_{h, k} λ_{h k} σ_{h} σ_{k} P_{Λ}^{*} (σ)

and:

\begin{matrix} S (P_{Λ}^{*}) & = \log Z + \sum_{σ} \sum_{h, k} λ_{h k} σ_{h} σ_{k} P_{Λ}^{*} (σ) \\ & = \log [\sum_{σ} \exp \{- \sum_{h, k} λ_{h k} σ_{h} σ_{k}\}] + \sum_{σ} \sum_{h, k} λ_{h k} σ_{h} σ_{k} P_{Λ}^{*} (σ) \end{matrix}

we obtain the convex function:

J (Λ) = \sum_{h, k} λ_{h k} c_{h k} + \log [\sum_{σ} \exp \{- \sum_{h, k} λ_{h k} σ_{h} σ_{k}\}]

(5)

If we denote by

\nabla J

the gradient of

J

with respect to the variables,

λ_{i j}

, it is immediately seen that the following statements are equivalent:

Λ is a critical point for $J$ , i.e., $\nabla J (Λ) = 0$ ;
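$P_{Λ}^{*}$ realizes the assigned correlations, i.e.:

$\sum_{σ} σ_{h} σ_{k} P_{Λ}^{*} (σ) = c_{h k}$

for every $h, k$ . Indeed, differentiating Equation (5) gives $\partial J / \partial λ_{h k} = c_{h k} - \sum_{σ} σ_{h} σ_{k} P_{Λ}^{*} (σ)$ .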
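The dual functional and its gradient can be evaluated by direct enumeration for small n. The sketch below is an illustration, not the authors' code. It assumes that the multipliers are indexed by the pairs $i < j$ only, with the factor 2 coming from the symmetry of Λ absorbed into them, and the names `gibbs`, `dual_J` and `grad_J` are hypothetical. A central finite difference is used to confirm that the gradient equals the assigned correlations minus the realized ones.

```python
import math
from itertools import combinations, product

def gibbs(lam, n):
    """Configurations, probabilities P(sigma) ∝ exp(-sum_{i<j} lam_ij s_i s_j), and Z."""
    index = list(combinations(range(n), 2))
    configs = list(product((-1, 1), repeat=n))
    weights = [math.exp(-sum(l * s[i] * s[j] for l, (i, j) in zip(lam, index)))
               for s in configs]
    Z = sum(weights)
    return configs, [w / Z for w in weights], Z

def dual_J(lam, c, n):
    """J(lam) = sum_{i<j} lam_ij c_ij + log Z(lam), the pair-indexed form of Equation (5)."""
    return sum(l * x for l, x in zip(lam, c)) + math.log(gibbs(lam, n)[2])

def grad_J(lam, c, n):
    """Component (i, j) of the gradient: c_ij minus the correlation realized by P_lam."""
    index = list(combinations(range(n), 2))
    configs, probs, _ = gibbs(lam, n)
    return [x - sum(p * s[i] * s[j] for s, p in zip(configs, probs))
            for x, (i, j) in zip(c, index)]

n, c, lam, h = 3, [0.2, -0.1, 0.3], [0.5, -0.3, 0.1], 1e-6  # arbitrary test values

def shifted(k, step):
    return lam[:k] + [lam[k] + step] + lam[k + 1:]

finite_diff = [(dual_J(shifted(k, h), c, n) - dual_J(shifted(k, -h), c, n)) / (2 * h)
               for k in range(len(lam))]
print(max(abs(a - b) for a, b in zip(finite_diff, grad_J(lam, c, n))))  # close to zero
```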

A critical point exists if

J

is proper, which means:

\begin{matrix} lim_{∥ Λ ∥ \to \infty} J (Λ) = + \infty \end{matrix}

Since the logarithm of a sum of positive terms is at least the logarithm of its largest term, we have $J (Λ) \geq \sum_{i, j} c_{i j} λ_{i j} - \min_{σ} \sum_{i, j} λ_{i j} σ_{i} σ_{j}$. It is therefore clear that the following set of inequalities ensures the properness of $J$:

\sum_{i, j} c_{i j} λ_{i j} > \min \{\sum_{i, j} λ_{i j} σ_{i} σ_{j} : σ \in Ω_{n}\}, \text{ for every } Λ

Let us denote by $M (Λ)$ the minimum $\min \{\sum_{i, j} λ_{i j} σ_{i} σ_{j} : σ \in Ω_{n}\}$. We denote by $Δ_{n}$ the set of matrices defined by:

Δ_{n} = \{C = (c_{i j}) : \sum_{i, j} c_{i j} λ_{i j} \geq M (Λ) \text{ for every } Λ\}

(6)

We can now state the main result of this section.

Theorem 2

Let

Δ_{n}

be as defined in Equation (6). Then:

Δ_{n} = {Cov}_{n}

Proof.

We first show that $Δ_{n} \subseteq {Cov}_{n}$. Since ${Cov}_{n}$ is closed, it is enough to show that $\overset{\circ}{Δ_{n}} \subseteq {Cov}_{n}$, where $\overset{\circ}{Δ_{n}}$ denotes the interior of $Δ_{n}$. We know that $\overset{\circ}{Δ_{n}} = \{C = (c_{i j}) : \sum_{i, j} c_{i j} λ_{i j} > M (Λ) \text{ for every } Λ\}$.

Thus, for $C \in \overset{\circ}{Δ_{n}}$, the dual functional, $J (Λ)$, is proper, so it has a critical point, Λ. By the equivalence stated above, the corresponding probability, $P_{Λ}^{*}$, realizes C as a correlation matrix of spin variables. Hence, $C \in {Cov}_{n}$.

Now, to show ${Cov}_{n} \subseteq Δ_{n}$, let $C = (c_{i j}) \in {Cov}_{n}$ be realized by a probability, P. Then, for every Λ, we have:

\begin{matrix} \sum_{i, j} λ_{i j} c_{i j} & = E_{P} (\sum_{i, j} λ_{i j} σ_{i} σ_{j}) \\ & \geq \min_{σ} \sum_{i, j} λ_{i j} σ_{i} σ_{j} \\ & = M (Λ) \end{matrix}

This implies $C \in Δ_{n}$. As pointed out by one reviewer, Theorem 2 can also be proven by contradiction using the hyperplane separation theorem together with the knowledge of the extremal points of the polytope. Our proof, however, does not rely on this knowledge and holds, with minimal modifications, for random variables taking values in general subsets of $R$. The theorem above provides a (non-minimal) dual description of the polytope of spin-correlation matrices.

Its main consequence is that whenever C is in the interior of ${Cov}_{n}$, it can be realized by some probability of the form in Equation (3), for a Λ that minimizes the dual functional, $J$. The search for a probability that realizes a given correlation matrix is therefore reduced to finding the minimum of a function that, as we will see shortly, is convex. Before giving some details on this point, we observe that various classes of inequalities are obtained from Equation (6) by a suitable choice of Λ.

Positivity: let $x \in R^{n}$ and set $λ_{i j} = x_{i} x_{j}$ . Then, for every $σ \in Ω_{n}$ :

$\sum_{i j} λ_{i j} σ_{i} σ_{j} = \sum_{i j} x_{i} x_{j} σ_{i} σ_{j} = {[\sum_{i} x_{i} σ_{i}]}^{2} \geq 0$

So, $M (Λ) \geq 0$ . Thus, for $C \in Δ_{n}$ :

$\sum_{i j} x_{i} x_{j} c_{i j} \geq 0$

which implies positivity.
Bell’s inequalities: let $A \subset \{1, 2, \dots, n\}$ with $| A | = 3$ and $ε \in {\{- 1, 1\}}^{A}$ . We set:

$λ_{i j} = \begin{cases} ε_{i} ε_{j} & \text{for } i, j \in A, i \neq j \\ 0 & \text{otherwise} \end{cases}$

Then, for $A = \{r, s, t\}$ :

$\begin{matrix} \frac{1}{2} \sum_{i j} λ_{i j} σ_{i} σ_{j} & = ε_{r} ε_{s} σ_{r} σ_{s} + ε_{r} ε_{t} σ_{r} σ_{t} + ε_{s} ε_{t} σ_{s} σ_{t} \\ & = η_{r} η_{s} + η_{r} η_{t} + η_{s} η_{t} \geq - 1 \end{matrix}$

where $η_{i} = ε_{i} σ_{i}$ . So, we have $M (Λ) = - 2$ . Thus, for $C \in Δ_{n}$ , after dividing by two:

$ε_{r} ε_{s} c_{r s} + ε_{r} ε_{t} c_{r t} + ε_{s} ε_{t} c_{s t} \geq - 1$

which are Bell’s inequalities.
Generalizations of Bell’s inequalities: Let us consider $T \subset \{1, 2, \dots, n\}$ , such that $| T |$ is odd. Then, let $ε \in {\{- 1, 1\}}^{T}$ . We set:

$λ_{i j} = \begin{cases} ε_{i} ε_{j} & \text{for } i, j \in T, i \neq j \\ 0 & \text{otherwise} \end{cases}$

We have:

$\sum_{i, j} λ_{i j} σ_{i} σ_{j} = {(\sum_{i \in T} ε_{i} σ_{i})}^{2} - | T |$

Since $| T |$ is odd, we have:

$\min_{σ} {(\sum_{i \in T} ε_{i} σ_{i})}^{2} = 1$

Thus, $M (Λ) = 1 - | T |$ . As a result, we obtain the inequality:

$| T | - 1 + \sum_{i \neq j; i, j \in T} ε_{i} ε_{j} c_{i j} \geq 0$

(7)

We call these the generalized Bell’s inequalities. These, as kindly pointed out by one anonymous reviewer, are special instances of the hypermetric inequalities introduced by M. Deza in the 1960s, (Section 6.1 in [41]). They reduce to Bell’s inequalities for $| T | = 3$ . We have seen in the previous section that these inequalities, for $| T | = 3$ and $| T | = 5$ , give the facets of the polytope of spin-correlation matrices for $n = 5$ .
Many other variants of the Bell’s inequalities could be obtained with other choices of the $λ_{i j}$ . For instance, we can generalize to all even dimensions the inequalities of type (three) for the case $n = 6$ . Let $n \geq 6$ be even, and consider $T \subset \{1, 2, \dots, n\}$ , such that $| T | = n - 1$ . Then, for $ε \in {\{- 1, 1\}}^{n}$ , choose:

$λ_{i j} = \begin{cases} ε_{i} ε_{j} & \text{for } i, j \in T, i \neq j \\ 2 ε_{i} ε_{j} & \text{for } i \in T, j \in T^{c} \text{ or } i \in T^{c}, j \in T \\ 0 & \text{otherwise} \end{cases}$

We obtain:

$\sum_{i, j} λ_{i j} σ_{i} σ_{j} = {(\sum_{i \in T} ε_{i} σ_{i})}^{2} - | T | + 4 ε_{j_{T}} σ_{j_{T}} (\sum_{i \in T} ε_{i} σ_{i})$

where $j_{T}$ is the only element of $T^{c}$ . Since $| T |$ is odd, $\sum_{i \in T} ε_{i} σ_{i}$ is an odd integer, and it is easy to check that the expression:

${(\sum_{i \in T} ε_{i} σ_{i})}^{2} + 4 ε_{j_{T}} σ_{j_{T}} (\sum_{i \in T} ε_{i} σ_{i})$

as a function of $σ \in Ω_{n}$ , attains its minimum at $\sum_{i \in T} ε_{i} σ_{i} = - k ε_{j_{T}} σ_{j_{T}}$ with k equal to one or three, and the minimum is $- 3$ , which gives $M (Λ) = - (| T | + 3) = - (n + 2)$ , and the family of inequalities:

$n + 2 + \sum_{i \neq j; i, j \in T} ε_{i} ε_{j} c_{i j} + 4 \sum_{i \in T} ε_{i} ε_{j_{T}} c_{i j_{T}} \geq 0$

These, for $n = 6$ and after dividing by two, reduce to the inequality of type (three).
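The values of $M (Λ)$ used in the items above can be confirmed by exhaustive minimization over $Ω_{n}$ for small n. This is a minimal sketch with hypothetical helper names; indices start at 0, the sum runs over all ordered pairs $(i, j)$ as in Equation (6), and the sign vectors and the vector x are arbitrary test choices.

```python
from itertools import product

def M(lam):
    """Brute-force M(Lambda): minimum over sigma of sum_{i,j} lam[i][j] * sigma_i * sigma_j."""
    n = len(lam)
    return min(sum(lam[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
               for s in product((-1, 1), repeat=n))

def bell_type_lambda(n, T, eps):
    """lam_ij = eps_i * eps_j for distinct i, j in T, and zero otherwise."""
    return [[eps[i] * eps[j] if i != j and i in T and j in T else 0
             for j in range(n)] for i in range(n)]

def type_three_lambda(n, jT, eps):
    """T is everything except jT; pairs that involve jT carry the extra factor 2."""
    return [[0 if i == j else eps[i] * eps[j] * (2 if jT in (i, j) else 1)
             for j in range(n)] for i in range(n)]

x = (1, 2, -3)
print(M([[a * b for b in x] for a in x]))                     # positivity: 0 here
print(M(bell_type_lambda(5, {0, 1, 2}, (1, -1, 1, 1, 1))))    # |T| = 3: 1 - |T| = -2
print(M(bell_type_lambda(5, {0, 1, 2, 3, 4}, (1, -1, 1, -1, 1))))  # |T| = 5: -4
print(M(type_three_lambda(6, 5, (1, 1, -1, 1, -1, 1))))       # n = 6: -(n + 2) = -8
```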

Remark 1

It is important to note that nowhere in the process of obtaining the maximum entropy measure have we assumed that we are given all the $c_{i j}$’s. Suppose we are only given a partial matrix. Then, $Δ_{n}$ can be interpreted as the set of conditions under which the given partial matrix can be extended to a spin correlation matrix. Once we have feasibility, we know that the maximum entropy measure, $P^{*}$, exists and can be used to complete the given matrix to a spin correlation matrix.

4.2. Finding the Minimum of the Dual Functional

We have observed that the maximum entropy method allows us to reduce the problem of realizing a given spin correlation matrix to finding the minimum of the function, $J$, defined in Equation (5). In this section, we show that this minimum can be obtained by an explicit gradient descent algorithm. Note first that $J$ has some obvious symmetry properties: $J (Λ) = J (Λ^{'})$ if $λ_{i j} = λ_{i j}^{'}$ for all $i \neq j$, and $J$ is indifferent to symmetrization:

J (Λ) = J (\frac{Λ + Λ^{T}}{2})

where $A^{T}$ is the transpose of the matrix, A. It is, therefore, enough to deal with the minimization problem within the set of symmetric matrices with zero diagonal elements. These matrices can be identified with elements of $R^{I}$, where:

I : = \{(i, j) : 1 \leq i < j \leq n\}

In what follows, we use the usual vector notation for elements of $R^{I}$: for $v, w \in R^{I}$, $v^{T}$ denotes the transpose of v, $v^{T} w$ is the scalar product in $R^{I}$ and $v w^{T}$ is an element of $R^{I \times I}$.

Proposition 1

Consider the discrete time dynamical system in $R^{I}$, defined by:

λ (t + 1) = λ (t) - \frac{1}{K} \nabla J (λ (t))

(8)

For every $K > \frac{n (n - 1)}{2}$, this system has a unique fixed point, $λ^{*}$, which is a global attractor, and it is the unique minimum of $J$.

Proof.

Let $G : {\{- 1, 1\}}^{n} \to R^{I}$ be defined by:

G_{i j} (σ) : = σ_{i} σ_{j}

for $(i, j) \in I$. Moreover, the matrices $C \in {Cov}_{n}$ are also obviously identified with elements of $R^{I}$. In particular, in what follows, $C^{T}$ denotes the transpose of C as a vector in $R^{I}$, rather than as a matrix in $M_{n} (R)$. With these notations, $J$ can be rewritten as:

J (λ) = C^{T} λ + \log (\sum_{σ} e^{- G^{T} (σ) λ})

where the sign in the exponent is that of Equation (5), and the factor 2 arising from the symmetry of Λ has been absorbed into λ. By elementary computations, we can compute the gradient, $\nabla J$, and the Hessian, $\nabla^{2} J$:

\nabla J (λ) = C^{T} - \frac{\sum_{σ} G^{T} (σ) e^{- G^{T} (σ) λ}}{\sum_{σ} e^{- G^{T} (σ) λ}}

\nabla^{2} J (λ) = \frac{\sum_{σ} G (σ) G^{T} (σ) e^{- G^{T} (σ) λ}}{\sum_{σ} e^{- G^{T} (σ) λ}} - \frac{\sum_{σ} G (σ) e^{- G^{T} (σ) λ}}{\sum_{σ} e^{- G^{T} (σ) λ}} \frac{\sum_{σ} G^{T} (σ) e^{- G^{T} (σ) λ}}{\sum_{σ} e^{- G^{T} (σ) λ}}

It follows, in particular, that $\nabla^{2} J (λ)$ is the covariance matrix of the vector, G, with respect to the probability:

P_{λ}^{*} (σ) = \frac{e^{- G^{T} (σ) λ}}{\sum_{σ^{'}} e^{- G^{T} (σ^{'}) λ}}

(9)

and it is, therefore, positive semi-definite. Thus, $J$ is convex. In the next steps, we establish more detailed properties of $J$, including its strict convexity.

Step 1: the elements of G, together with the constant function 1, are linearly independent functions. Suppose $α_{0}$ and $α_{i j}$, for $(i, j) \in I$, are such that:

α_{0} + \sum_{1 \leq i < j \leq n} α_{i j} σ_{i} σ_{j} = 0

(10)

for every $σ \in {\{- 1, 1\}}^{n}$. We show that:

α_{0} = α_{i j} = 0

(11)

for every $(i, j) \in I$. We proceed by induction on n. There is nothing to prove for $n = 1$. We can write, assuming Equation (10):

σ_{1} \sum_{j = 2}^{n} α_{1 j} σ_{j} + (α_{0} + \sum_{2 \leq i < j \leq n} α_{i j} σ_{i} σ_{j}) = 0

(12)

Since the second summand in Equation (12) does not contain $σ_{1}$, comparing the values at $σ_{1} = 1$ and $σ_{1} = - 1$ gives:

\sum_{j = 2}^{n} α_{1 j} σ_{j} \equiv 0

(13)

and:

α_{0} + \sum_{2 \leq i < j \leq n} α_{i j} σ_{i} σ_{j} \equiv 0

(14)

Identity in Equation (13) implies $α_{1 j} = 0$ for $j = 2, \dots, n$, as can be shown, for instance, again by induction on n. Identity in Equation (14) implies $α_{0} = α_{i j} = 0$ for $2 \leq i < j \leq n$ by the inductive assumption.

Step 2: for every $λ \in R^{I}$, $\nabla^{2} J (λ)$ is strictly positive definite. Denote by $V (G)$ the covariance matrix of G with respect to the probability in Equation (9). We have:

\nabla^{2} J (λ) = V (G)

Thus, for $b \in R^{I}$:

b^{T} \nabla^{2} J (λ) b = 0 \Rightarrow V (b^{T} G) = 0

which, since $P_{λ}^{*}$ is fully supported, gives:

b^{T} G (σ) = \text{constant}

for every $σ \in {\{- 1, 1\}}^{n}$. By Step 1, this implies that $b = 0$, which proves the claim.

Step 3: for every $λ \in R^{I}$, the largest eigenvalue of $\nabla^{2} J (λ)$ is less than or equal to $\frac{n (n - 1)}{2}$. Let δ denote this largest eigenvalue and $v \in R^{I}$ a corresponding eigenvector with $v^{T} v = 1$. By the Cauchy-Schwarz inequality, we have:

\begin{matrix} δ & = v^{T} \nabla^{2} J (λ) v = V (v^{T} G) \\ & \leq \sup_{σ} {[v^{T} G (σ)]}^{2} \leq \sup_{σ} G^{T} (σ) G (σ) = | I | = \frac{n (n - 1)}{2} \end{matrix}

Step 4: For $K > \frac{n (n - 1)}{2}$, the map, $λ \mapsto λ - \frac{1}{K} \nabla J (λ)$, is a strict contraction, and therefore, it has a unique fixed point. Let:

ϕ (λ) : = λ - \frac{1}{K} \nabla J (λ)

We have:

ϕ (λ) - ϕ (μ) = λ - μ - \frac{1}{K} \nabla^{2} J (ξ) (λ - μ)

for some ξ in the segment joining λ and μ. Thus, setting ${∥ v ∥}^{2} = v^{T} v$:

{∥ϕ (λ) - ϕ (μ)∥}^{2} = {∥[I - \frac{1}{K} \nabla^{2} J (ξ)] (λ - μ)∥}^{2}

The conclusion now follows from the fact that, by Steps 2 and 3, the symmetric matrix, $I - \frac{1}{K} \nabla^{2} J (ξ)$, has all eigenvalues in $(0, 1)$.

Step 5: Conclusion. By Step 4, the system, Equation (8), has a unique fixed point, λ, for which, necessarily,

\nabla J (λ) = 0

.
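The fixed-point iteration of Steps 4 and 5 can be sketched for a small system as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we take P(σ) proportional to exp(λ^{T} G(σ)) and \nabla J (λ) = E_{λ}[G] - c, where c is the vector of prescribed correlations; the sign convention and the names `expected_features` and `solve_dual` are assumptions. The target c is generated from a known λ so that it is realizable by construction.

```python
import math
from itertools import combinations, product


def expected_features(lam, n):
    """E[sigma_i sigma_j], i<j, under P(sigma) proportional to exp(lam . G(sigma))."""
    pairs = list(combinations(range(n), 2))
    feats = [[s[i] * s[j] for i, j in pairs] for s in product((-1, 1), repeat=n)]
    weights = [math.exp(sum(l * g for l, g in zip(lam, f))) for f in feats]
    z = sum(weights)
    return [sum(w * f[a] for w, f in zip(weights, feats)) / z
            for a in range(len(pairs))]


def solve_dual(c, n, iters=500):
    """Iterate lam <- lam - (1/K) grad J(lam), with grad J(lam) = E_lam[G] - c (assumed)."""
    d = n * (n - 1) // 2
    k = d + 0.5                      # any K > n(n-1)/2, as required in Step 4
    lam = [0.0] * d
    for _ in range(iters):
        grad = [m - ci for m, ci in zip(expected_features(lam, n), c)]
        lam = [l - g / k for l, g in zip(lam, grad)]
    return lam


n = 3
lam_true = [0.3, -0.2, 0.1]            # illustrative multipliers
c = expected_features(lam_true, n)     # a realizable target by construction
lam_hat = solve_dual(c, n)
print(max(abs(a - b) for a, b in zip(lam_hat, lam_true)))
```

Since the stationary point is unique, the iterates should recover `lam_true` up to numerical precision in this example. Each iteration costs a full sum over the 2^{n} configurations, which is what limits this direct approach to small n.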

It should be observed that the computation of

J (λ)

and of its gradient involves computing a sum over

σ \in {- 1, 1}^{n}

. This may be hard or even practically infeasible for large n, since the sum has 2^{n} terms; Monte Carlo methods may make this difficulty less severe. This and other computational aspects of the algorithm will be discussed in a forthcoming paper.
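To make the remark about Monte Carlo methods concrete, the following sketch compares the exact correlations, obtained by summing over all 2^{n} configurations, with an estimate from a single-site Gibbs sampler. The choice of sampler is ours and is only one of many possibilities; the Gibbs form P(σ) proportional to exp(\sum_{i<j} λ_{i j} σ_{i} σ_{j}), the function names and all parameter values are assumptions made for illustration.

```python
import math
import random
from itertools import combinations, product


def exact_correlations(lam, n):
    """Exact E[sigma_i sigma_j] by summing over all 2^n configurations."""
    pairs = list(combinations(range(n), 2))
    num = {p: 0.0 for p in pairs}
    z = 0.0
    for s in product((-1, 1), repeat=n):
        w = math.exp(sum(lam[(i, j)] * s[i] * s[j] for i, j in pairs))
        z += w
        for i, j in pairs:
            num[(i, j)] += w * s[i] * s[j]
    return {p: num[p] / z for p in pairs}


def gibbs_correlations(lam, n, sweeps=20000, burn_in=1000, seed=0):
    """Estimate E[sigma_i sigma_j] with a single-site Gibbs sampler."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))

    def coupling(a, b):
        return lam[(a, b)] if a < b else lam[(b, a)]

    s = [1] * n
    acc = {p: 0.0 for p in pairs}
    for t in range(burn_in + sweeps):
        for k in range(n):
            # Local field on spin k; P(sigma_k = +1 | rest) = 1 / (1 + exp(-2 h)).
            h = sum(coupling(k, j) * s[j] for j in range(n) if j != k)
            s[k] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * h)) else -1
        if t >= burn_in:
            for i, j in pairs:
                acc[(i, j)] += s[i] * s[j]
    return {p: acc[p] / sweeps for p in pairs}


n = 4
lam = {p: 0.2 for p in combinations(range(n), 2)}   # illustrative couplings
exact = exact_correlations(lam, n)
approx = gibbs_correlations(lam, n)
print(max(abs(exact[p] - approx[p]) for p in lam))
```

The cost of one sweep grows polynomially in n, whereas the exact sum grows like 2^{n}; the price is a statistical error that decreases only like the inverse square root of the number of sweeps.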

Acknowledgments

We are grateful to an anonymous reviewer for pointing out to us some literature in discrete geometry that is relevant for the present paper. The work of M. Pavon was partially supported by the QuantumFuture research grant of the University of Padova and by an Alexander von Humboldt Foundation fellowship at the Institut für Angewandte Mathematik, Universität Heidelberg, Germany.

Conflict of Interest

The authors declare no conflict of interest.

References

1. Stieltjes, T.J. Recherches sur les fractions continues (in French). Annales de la Faculté des Sciences de Toulouse 1894, 8, J1–J122.
2. Akhiezer, N.I. The Classical Moment Problem and Some Related Questions in Analysis; (Translated from Russian by N. Kemmer); Hafner Publishing Co.: New York, NY, USA, 1965.
3. Krein, M.G.; Nudelman, A.A. The Markov Moment Problem and Extremal Problems; American Mathematical Society: Providence, RI, USA, 1977.
4. Grenander, U.; Szegö, G. Toeplitz Forms and Their Applications; University of California Press: Berkeley, CA, USA, 1958.
5. Sarason, D. Generalized interpolation in H^∞. Trans. Am. Math. Soc. 1967, 127, 179–203.
6. Arov, D.Z.; Dym, H. On three Krein extension problems and some generalizations. Int. Equat. Oper. Theory 1998, 31, 1–91.
7. Blomqvist, A.; Lindquist, A.; Nagamune, R. Matrix-valued Nevanlinna-Pick interpolation with complexity constraint: An optimization approach. IEEE Trans. Autom. Control 2003, 48, 2172–2190.
8. Georgiou, T. The interpolation problem with a degree constraint. IEEE Trans. Autom. Control 1999, 44, 631–635.
9. Georgiou, T. Realization of power spectra from partial covariance sequences. IEEE Trans. Acoust. Speech Signal Process. 1987, 35, 438–449.
10. Byrnes, C.; Lindquist, A.; Gusev, S.; Matveev, A.S. A complete parameterization of all positive rational extensions of a covariance sequence. IEEE Trans. Autom. Control 1995, 40, 1841–1857.
11. Byrnes, C.; Lindquist, A. On the partial stochastic realization problem. IEEE Trans. Autom. Control 1997, 42, 1049–1070.
12. Byrnes, C.I.; Gusev, S.; Lindquist, A. A convex optimization approach to the rational covariance extension problem. SIAM J. Control Optim. 1999, 37, 211–229.
13. Byrnes, C.I.; Georgiou, T.; Lindquist, A. A generalized entropy criterion for Nevanlinna-Pick interpolation with degree constraint: A convex optimization approach to certain problems in systems and control. IEEE Trans. Autom. Control 2001, 46, 822–839.
14. Georgiou, T. Spectral analysis based on the state covariance: The maximum entropy spectrum and linear fractional parameterization. IEEE Trans. Autom. Control 2002, 47, 1811–1823.
15. Georgiou, T.; Lindquist, A. Kullback-Leibler approximation of spectral density functions. IEEE Trans. Inf. Theory 2003, 49, 2910–2917.
16. Georgiou, T. The structure of state covariances and its relation to the power spectrum of the input. IEEE Trans. Autom. Control 2002, 47, 1056–1066.
17. Georgiou, T. Solution of the general moment problem via a one-parameter imbedding. IEEE Trans. Autom. Control 2005, 50, 811–826.
18. Byrnes, C.I.; Lindquist, A. Important moments in systems and control. SIAM J. Control Optim. 2008, 47, 2458–2469.
19. Ferrante, A.; Pavon, M.; Ramponi, F. Hellinger vs. Kullback-Leibler multivariable spectrum approximation. IEEE Trans. Autom. Control 2008, 53, 954–967.
20. Ramponi, F.; Ferrante, A.; Pavon, M. A globally convergent matricial algorithm for multivariate spectral estimation. IEEE Trans. Autom. Control 2009, 54, 2376–2388.
21. Ferrante, A.; Pavon, M.; Zorzi, M. A maximum entropy enhancement for a family of high-resolution spectral estimators. IEEE Trans. Autom. Control 2012, 57, 318–329.
22. Ferrante, A.; Masiero, C.; Pavon, M. Time and spectral domain relative entropy: A new approach to multivariate spectral estimation. IEEE Trans. Autom. Control 2012, 57, 2561–2575.
23. Byrnes, C.I.; Enqvist, P.; Lindquist, A. Identifiability and well-posedness of shaping-filter parameterizations: A global analysis approach. SIAM J. Control Optim. 2002, 41, 23–59.
24. Georgiou, T.; Lindquist, A. A convex optimization approach to ARMA modeling. IEEE Trans. Autom. Control 2008, AC-53, 1108–1119.
25. Enqvist, P.; Karlsson, J. Minimal Itakura-Saito Distance and Covariance Interpolation. In Proceedings of the 47th IEEE Conference on Decision and Control (CDC 2008), Cancun, Mexico, 9–11 December 2008; pp. 137–142.
26. Byrnes, C.I.; Georgiou, T.; Lindquist, A.; Megretski, A. Generalized interpolation in H-infinity with a complexity constraint. Trans. Am. Math. Soc. 2006, 358, 965–987.
27. Georgiou, T.; Lindquist, A. Remarks on control design with degree constraint. IEEE Trans. Autom. Control 2006, AC-51, 1150–1156.
28. Nasiri Amini, A.; Ebbini, E.; Georgiou, T. Noninvasive estimation of tissue temperature via high-resolution spectral analysis techniques. IEEE Trans. Biomed. Eng. 2005, 52, 221–228.
29. Dempster, A.P. Covariance selection. Biometrics 1972, 28, 157–175.
30. Burg, J.; Luenberger, D.; Wenger, D. Estimation of structured covariance matrices. Proc. IEEE 1982, 70, 963–974.
31. Dembo, A.; Mallows, C.; Shepp, L.A. Embedding nonnegative definite Toeplitz matrices in nonnegative definite circulant matrices, with application to covariance estimation. IEEE Trans. Inf. Theory 1989, 35, 1206–1212.
32. Barrett, W.; Johnson, C.; Loewy, R. The real positive definite completion problem: Cycle completability. Mem. Am. Math. Soc. 1996, 122, 1–68.
33. Laurent, M. The real positive semidefinite completion problem for series-parallel graphs. Linear Algebra Appl. 1997, 252, 347–366.
34. Glunt, W.; Hayden, T.; Johnson, C.; Tarazaga, P. Positive definite completions and determinant maximization. Linear Algebra Appl. 1999, 288, 1–10.
35. Laurent, M. Polynomial instances of the positive semidefinite and Euclidean distance matrix completion problems. SIAM J. Matrix Anal. Appl. 2001, 22, 874–894.
36. Ferrante, A.; Pavon, M. Matrix completion à la Dempster by the principle of parsimony. IEEE Trans. Inf. Theory 2011, 57, 3925–3931.
37. Carli, F.; Ferrante, A.; Pavon, M.; Picci, G. A maximum entropy solution of the covariance extension problem for reciprocal processes. IEEE Trans. Autom. Control 2011, 56, 1999–2012.
38. Parthasarathy, K.R. On extremal correlations. J. Stat. Plan. Inference 2002, 102, 282–285.
39. Gupta, J.C. Characterisation of correlation matrices of spin variables. Sankhya Indian J. Stat. 1999, 61, 282–285.
40. Pitowsky, I. Correlation polytopes: Their geometry and complexity. Math. Program. 1991, 50, 395–414.
41. Deza, M.M.; Laurent, M. Geometry of Cuts and Metrics; Springer: Berlin, Germany, 1997.
42. Balasubramanian, K.; Gupta, J.C.; Parthasarathy, K.R. Remarks on Bell’s inequalities for spin correlations. Sankhya Indian J. Stat. 1998, 60, 29–35.
43. Werner, R.F.; Wolf, M.M. Bell inequalities and entanglement. Quant. Inf. Comput. 2001, 1, 1–25.
44. Kuna, T.; Lebowitz, J.L.; Speer, E.R. Necessary and sufficient conditions for realizability of point processes. Ann. Appl. Probab. 2011, 21, 1253–1281.
45. Dai Pra, P.; Pavon, M.; Sahasrabudhe, N. A note on the geometric interpretation of Bell’s inequalities. Lett. Math. Phys. 2013. Preprint: arXiv:1301.4823v3. To appear.
46. McRae, W.B.; Davidson, E.R. Linear inequalities for density matrices II. J. Math. Phys. 1972, 13, 1527–1538.
47. Bremner, D.; Fukuda, K.; Marzetta, A. Primal-dual methods for vertex and facet enumeration. Discret. Comput. Geom. 1998, 20, 333–357.
48. Fukuda, K.; Prodon, A. Double Description Method Revisited. In Combinatorics and Computer Science (Brest, 1995); Springer: Berlin, Germany, 1996; Volume 1120, Lecture Notes in Computer Science; pp. 91–111.
49. Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. Ser. II 1957, 106, 620–630.
50. Jaynes, E.T. Information theory and statistical mechanics II. Phys. Rev. Ser. II 1957, 108, 171–190.
51. Jaynes, E.T. On the rationale of maximum-entropy methods. Proc. IEEE 1982, 70, 939–952.
52. Csiszár, I. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 1991, 19, 2032–2066.

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
