Article

Minimax Estimation of Quantum States Based on the Latent Information Priors

Takayuki Koyama, Takeru Matsuda and Fumiyasu Komaki
1 FANUC Corporation, 3580 Furubaba Shibokusa Oshino-mura, Yamanashi 401-0597, Japan
2 Department of Mathematical Informatics, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
3 RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
* Author to whom correspondence should be addressed.
Entropy 2017, 19(11), 618; https://doi.org/10.3390/e19110618
Submission received: 13 September 2017 / Revised: 12 November 2017 / Accepted: 13 November 2017 / Published: 16 November 2017
(This article belongs to the Special Issue Transfer Entropy II)

Abstract

We develop priors for Bayes estimation of quantum states that provide minimax state estimation. The relative entropy from the true density operator to a predictive density operator is adopted as a loss function. The proposed prior maximizes the conditional Holevo mutual information, and it is a quantum version of the latent information prior in classical statistics. For a one-qubit system, we provide a class of measurements that is optimal from the viewpoint of minimax state estimation.

1. Introduction

In quantum mechanics, the outcome of a measurement is subject to a probability distribution determined by the quantum state of the measured system and the measurement performed. The task of estimating the quantum state from measurement outcomes is called quantum state estimation, and it is a fundamental problem in quantum statistics [1,2,3]. Tanaka and Komaki [4] and Tanaka [5] discussed quantum estimation within the framework of statistical decision theory and showed that Bayesian methods provide better estimation than the maximum likelihood method. In Bayesian methods, we need to specify a prior distribution on the unknown parameters of the quantum states. However, the problem of prior selection has not been fully discussed for quantum estimation [6].
The quantum state estimation problem is related to the predictive density estimation problem in classical statistics [7]. This is the problem of predicting the distribution of an unobserved variable $y$ based on an observed variable $x$. Suppose $(x, y) \sim p(x, y \mid \theta)$, where $\theta$ denotes an unknown parameter. Based on the observed $x$, we predict the distribution $p(y \mid x, \theta)$ of $y$ using a predictive density $\hat{p}(y \mid x)$. The plug-in predictive density is defined as $\hat{p}_{\text{plug-in}}(y \mid x) = p(y \mid x, \hat{\theta}(x))$, where $\hat{\theta}(x)$ is some estimate of $\theta$ from $x$. The Bayesian predictive density with respect to a prior distribution $d\pi(\theta)$ is defined as
$$\hat{p}_{\pi}(y \mid x) = \int p(y \mid x, \theta) \, d\pi(\theta \mid x) = \frac{\int p(y \mid x, \theta)\, p(x \mid \theta) \, d\pi(\theta)}{\int p(x \mid \theta) \, d\pi(\theta)},$$
where $d\pi(\theta \mid x)$ is the posterior distribution. We compare predictive densities using the framework of statistical decision theory. Specifically, a loss function $L(q, p)$ is introduced that evaluates the difference between the true density $q$ and the predictive density $p$. Then, the risk function $R(\theta, \hat{p})$ is defined as the average loss when the true value of the parameter is $\theta$:
$$R(\theta, \hat{p}) = \int L\bigl(p(y \mid x, \theta), \hat{p}(y \mid x)\bigr)\, p(x \mid \theta) \, dx.$$
A predictive density $\hat{p}^{*}$ is called minimax if it minimizes the maximum risk among all predictive densities:
$$\max_{\theta} R(\theta, \hat{p}^{*}) = \min_{\hat{p}} \max_{\theta} R(\theta, \hat{p}).$$
We adopt the Kullback–Leibler divergence
$$L(q, p) = \int q(x) \log \frac{q(x)}{p(x)} \, dx$$
as a loss function, since it satisfies many desirable properties compared to other loss functions such as the Hellinger distance and the total variation distance [8]. Under this setting, Aitchison [9] proved
$$R(\pi, \hat{p}_{\pi}) = \min_{\hat{p}} R(\pi, \hat{p}),$$
where
$$R(\pi, \hat{p}) = \int R(\theta, \hat{p})\, \pi(\theta)\, d\theta$$
is called the Bayes risk. Namely, the Bayesian predictive density p ^ π ( y x ) minimizes the Bayes risk. We provide the proof of Equation (4) in the Appendix A. Therefore, it is sufficient to consider only Bayesian predictive densities from the viewpoint of Kullback–Leibler risk, and the selection of the prior π becomes important.
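As a concrete illustration of the Bayesian predictive density above, the following sketch evaluates it for a finite parameter set and a discrete prior. The joint distributions and prior weights are hypothetical toy values chosen only for illustration, and Python with numpy is assumed rather than anything used by the authors.

```python
import numpy as np

# Toy model: theta indexes three candidate joint distributions p(x, y | theta)
# over x in {0, 1} and y in {0, 1}; each 2x2 table sums to one.
p_xy_given_theta = np.array([
    [[0.40, 0.10], [0.10, 0.40]],   # theta = 0
    [[0.25, 0.25], [0.25, 0.25]],   # theta = 1
    [[0.10, 0.40], [0.40, 0.10]],   # theta = 2
])
prior = np.array([0.3, 0.4, 0.3])   # discrete prior pi(theta)

def bayes_predictive(x):
    """p_hat_pi(y | x) = sum_theta p(y | x, theta) * pi(theta | x)."""
    p_x_given_theta = p_xy_given_theta[:, x, :].sum(axis=1)            # p(x | theta)
    posterior = prior * p_x_given_theta
    posterior /= posterior.sum()                                        # pi(theta | x)
    p_y_given_x_theta = p_xy_given_theta[:, x, :] / p_x_given_theta[:, None]
    return posterior @ p_y_given_x_theta                                # mixture over theta

print(bayes_predictive(0))   # predictive density over y after observing x = 0
```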
For the predictive density estimation problem above, Komaki [10] developed a class of priors called latent information priors. The latent information prior $\pi_{\mathrm{LIP}}$ is defined as a prior that maximizes the conditional mutual information $I_{\theta, y \mid x}(\pi)$ between the parameter $\theta$ and the unobserved variable $y$ given the observed variable $x$. Namely,
$$I_{\theta, y \mid x}(\pi_{\mathrm{LIP}}) = \max_{\pi} I_{\theta, y \mid x}(\pi),$$
where
$$I_{\theta, y \mid x}(\pi) = \int \sum_{x, y} p(x, y \mid \theta) \log p(x, y \mid \theta) \, d\pi(\theta) - \sum_{x, y} p_{\pi}(x, y) \log p_{\pi}(x, y) - \int \sum_{x} p(x \mid \theta) \log p(x \mid \theta) \, d\pi(\theta) + \sum_{x} p_{\pi}(x) \log p_{\pi}(x)$$
is the conditional mutual information between $y$ and $\theta$ given $x$. Here,
$$p_{\pi}(x, y) = \int p(x, y \mid \theta) \, d\pi(\theta), \qquad p_{\pi}(x) = \int p(x \mid \theta) \, d\pi(\theta)$$
are marginal densities. The Bayesian predictive densities based on the latent information priors are minimax under the Kullback–Leibler risk:
$$\max_{\theta} R(\theta, \hat{p}_{\pi_{\mathrm{LIP}}}) = \min_{\hat{p}} \max_{\theta} R(\theta, \hat{p}).$$
The latent information prior is a generalization of the reference prior [11], which is a prior maximizing the unconditional mutual information $I_{\theta, y}(\pi)$ between $\theta$ and $y$.
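To make the definition above concrete, the next sketch evaluates the conditional mutual information between θ and y given x for the same kind of finite toy model; maximizing this quantity over priors (for instance by a grid search over the probability simplex) would approximate the latent information prior. The array shapes follow the sketch above and are assumptions, not values from the paper.

```python
import numpy as np

def conditional_mutual_information(prior, p_xy_given_theta):
    """I_{theta, y | x}(pi) for a finite model; array indices are [theta, x, y]."""
    eps = 1e-300                                            # so 0 * log(0) evaluates to 0
    p_x_given_theta = p_xy_given_theta.sum(axis=2)          # p(x | theta)
    p_xy = np.einsum('t,txy->xy', prior, p_xy_given_theta)  # marginal p_pi(x, y)
    p_x = p_xy.sum(axis=1)                                   # marginal p_pi(x)
    term1 = np.sum(prior[:, None, None] * p_xy_given_theta * np.log(p_xy_given_theta + eps))
    term2 = np.sum(p_xy * np.log(p_xy + eps))
    term3 = np.sum(prior[:, None] * p_x_given_theta * np.log(p_x_given_theta + eps))
    term4 = np.sum(p_x * np.log(p_x + eps))
    return term1 - term2 - term3 + term4

# Example (reusing the toy arrays defined above):
# conditional_mutual_information(np.array([0.3, 0.4, 0.3]), p_xy_given_theta)
```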
Now, we consider the problem of estimating the quantum state of a system Y based on the outcome of a measurement on a system X. Suppose that the quantum state of the composite system (X, Y) is $\sigma_{\theta}^{XY}$, where $\theta$ denotes an unknown parameter. We perform a measurement on the system X and obtain the outcome $x$. Based on the measurement outcome $x$, we estimate the state of the system Y by a predictive density operator $\rho(x)$. Similarly to the Bayesian predictive density (1), the Bayesian predictive density operator with respect to the prior $d\pi(\theta)$ is defined as
$$\sigma_{\pi}^{Y}(x) = \int \sigma_{\theta, x}^{Y} \, d\pi(\theta \mid x) = \frac{\int \sigma_{\theta, x}^{Y}\, p(x \mid \theta) \, d\pi(\theta)}{\int p(x \mid \theta) \, d\pi(\theta)},$$
where $d\pi(\theta \mid x)$ is the posterior distribution. As in the predictive density estimation problem discussed above, we compare predictive density operators using the framework of statistical decision theory. There are several possibilities for the loss function $L(\sigma, \rho)$ in quantum estimation, such as the fidelity and the trace norm [12]. In this paper, we adopt the quantum relative entropy
$$L(\sigma, \rho) = \mathrm{Tr}\, \sigma (\log \sigma - \log \rho)$$
as a loss function, since it is a quantum analogue of the Kullback–Leibler divergence (3). Note that the fidelity and the trace norm correspond to the Hellinger distance and the total variation distance in classical statistics, respectively. Under this setting, Tanaka and Komaki [4] proved that the Bayesian predictive density operators minimize the Bayes risk:
$$\int R(\theta, \sigma_{\pi}^{Y}) \, d\pi(\theta) = \min_{\rho} \int R(\theta, \rho) \, d\pi(\theta).$$
This is a quantum version of Equation (4).
By the result of Tanaka and Komaki [4], the selection of the prior is important also in quantum estimation. However, this problem has not been fully discussed [6]. In this paper, we provide a quantum version of the latent information priors and prove that they provide minimax predictive density operators. Whereas the latent information prior in the classical case maximizes the conditional Shannon mutual information, the proposed prior maximizes the conditional Holevo mutual information. The Holevo mutual information, which is a quantum version of the Shannon mutual information, is a fundamental quantity in classical-quantum communication [13]. Our result shows that the conditional Holevo mutual information also has a natural meaning in terms of quantum estimation.
Unlike in classical statistics, the measurement is not unique in quantum statistics. Therefore, the selection of the measurement also becomes important. From the viewpoint of minimax state estimation, measurements that minimize the minimax risk are considered optimal. We provide a class of optimal measurements for the one-qubit system. This class includes the symmetric informationally complete measurement [14,15]. These measurements and latent information priors provide robust quantum estimation.

2. Preliminaries

2.1. Quantum States and Measurements

We briefly summarize notation for quantum states and measurements. Let H be a separable Hilbert space of a quantum system. A Hermitian operator ρ on H is called a density operator if it satisfies
$$\mathrm{Tr}\, \rho = 1, \qquad \rho \geq 0.$$
The state of a quantum system is described by a density operator. We denote the set of all density operators on H as S ( H ) .
Denote the set of all linear operators on a Hilbert space H by $L(H)$ and the set of all positive linear operators by $L_{+}(H) \subset L(H)$. Let Ω be a measurable space of all possible outcomes of a measurement and $\mathcal{B}(\Omega)$ be a σ-algebra on Ω. A map $E : \mathcal{B}(\Omega) \to L_{+}(H)$ is called a positive operator-valued measure (POVM) if it satisfies $E(\emptyset) = O$, $E(\Omega) = I$, and $E(\cup_{i} B_{i}) = \sum_{i} E(B_{i})$ whenever $B_{i} \cap B_{j} = \emptyset$ $(i \neq j)$, $B_{i} \in \mathcal{B}(\Omega)$. Any quantum measurement is represented by a POVM on Ω. In this paper, we mainly assume that Ω is finite. In that case, we denote $\Omega = \mathcal{X} = \{1, \ldots, N\}$ and any POVM is represented by a set of positive Hermitian operators $E = \{E_{x} \mid x \in \mathcal{X}\}$ such that $\sum_{x \in \mathcal{X}} E_{x} = I$.
The outcome of a measurement E on a quantum system in the state $\rho \in \mathcal{S}(H)$ is distributed according to the probability measure
$$\Pr(B) = \mathrm{Tr}\, E(B) \rho, \qquad B \in \mathcal{B}(\Omega).$$
Let X, Y be quantum systems with Hilbert spaces $H^{X}$ and $H^{Y}$. The Hilbert space of the composite system (X, Y) is given by the tensor product $H^{X} \otimes H^{Y}$. Suppose the state of this composite system is $\sigma^{XY}$. Then, the states of the two subsystems are obtained by the partial trace:
$$\sigma^{X} = \mathrm{Tr}_{Y}\, \sigma^{XY}, \qquad \sigma^{Y} = \mathrm{Tr}_{X}\, \sigma^{XY}.$$
If a measurement $E = \{E_{x} \mid x \in \mathcal{X}\}$ is performed on the system X and the measurement outcome is x, then the state of the system Y becomes
$$\sigma_{x}^{Y} = \frac{1}{p_{x}} \mathrm{Tr}_{X} \bigl[ (E_{x} \otimes I^{Y})\, \sigma^{XY} \bigr],$$
where the normalization constant
$$p_{x} = \mathrm{Tr}\bigl[ (E_{x} \otimes I^{Y})\, \sigma^{XY} \bigr]$$
is the probability of the outcome x. Here, $I^{Y}$ is the identity operator on the space $H^{Y}$. We call the operator $\sigma_{x}^{Y}$ the conditional density operator.
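As a small numeric illustration of this update, the following sketch computes the conditional density operator and the outcome probability for a two-qubit composite state; the maximally entangled state and the projective POVM element are illustrative choices, not taken from the paper.

```python
import numpy as np

def conditional_state(sigma_XY, E_x, dX, dY):
    """sigma_x^Y = Tr_X[(E_x (x) I_Y) sigma^XY] / p_x, with p_x the outcome probability."""
    op = np.kron(E_x, np.eye(dY)) @ sigma_XY
    op = op.reshape(dX, dY, dX, dY)
    S = np.trace(op, axis1=0, axis2=2)      # partial trace over the X system
    p_x = np.trace(S).real                  # probability of the outcome x
    return S / p_x, p_x

# Example: maximally entangled two-qubit state, measuring X with the projector |0><0|
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
sigma_XY = np.outer(psi, psi.conj())
E0 = np.diag([1.0, 0.0])
sigma_Y, p0 = conditional_state(sigma_XY, E0, 2, 2)
print(p0)        # 0.5
print(sigma_Y)   # the state of Y collapses to |0><0|
```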

2.2. Quantum State Estimation

We formulate the quantum state estimation problem using the framework of statistical decision theory. Let X and Y be quantum systems with finite-dimensional Hilbert spaces $H^{X}$ and $H^{Y}$, where $\dim H^{X} = d_{X}$ and $\dim H^{Y} = d_{Y}$.
Suppose that the state of the composite system (X, Y) is $\sigma_{\theta}^{XY}$, where $\theta \in \Theta$ denotes an unknown parameter. We perform a measurement $E = \{E_{x} \mid x \in \mathcal{X}\}$ on X, observe the outcome $x \in \mathcal{X}$, and estimate the conditional density operator $\sigma_{\theta, x}^{Y}$ of Y by a predictive density operator $\rho(x)$. As discussed in the introduction (1) and (7), the Bayesian predictive density operator based on a prior $\pi(\theta)$ is defined by
$$\sigma_{\pi}^{Y}(x) = \int \sigma_{\theta, x}^{Y} \, d\pi(\theta \mid x) = \frac{\int \sigma_{\theta, x}^{Y}\, p(x \mid \theta) \, d\pi(\theta)}{\int p(x \mid \theta) \, d\pi(\theta)},$$
where $d\pi(\theta \mid x)$ is the posterior distribution.
To evaluate predictive density operators, we introduce a loss function L ( σ , ρ ) that evaluates the difference between the true conditional density operator σ and the predictive density operator ρ . In this paper, we adopt the quantum relative entropy (8) since it is a quantum analogue of the Kullback–Leibler divergence (3). Then, the risk function R ( θ , ρ ) of a predictive density operator ρ is defined by
$$R(\theta, \rho) = \sum_{x \in \mathcal{X}} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x}^{Y} \bigl( \log \sigma_{\theta, x}^{Y} - \log \rho(x) \bigr),$$
where
$$p(x \mid \theta) = \mathrm{Tr}\bigl[ (E_{x} \otimes I^{Y})\, \sigma_{\theta}^{XY} \bigr] = \mathrm{Tr}\, E_{x} \sigma_{\theta}^{X}$$
is the probability of the outcome x. Similarly to the classical case (2), a predictive density operator $\rho^{*}$ is called minimax if it minimizes the maximum risk among all predictive density operators [16,17]:
$$\max_{\theta} R(\theta, \rho^{*}) = \min_{\rho} \max_{\theta} R(\theta, \rho).$$
Tanaka and Komaki [4] showed
$$R(\pi, \sigma_{\pi}^{Y}) = \min_{\rho} R(\pi, \rho),$$
where
$$R(\pi, \rho) = \int R(\theta, \rho) \, d\pi(\theta)$$
is called the Bayes risk. Namely, the Bayesian predictive density operator minimizes the Bayes risk. This result is a quantum version of Equation (4). Although Tanaka and Komaki [4] considered separable models ($\sigma_{\theta}^{XY} = \sigma_{\theta}^{X} \otimes \sigma_{\theta}^{Y}$), the relation (9) also holds for non-separable models, as shown in Appendix A. Therefore, it is sufficient to consider only Bayesian predictive density operators, and the problem of prior selection becomes crucial.
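For a finite prior, the Bayesian predictive density operator is simply a posterior-weighted mixture of the conditional density operators. The following sketch shows this computation; the two-point prior, the outcome probabilities, and the conditional states are hypothetical toy inputs, not taken from the paper.

```python
import numpy as np

def bayes_predictive_operator(x, prior, p_x_given_theta, sigma_theta_x):
    """sigma_pi^Y(x) = sum_theta sigma_{theta,x}^Y p(x|theta) pi(theta) / p_pi(x)."""
    w = prior * p_x_given_theta[:, x]              # unnormalized posterior weights
    w = w / w.sum()
    return np.einsum('t,tij->ij', w, sigma_theta_x[:, x])

# Toy inputs: two parameter values, two outcomes, qubit states for Y
sigma = np.zeros((2, 2, 2, 2), dtype=complex)      # indices [theta, x, row, column]
sigma[0] = np.array([[[1, 0], [0, 0]], [[0.5, 0.5], [0.5, 0.5]]])
sigma[1] = np.array([[[0, 0], [0, 1]], [[0.5, -0.5], [-0.5, 0.5]]])
p = np.array([[0.7, 0.3], [0.4, 0.6]])             # p(x | theta)
print(bayes_predictive_operator(0, np.array([0.5, 0.5]), p, sigma))
```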

2.3. Notations

For a quantum state family $\{\sigma_{\theta}^{XY} \mid \theta \in \Theta\}$, we define another quantum state family
$$\mathcal{M} = \Bigl\{ \bigoplus_{x} p(x \mid \theta)\, \sigma_{\theta, x}^{Y} \;\Big|\; \theta \in \Theta \Bigr\},$$
where
$$\bigoplus_{x} p(x \mid \theta)\, \sigma_{\theta, x}^{Y} = \begin{pmatrix} p(1 \mid \theta)\, \sigma_{\theta, 1}^{Y} & O & \cdots & O \\ O & p(2 \mid \theta)\, \sigma_{\theta, 2}^{Y} & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & p(N \mid \theta)\, \sigma_{\theta, N}^{Y} \end{pmatrix}$$
is a density operator on $\mathbb{C}^{N} \otimes H^{Y}$. Since $\dim \mathbb{C}^{N} \otimes H^{Y} = N d_{Y}$, the state family $\mathcal{M}$ can be regarded as a subset of the Euclidean space $\mathbb{R}^{N^{2} d_{Y}^{2} - 1}$. By identifying Θ with $\mathcal{M}$, the parameter space Θ is endowed with the induced topology as a subset of $\mathbb{R}^{N^{2} d_{Y}^{2} - 1}$.
Any measurement on the system X is represented by a projective measurement $\{e_{xx} = |x\rangle\langle x| \mid x = 1, \ldots, N\}$, where $\{|1\rangle, \ldots, |N\rangle\}$ is an orthonormal basis of $\mathbb{C}^{N}$. For every $x \in \mathcal{X}$, we define $S_{\theta}(x) \in L_{+}(H^{Y})$ as
$$S_{\theta}(x) := \mathrm{Tr}_{\mathbb{C}^{N}} \Bigl[ (e_{xx} \otimes I^{Y}) \Bigl( \bigoplus_{x'} p(x' \mid \theta)\, \sigma_{\theta, x'}^{Y} \Bigr) \Bigr] = p(x \mid \theta)\, \sigma_{\theta, x},$$
which is the unnormalized state of Y conditional on the measurement outcome x. We also define
$$S_{\pi}(x) = \int S_{\theta}(x) \, d\pi(\theta), \qquad p_{\pi}(x) = \mathrm{Tr}\, S_{\pi}(x), \qquad \sigma_{\pi}(x) = \frac{S_{\pi}(x)}{p_{\pi}(x)}.$$

3. Minimax Estimation of Quantum States

In this section, we develop the latent information prior for quantum state estimation and show that this prior provides a minimax predictive density operator.
In the following, we assume the following conditions:
  • Θ is compact.
  • For every $x \in \mathcal{X}$, $E_{x} \neq O$.
  • For every $x \in \mathcal{X}$, there exists $\theta \in \Theta$ such that $p(x \mid \theta) = \mathrm{Tr}\, E_{x} \sigma_{\theta}^{X} > 0$.
The third assumption can be satisfied by adopting a sufficiently small Hilbert space. Namely, if there exists $x \in \mathcal{X}$ such that $p(x \mid \theta) = \mathrm{Tr}\, E_{x} \sigma_{\theta}^{X} = 0$ for every $\theta \in \Theta$, then we redefine the state space H as the orthogonal complement of $\mathrm{Ker}\, E_{x}$.
Let P be the set of all probability measures on Θ endowed with the weak convergence topology and the corresponding Borel algebra. By the Prohorov theorem [18] and the first assumption, P is compact.
When x is fixed, the function $\theta \in \Theta \mapsto S_{\theta}(x)$ is bounded and continuous. Thus, for every fixed $x \in \mathcal{X}$, the function
$$\pi \in \mathcal{P} \mapsto S_{\pi}(x) = \int S_{\theta}(x) \, d\pi(\theta)$$
is continuous because $\mathcal{P}$ is endowed with the weak convergence topology and $\dim H^{Y} < \infty$. Let $\{\lambda_{x,i}\}_{i}$ and $\{|\phi_{x,i}\rangle\}_{i}$ be the eigenvalues and the normalized eigenvectors of the predictive density operator ρ(x). For every predictive density operator ρ, consider the function from $\mathcal{P}$ to $[0, \infty]$ defined by
$$D_{\rho}(\pi) = \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \bigl( \log S_{\pi}(x) - \log (p_{\pi}(x) \rho(x)) \bigr) = \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \bigl( \log S_{\pi}(x) - (\log p_{\pi}(x)) I - \log \rho(x) \bigr) = \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \log S_{\pi}(x) - \sum_{x} p_{\pi}(x) \log p_{\pi}(x) + \sum_{x} \sum_{i:\, \lambda_{x,i} \neq 0} p_{\pi}(x) \langle \phi_{x,i} | \sigma_{\pi}(x) | \phi_{x,i} \rangle (-\log \lambda_{x,i}) + \sum_{x} \sum_{i:\, \lambda_{x,i} = 0} p_{\pi}(x) \langle \phi_{x,i} | \sigma_{\pi}(x) | \phi_{x,i} \rangle (-\log \lambda_{x,i}).$$
The last term in (10) is lower semicontinuous under the definition 0 log 0 = 0 [10], since each summand takes either zero or infinity, and so the set of $\pi \in \mathcal{P}$ on which this term vanishes is closed. In addition, the other terms in (10) are continuous since the von Neumann entropy is continuous [12]. Therefore, the function $D_{\rho}(\pi)$ in (10) is lower semicontinuous.
Now, we prove that the class of predictive density operators that are limits of Bayesian predictive density operators is an essentially complete class. We prepare three lemmas. Lemma 1 is useful for differentiation of quantum relative entropy (see Hiai and Petz [19]). Lemmas 2 and 3 are from Komaki [10].
Lemma 1.
Let A, B be n-dimensional self-adjoint matrices and let $t_{0}$ be a real number. Assume that $f : (\alpha, \beta) \to \mathbb{R}$ is a continuously differentiable function defined on an interval, and assume that the eigenvalues of $A + tB$ are in $(\alpha, \beta)$ if t is sufficiently close to $t_{0}$. Then,
$$\frac{d}{dt} \mathrm{Tr}\, f(A + tB) \Big|_{t = t_{0}} = \mathrm{Tr}\bigl( B f'(A + t_{0} B) \bigr).$$
Lemma 2
([10]). Let μ be a probability measure on Θ. Then,
$$\mathcal{P}_{\epsilon\mu} = \{ \epsilon\mu + (1 - \epsilon)\pi \mid \pi \in \mathcal{P} \}$$
is a closed subset of $\mathcal{P}$ for $0 \leq \epsilon \leq 1$.
Lemma 3
([10]). Let $f : \mathcal{P} \to [0, \infty]$ be continuous, and let μ be a probability measure on Θ such that $p_{\mu}(x) := \int p(x \mid \theta) \, d\mu(\theta) > 0$ for every $x \in \mathcal{X}$. Then, there is a probability measure $\pi_{n}$ in
$$\mathcal{P}_{\mu/n} = \Bigl\{ \tfrac{1}{n}\mu + \bigl(1 - \tfrac{1}{n}\bigr)\pi \;\Big|\; \pi \in \mathcal{P} \Bigr\}$$
for every n, such that $f(\pi_{n}) = \inf_{\pi \in \mathcal{P}_{\mu/n}} f(\pi)$. Furthermore, there exists a convergent subsequence $\{\pi_{m}\}_{m=1}^{\infty}$ of $\{\pi_{n}\}_{n=1}^{\infty}$ and the equality $f(\pi_{\infty}) = \inf_{\pi \in \mathcal{P}} f(\pi)$ holds, where $\pi_{\infty} = \lim_{m \to \infty} \pi_{m}$.
By using these results, we obtain the following theorem, which is a quantum version of Theorem 1 of Komaki [10].
Theorem 1.
(1) 
Let ρ(x) be a predictive density operator. If there exists a prior $\hat{\pi}_{\rho} \in \mathcal{P}$ such that $D_{\rho}(\hat{\pi}_{\rho}) = \inf_{\pi \in \mathcal{P}} D_{\rho}(\pi)$ and $p_{\hat{\pi}_{\rho}}(x) > 0$ for every $x \in \mathcal{X}$, then $R(\theta, \sigma_{\hat{\pi}_{\rho}}(x)) \leq R(\theta, \rho(x))$ for every $\theta \in \Theta$.
(2) 
For every predictive density operator ρ, there exists a convergent prior sequence $\{\pi_{n}^{\rho}\}_{n=1}^{\infty}$ such that $D_{\rho}(\lim_{n} \pi_{n}^{\rho}) = \inf_{\pi \in \mathcal{P}} D_{\rho}(\pi)$, $\lim_{n} \sigma_{\pi_{n}^{\rho}}(x)$ exists, and $R(\theta, \lim_{n} \sigma_{\pi_{n}^{\rho}}(x)) \leq R(\theta, \rho)$ for every $\theta \in \Theta$.
Next, we develop priors that provide minimax predictive density operators. Let x be the random variable representing the outcome of the measurement, i.e., $x \sim p(\cdot \mid \theta)$. Then, as a quantum analogue of the conditional mutual information (5), we define the conditional Holevo mutual information [13] between the quantum state $\sigma_{x}^{Y}$ of Y and the parameter θ given the measurement outcome x as
$$I_{\theta, \sigma \mid x}(\pi) = \int \sum_{x} \mathrm{Tr}\, S_{\theta}(x) \log S_{\theta}(x) \, d\pi(\theta) - \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \log S_{\pi}(x) - \int \sum_{x} p(x \mid \theta) \log p(x \mid \theta) \, d\pi(\theta) + \sum_{x} p_{\pi}(x) \log p_{\pi}(x) = \int \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x} \bigl( \log \sigma_{\theta, x} - \log \sigma_{\pi}(x) \bigr) d\pi(\theta),$$
which is a function of $\pi \in \mathcal{P}$. Here, we used
$$\sum_{x} \mathrm{Tr}\, S_{\theta}(x) \log S_{\theta}(x) = \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x} \bigl( (\log p(x \mid \theta)) I + \log \sigma_{\theta, x} \bigr) = \sum_{x} p(x \mid \theta) \log p(x \mid \theta) + \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x} \log \sigma_{\theta, x}$$
and
$$\sum_{x} \mathrm{Tr}\, S_{\pi}(x) \log S_{\pi}(x) = \sum_{x} p_{\pi}(x)\, \mathrm{Tr}\, \sigma_{\pi}(x) \bigl( (\log p_{\pi}(x)) I + \log \sigma_{\pi}(x) \bigr) = \sum_{x} p_{\pi}(x) \log p_{\pi}(x) + \sum_{x} p_{\pi}(x)\, \mathrm{Tr}\, \sigma_{\pi}(x) \log \sigma_{\pi}(x).$$
The conditional Holevo mutual information provides an upper bound on the conditional mutual information as follows.
Proposition 1.
Let $\sigma_{\theta}^{XY}$ be the state of the composite system (X, Y). Suppose that a measurement is performed on X with measurement outcome x and then another measurement is performed on Y with measurement outcome y. Then,
$$I_{\theta, \sigma \mid x}(\pi) \geq I_{\theta, y \mid x}(\pi).$$
Proof. 
Since any measurement is a trace-preserving completely positive map, inequality (12) follows from the monotonicity of the quantum relative entropy [13]. ☐
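Numerically, the conditional Holevo mutual information can be evaluated as the Bayes-averaged quantum relative entropy appearing in the last expression of its definition. A minimal sketch for a finite prior follows; it assumes arrays shaped as in the earlier toy examples, and the small regularization added before taking matrix logarithms hides the genuinely infinite contributions that arise when supports do not match.

```python
import numpy as np
from scipy.linalg import logm

def qre(sigma, rho, eps=1e-12):
    """Quantum relative entropy Tr sigma (log sigma - log rho), with regularization."""
    d = len(sigma)
    return np.trace(sigma @ (logm(sigma + eps * np.eye(d)) - logm(rho + eps * np.eye(d)))).real

def holevo_conditional_mi(prior, p_x_given_theta, sigma_theta_x):
    """I_{theta, sigma | x}(pi) = sum_x E_pi[ p(x|theta) D(sigma_{theta,x} || sigma_pi(x)) ]."""
    total = 0.0
    for x in range(p_x_given_theta.shape[1]):
        w = prior * p_x_given_theta[:, x]
        p_pi_x = w.sum()
        if p_pi_x <= 0:
            continue
        sigma_pi_x = np.einsum('t,tij->ij', w / p_pi_x, sigma_theta_x[:, x])
        for t, pi_t in enumerate(prior):
            total += pi_t * p_x_given_theta[t, x] * qre(sigma_theta_x[t, x], sigma_pi_x)
    return total
```

A latent information prior could then be approximated by maximizing this function over a discretized family of priors.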
Analogously to the latent information priors [10] in classical statistics, we define latent information priors as priors that maximize the conditional Holevo mutual information. It is expected that the Bayesian predictive density operator $\sigma_{\hat{\pi}}(x)$ based on a latent information prior is a minimax predictive density operator. This is confirmed by the following theorem, which is a quantum version of Theorem 2 of Komaki [10].
Theorem 2.
(1) 
Let $\hat{\pi} \in \mathcal{P}$ be a prior maximizing $I_{\theta, \sigma \mid x}(\pi)$. If $p_{\hat{\pi}}(x) > 0$ for all $x \in \mathcal{X}$, then $\sigma_{\hat{\pi}}(x)$ is a minimax predictive density operator.
(2) 
There exists a convergent prior sequence $\{\pi_{n}\}_{n=1}^{\infty}$ such that $\lim_{n} \sigma_{\pi_{n}}(x)$ is a minimax predictive density operator and the equality $I_{\theta, \sigma \mid x}(\pi_{\infty}) = \sup_{\pi \in \mathcal{P}} I_{\theta, \sigma \mid x}(\pi)$ holds, where $\pi_{\infty} = \lim_{n} \pi_{n}$.
The proofs of Theorems 1 and 2 are deferred to Appendix A.
We note that the minimax risk $\inf_{\rho} \sup_{\theta} R_{E}(\theta, \rho)$ depends on the measurement E on X. Therefore, the measurement E with the minimum minimax risk is desirable from the viewpoint of minimaxity. We define a POVM $E^{*}$ to be a minimax POVM if it satisfies
$$\inf_{\rho} \sup_{\theta} R_{E^{*}}(\theta, \rho) = \inf_{E} \inf_{\rho} \sup_{\theta} R_{E}(\theta, \rho).$$
In the next section, we provide a class of minimax POVMs for the one-qubit system.

4. One Qubit System

In this section, we consider the one-qubit system and derive a class of minimax POVMs satisfying (13).
A qubit is a quantum system with a two-dimensional Hilbert space. It is the fundamental system in quantum information theory. A general state of the one-qubit system is described by a density matrix
$$\sigma_{\theta} = \frac{1}{2} \begin{pmatrix} 1 + \theta_{z} & \theta_{x} - i\theta_{y} \\ \theta_{x} + i\theta_{y} & 1 - \theta_{z} \end{pmatrix},$$
where $\theta = (\theta_{x}, \theta_{y}, \theta_{z}) \in \Theta = \{(\theta_{x}, \theta_{y}, \theta_{z}) \in \mathbb{R}^{3} \mid \|\theta\|^{2} \leq 1\}$. The subset $\{(\theta_{x}, \theta_{y}, \theta_{z}) \in \mathbb{R}^{3} \mid \|\theta\|^{2} = 1\}$ of parameters corresponding to pure states is called the Bloch sphere.
Let $\sigma_{\theta}^{XY} = \sigma_{\theta} \otimes \sigma_{\theta}$ be a separable state. We consider the estimation of $\sigma_{\theta}^{Y} = \sigma_{\theta}$ from the outcome of a measurement on $\sigma_{\theta}^{X} = \sigma_{\theta}$. Here, we assume that the state $\sigma_{\theta}^{XY}$ is separable, since the state of Y changes according to the outcome of the measurement on X, and so the estimation problem is not well defined if the state $\sigma_{\theta}^{XY}$ is not separable.
Let $\Omega := \{(x, y, z) \in \mathbb{R}^{3} \mid x^{2} + y^{2} + z^{2} = 1\}$ and let $\mathcal{B} = \mathcal{B}(\Omega)$ be its Borel sets. From Haapasalo et al. [20], it is sufficient to consider POVMs on Ω. For every probability measure μ on $(\Omega, \mathcal{B})$ that satisfies
$$\int_{\Omega} x \, d\mu(\omega) = \int_{\Omega} y \, d\mu(\omega) = \int_{\Omega} z \, d\mu(\omega) = 0,$$
we define a POVM $E : \mathcal{B} \to L_{+}$ by
$$E(B) = \int_{B} \begin{pmatrix} 1 + z & x - iy \\ x + iy & 1 - z \end{pmatrix} d\mu(\omega).$$
In the following, we identify E with μ .
Let $\mathcal{E}_{\mathrm{1qubit}}$ be the class of POVMs on Ω represented by measures μ that satisfy the conditions
$$E_{\mu}[x] = E_{\mu}[y] = E_{\mu}[z] = 0, \qquad E_{\mu}[xy] = E_{\mu}[yz] = E_{\mu}[zx] = 0, \qquad E_{\mu}[x^{2}] = E_{\mu}[y^{2}] = E_{\mu}[z^{2}] = \frac{1}{3},$$
where $E_{\mu}$ denotes the expectation with respect to the measure μ. We provide two examples of POVMs in $\mathcal{E}_{\mathrm{1qubit}}$.
Proposition 2.
The POVM corresponding to
$$\mu(d\omega) = \frac{1}{4\pi} \, d\omega,$$
where $d\omega$ is the surface element on Ω, is in $\mathcal{E}_{\mathrm{1qubit}}$.
Proof. 
From the symmetry of μ , E μ [ x ] = E μ [ y ] = E μ [ z ] = E μ [ x y ] = E μ [ y z ] = E μ [ z x ] = 0 . Moreover, from E μ [ 1 ] = E μ [ x 2 + y 2 + z 2 ] = 1 and the symmetry of μ , E μ [ x 2 ] = E μ [ y 2 ] = E μ [ z 2 ] = 1 / 3 . ☐
Proposition 3.
Suppose that $\omega_{i} \in \Omega$ $(i = 1, 2, 3, 4)$ satisfy $\|\omega_{i}\|^{2} = 1$ and $\omega_{i} \cdot \omega_{j} = -1/3$ $(i \neq j)$. Let μ be the four-point discrete measure on Ω defined by
$$\mu(\{\omega_{1}\}) = \mu(\{\omega_{2}\}) = \mu(\{\omega_{3}\}) = \mu(\{\omega_{4}\}) = \frac{1}{4}.$$
Then, the POVM corresponding to μ belongs to $\mathcal{E}_{\mathrm{1qubit}}$.
Proof. 
Let $P = (\omega_{1}, \omega_{2}, \omega_{3}, \omega_{4}) \in \mathbb{R}^{3 \times 4}$ and $\mathbf{1} = (1, 1, 1, 1)^{\top}$. From the assumption on $\omega_{i}$ $(i = 1, 2, 3, 4)$,
$$P^{\top} P = \frac{4}{3} I_{4} - \frac{1}{3} J_{4},$$
where $I_{4} \in \mathbb{R}^{4 \times 4}$ is the identity matrix and $J_{4} = \mathbf{1}\mathbf{1}^{\top} \in \mathbb{R}^{4 \times 4}$ is the matrix whose elements are all one. From (16), we have $\mathbf{1}^{\top} P^{\top} P \mathbf{1} = \|P\mathbf{1}\|^{2} = 0$. Therefore, $P\mathbf{1} = 0$, and it implies $E_{\mu}[x] = E_{\mu}[y] = E_{\mu}[z] = 0$.
In addition, from (16),
$$P^{\top} P P^{\top} P = \Bigl( \tfrac{4}{3} I_{4} - \tfrac{1}{3} J_{4} \Bigr) \Bigl( \tfrac{4}{3} I_{4} - \tfrac{1}{3} J_{4} \Bigr) = \tfrac{4}{3} \Bigl( \tfrac{4}{3} I_{4} - \tfrac{1}{3} J_{4} \Bigr) = \tfrac{4}{3} P^{\top} P.$$
Therefore, $P^{\top} \bigl( P P^{\top} - \tfrac{4}{3} I_{3} \bigr) P = 0$. Since $\mathrm{rank}\, P = 3$, it implies $P P^{\top} = \tfrac{4}{3} I_{3}$. Then, $E_{\mu}[xy] = E_{\mu}[yz] = E_{\mu}[zx] = 0$ and $E_{\mu}[x^{2}] = E_{\mu}[y^{2}] = E_{\mu}[z^{2}] = 1/3$. ☐
We note that the POVM (15) is a special case of the SIC-POVM (symmetric, informationally complete, positive operator valued measure) [14,15].
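As a quick numeric sanity check of Proposition 3, the sketch below builds four tetrahedral Bloch vectors with pairwise inner products −1/3 (one standard choice of coordinates, used here only for illustration), verifies the moment conditions defining the class of measurements above, and confirms that the corresponding POVM elements sum to the identity.

```python
import numpy as np

# Four unit vectors with pairwise inner products -1/3 (a regular tetrahedron)
w = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

print(w @ w.T)                                        # 1 on the diagonal, -1/3 off it
print(w.mean(axis=0))                                 # first moments: all zero
print((w[:, :, None] * w[:, None, :]).mean(axis=0))   # second moments: identity / 3

# POVM elements (1/4)(I + w_i . sigma_vec) corresponding to the four-point measure
paulis = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]])
E = np.array([(np.eye(2) + np.einsum('k,kij->ij', wi, paulis)) / 4 for wi in w])
print(E.sum(axis=0).round(10))                        # identity matrix
```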
Let $\mathcal{P}_{\mathrm{1qubit}}$ be the class of priors on Θ that satisfy the conditions
$$E_{\pi}[\theta_{x}] = E_{\pi}[\theta_{y}] = E_{\pi}[\theta_{z}] = 0, \qquad E_{\pi}[\theta_{x}\theta_{y}] = E_{\pi}[\theta_{y}\theta_{z}] = E_{\pi}[\theta_{z}\theta_{x}] = 0, \qquad E_{\pi}[\theta_{x}^{2}] = E_{\pi}[\theta_{y}^{2}] = E_{\pi}[\theta_{z}^{2}] = \frac{1}{3},$$
where E π is the expectation with respect to a prior π .
Proposition 4.
The uniform prior
$$\pi(d\theta) = \frac{1}{4\pi} \, d\theta,$$
where $d\theta$ is the surface element on the Bloch sphere, belongs to $\mathcal{P}_{\mathrm{1qubit}}$.
Proof. 
Same as Proposition 2. ☐
Proposition 5.
Suppose that $\theta_{i} \in \Theta$ $(i = 1, 2, 3, 4)$ satisfy $\|\theta_{i}\|^{2} = 1$ and $\theta_{i} \cdot \theta_{j} = -1/3$ $(i \neq j)$. Then, the four-point discrete prior
$$\pi(\{\theta_{1}\}) = \pi(\{\theta_{2}\}) = \pi(\{\theta_{3}\}) = \pi(\{\theta_{4}\}) = \frac{1}{4}$$
belongs to $\mathcal{P}_{\mathrm{1qubit}}$.
Proof. 
Same as Proposition 3. ☐
We obtain the following result.
Lemma 4.
Suppose $\pi \in \mathcal{P}_{\mathrm{1qubit}}$. Then, for a general measurement E, the risk function of the Bayesian predictive density operator $\sigma_{\pi}$ is
$$R_{E}(\theta, \sigma_{\pi}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{2}\left( \theta_{x}^{2} E_{\mu}[x^{2}] + \theta_{y}^{2} E_{\mu}[y^{2}] + \theta_{z}^{2} E_{\mu}[z^{2}] + 2\theta_{x}\theta_{y} E_{\mu}[xy] + 2\theta_{y}\theta_{z} E_{\mu}[yz] + 2\theta_{z}\theta_{x} E_{\mu}[zx] \right).$$
Proof. 
The distribution of the measurement outcome $\omega = (x, y, z)$ is
$$p(B \mid \theta) = \mathrm{Tr}\, \sigma_{\theta} E(B) = \int_{B} (1 + x\theta_{x} + y\theta_{y} + z\theta_{z}) \, d\mu(\omega).$$
Then, since $\pi \in \mathcal{P}_{\mathrm{1qubit}}$, the marginal distribution of the measurement outcome is
$$p(B) = \int_{\Theta} p(B \mid \theta) \, d\pi(\theta) = \int_{\Theta} \int_{B} (1 + x\theta_{x} + y\theta_{y} + z\theta_{z}) \, d\mu(\omega) \, d\pi(\theta) = \mu(B).$$
Therefore, the posterior distribution of θ is
$$d\pi(\theta \mid \omega) = (1 + x\theta_{x} + y\theta_{y} + z\theta_{z}) \, d\pi(\theta).$$
The posterior means of $\theta_{x}$, $\theta_{y}$, and $\theta_{z}$ are x/3, y/3, and z/3, respectively.
Thus, the Bayesian predictive density operator based on the prior π is
$$\sigma_{\pi}(\omega) = \int \sigma_{\theta} \, d\pi(\theta \mid \omega) = \frac{1}{2} \begin{pmatrix} 1 + z/3 & (x - iy)/3 \\ (x + iy)/3 & 1 - z/3 \end{pmatrix},$$
and we have
$$\log \sigma_{\pi}(\omega) = \left(\log\frac{1}{3}\right) \frac{1}{2}\begin{pmatrix} 1 - z & -(x - iy) \\ -(x + iy) & 1 + z \end{pmatrix} + \left(\log\frac{2}{3}\right) \frac{1}{2}\begin{pmatrix} 1 + z & x - iy \\ x + iy & 1 - z \end{pmatrix}.$$
Therefore, the quantum relative entropy loss is
$$D(\sigma_{\theta}, \sigma_{\pi}(\omega)) = \mathrm{Tr}\, \sigma_{\theta} \bigl( \log \sigma_{\theta} - \log \sigma_{\pi}(\omega) \bigr) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{x\theta_{x} + y\theta_{y} + z\theta_{z}}{2}\log 2.$$
Hence, the risk function is
$$R_{E}(\theta, \sigma_{\pi}) = \int_{\Omega} D(\sigma_{\theta}, \sigma_{\pi}(\omega)) \, dp(\omega \mid \theta) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{2}\left( \theta_{x}^{2} E_{\mu}[x^{2}] + \theta_{y}^{2} E_{\mu}[y^{2}] + \theta_{z}^{2} E_{\mu}[z^{2}] + 2\theta_{x}\theta_{y} E_{\mu}[xy] + 2\theta_{y}\theta_{z} E_{\mu}[yz] + 2\theta_{z}\theta_{x} E_{\mu}[zx] \right).$$
 ☐
Theorem 3.
For a measurement $E \in \mathcal{E}_{\mathrm{1qubit}}$, every $\pi \in \mathcal{P}_{\mathrm{1qubit}}$ is a latent information prior:
$$\max_{\theta} R(\theta, \sigma_{\pi}) = \min_{\rho} \max_{\theta} R(\theta, \rho).$$
In addition, the risk of the Bayesian predictive density operator based on π is
$$R(\theta, \sigma_{\pi}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{6}\|\theta\|^{2},$$
where h is the binary entropy function $h(p) = -p\log p - (1 - p)\log(1 - p)$.
Proof. 
From Lemma 4 and $E \in \mathcal{E}_{\mathrm{1qubit}}$,
$$R_{E}(\theta, \sigma_{\pi}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{6}\bigl(\theta_{x}^{2} + \theta_{y}^{2} + \theta_{z}^{2}\bigr).$$
Therefore, the risk depends only on $r = \|\theta\|$ and we have
$$R_{E}(\theta, \sigma_{\pi}) = g(r) = -h\!\left(\frac{1 + r}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{6} r^{2}.$$
Since
$$g'(r) = \frac{1}{2}\log\frac{1 + r}{1 - r} - \frac{\log 2}{3} r,$$
$$g''(r) = \frac{1}{1 - r^{2}} - \frac{\log 2}{3} \geq 1 - \frac{\log 2}{3} \geq 0,$$
the function g(r) is convex. In addition, we have $g(1) = \log 3 - \frac{2}{3}\log 2 > g(0) = \log 3 - \frac{3}{2}\log 2$. Therefore, g(r) takes its maximum at r = 1.
In other words, $R_{E}(\theta, \sigma_{\pi})$ takes its maximum on the Bloch sphere. In addition, since $\int (\theta_{x}^{2} + \theta_{y}^{2} + \theta_{z}^{2}) \, d\pi(\theta) = 1/3 + 1/3 + 1/3 = 1$, the support of π is included in the Bloch sphere $\|\theta\|^{2} = 1$. Therefore, $\int R_{E}(\theta, \sigma_{\pi}) \, d\pi(\theta) = \sup_{\theta} R_{E}(\theta, \sigma_{\pi})$, and it implies that π is a latent information prior. ☐
We note that the Bayesian predictive density operator is identical for every $\pi \in \mathcal{P}_{\mathrm{1qubit}}$. In fact, every $\pi \in \mathcal{P}_{\mathrm{1qubit}}$ also provides the minimax estimate of the density operator $\sigma_{\theta}^{Y}$ when there is no observation system X. Figure 1 shows the risk function g(r) in (17) and also the minimax risk function $g_{0}(r)$ when there is no observation:
$$g_{0}(r) = \mathrm{Tr} \begin{pmatrix} (1 + r)/2 & 0 \\ 0 & (1 - r)/2 \end{pmatrix} \left[ \log \begin{pmatrix} (1 + r)/2 & 0 \\ 0 & (1 - r)/2 \end{pmatrix} - \log \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} \right] = -h\!\left(\frac{1 + r}{2}\right) + \log 2.$$
Whereas $g(r) < g_{0}(r)$ around r = 1, we can see that $g(r) > g_{0}(r)$ around r = 0. Both risk functions take their maximum at r = 1 and
$$g(1) = \log 3 - (2/3)\log 2 < g_{0}(1) = \log 2.$$
The decrease $g_{0}(1) - g(1) > 0$ in the maximum risk corresponds to the gain from the observation of X.
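The comparison above is easy to reproduce numerically: the following sketch evaluates g(r) and g_0(r) on a grid and confirms that both attain their maxima at r = 1, with g(1) = log 3 − (2/3) log 2 < g_0(1) = log 2.

```python
import numpy as np

def h(p):
    """Binary entropy with the convention 0 log 0 = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def g(r):    # minimax risk with the observation system X
    return -h((1 + r) / 2) + 0.5 * np.log(9 / 2) - (np.log(2) / 6) * r**2

def g0(r):   # minimax risk with no observation
    return -h((1 + r) / 2) + np.log(2)

r = np.linspace(0, 1, 1001)
print(r[np.argmax(g(r))], g(1.0), np.log(3) - 2 / 3 * np.log(2))   # maximum at r = 1
print(r[np.argmax(g0(r))], g0(1.0), np.log(2))
```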
Now, we consider the selection of the measurement E. As discussed in the previous section, we define a POVM $E^{*}$ to be a minimax POVM if it satisfies (13). We provide a sufficient condition for a POVM to be minimax. Let $\rho_{E}$ be a minimax predictive density operator for the measurement E.
Lemma 5.
Suppose $\pi^{*}$ is a latent information prior for the measurement $E^{*}$. If
$$\int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) = \inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta),$$
then $E^{*}$ is a minimax POVM.
Proof. 
For every pair (E, ρ), we have
$$\sup_{\theta} R_{E}(\theta, \rho) \geq \inf_{\rho'} \sup_{\theta} R_{E}(\theta, \rho') = \sup_{\theta} R_{E}(\theta, \rho_{E}) \geq \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \geq \inf_{E'} \int R_{E'}(\theta, \rho_{E'}) \, d\pi^{*}(\theta) = \int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) = \sup_{\theta} R_{E^{*}}(\theta, \sigma_{\pi^{*}}).$$
The last equality follows from the minimaxity of $\sigma_{\pi^{*}}$. Therefore, $E^{*}$ is a minimax POVM. ☐
Theorem 4.
Every $E \in \mathcal{E}_{\mathrm{1qubit}}$ is a minimax POVM.
Proof. 
Let $E^{*} \in \mathcal{E}_{\mathrm{1qubit}}$ and $\pi^{*} \in \mathcal{P}_{\mathrm{1qubit}}$. From Theorem 3, $\pi^{*}$ is a latent information prior for $E^{*}$.
For a general measurement E, from Lemma 4, the risk function of the Bayesian predictive density operator $\sigma_{\pi^{*}}$ is
$$R_{E}(\theta, \sigma_{\pi^{*}}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{2}\left( \theta_{x}^{2} E_{\mu}[x^{2}] + \theta_{y}^{2} E_{\mu}[y^{2}] + \theta_{z}^{2} E_{\mu}[z^{2}] + 2\theta_{x}\theta_{y} E_{\mu}[xy] + 2\theta_{y}\theta_{z} E_{\mu}[yz] + 2\theta_{z}\theta_{x} E_{\mu}[zx] \right).$$
Hence, the Bayes risk of $\sigma_{\pi^{*}}$ with respect to $\pi^{*}$ is
$$\int R_{E}(\theta, \sigma_{\pi^{*}}) \, d\pi^{*}(\theta) = \log 3 - \frac{2}{3}\log 2.$$
Now, since the Bayesian predictive density operator $\sigma_{\pi^{*}}$ minimizes the Bayes risk with respect to $\pi^{*}$ among all predictive density operators [4],
$$\int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \geq \int R_{E}(\theta, \sigma_{\pi^{*}}) \, d\pi^{*}(\theta) = \log 3 - \frac{2}{3}\log 2$$
for every E. Therefore,
$$\inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \geq \log 3 - \frac{2}{3}\log 2.$$
On the other hand, since $\rho_{E^{*}}$ and $\sigma_{\pi^{*}}$ are both minimax for $E^{*}$,
$$\inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \leq \int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) \leq \sup_{\theta} R_{E^{*}}(\theta, \sigma_{\pi^{*}}) = \log 3 - \frac{2}{3}\log 2.$$
Hence,
$$\int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) = \inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) = \log 3 - \frac{2}{3}\log 2.$$
From Lemma 5, $E^{*}$ is minimax. ☐
Whereas Theorems 1 and 2 are valid even when $\sigma_{\theta}^{XY}$ is not separable, Theorems 3 and 4 assume the separability $\sigma_{\theta}^{XY} = \sigma_{\theta}^{X} \otimes \sigma_{\theta}^{Y}$.
From Theorem 4, the POVM (15) is a minimax POVM. Since this POVM is identical to the SIC-POVM [14,15], it is an interesting problem whether the SIC-POVM is a minimax POVM also in higher dimensions. This is left for future work.

Acknowledgments

We thank the referees for many helpful comments. This work was supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 26280005 and 14J09148.

Author Contributions

All authors contributed significantly to the study and approved the final version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Proof of (4).
From the definition of $\hat{p}_{\pi}$ in (1),
$$\int p(x, y \mid \theta) \, d\pi(\theta) = p_{\pi}(x)\, \hat{p}_{\pi}(y \mid x),$$
where
$$p_{\pi}(x) = \int p(x \mid \theta) \, d\pi(\theta).$$
Therefore, for an arbitrary $\hat{p}$,
$$R(\pi, \hat{p}) - R(\pi, \hat{p}_{\pi}) = \iiint p(x, y \mid \theta) \bigl( \log \hat{p}_{\pi}(y \mid x) - \log \hat{p}(y \mid x) \bigr) \, d\pi(\theta) \, dx \, dy = \iint p_{\pi}(x)\, \hat{p}_{\pi}(y \mid x) \bigl( \log \hat{p}_{\pi}(y \mid x) - \log \hat{p}(y \mid x) \bigr) \, dx \, dy = \int p_{\pi}(x)\, L\bigl(\hat{p}_{\pi}(y \mid x), \hat{p}(y \mid x)\bigr) \, dx,$$
which is nonnegative since the Kullback–Leibler divergence L ( q , p ) in (3) is always nonnegative. ☐
Proof of (9).
From the definition of $\sigma_{\pi}^{Y}(x)$ in (7),
$$\int p(x \mid \theta)\, \sigma_{\theta, x}^{Y} \, d\pi(\theta) = p_{\pi}(x)\, \sigma_{\pi}^{Y}(x),$$
where
$$p_{\pi}(x) = \int p(x \mid \theta) \, d\pi(\theta).$$
Therefore, for an arbitrary ρ,
$$R(\pi, \rho) - R(\pi, \sigma_{\pi}^{Y}) = \iint p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x}^{Y} \bigl( \log \sigma_{\pi}^{Y}(x) - \log \rho(x) \bigr) \, d\pi(\theta) \, dx = \int p_{\pi}(x)\, \mathrm{Tr}\, \sigma_{\pi}^{Y}(x) \bigl( \log \sigma_{\pi}^{Y}(x) - \log \rho(x) \bigr) \, dx = \int p_{\pi}(x)\, L\bigl(\sigma_{\pi}^{Y}(x), \rho(x)\bigr) \, dx,$$
which is nonnegative since the quantum relative entropy L ( σ , ρ ) in (8) is always nonnegative. ☐
Proof of Theorem 1.
(1) Let $Q_{x}^{\rho}$ be the orthogonal projection matrix onto the eigenspace of ρ(x) corresponding to the eigenvalue 0, let $\Theta_{\rho} = \{\theta \in \Theta \mid \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, Q_{x}^{\rho} \sigma_{\theta, x} = 0\}$, and let $\mathcal{P}^{\rho}$ be the set of all probability measures on $\Theta_{\rho}$.
If $\Theta_{\rho} = \emptyset$, the assertion is obvious because $R(\theta, \rho) = \infty$ for $\theta \notin \Theta_{\rho}$. Therefore, we assume $\Theta_{\rho} \neq \emptyset$ in the following. In this case, $D_{\rho}(\hat{\pi}_{\rho}) < \infty$. Since $\pi \in \mathcal{P}^{\rho}$ if and only if $D_{\rho}(\pi) < \infty$, we have $\hat{\pi}_{\rho} \in \mathcal{P}^{\rho}$.
Define
$$\tilde{\pi}_{\theta, u} := u\,\delta_{\theta} + (1 - u)\,\hat{\pi}_{\rho}$$
for $\theta \in \Theta_{\rho}$ and $0 \leq u \leq 1$, where $\delta_{\theta}$ is the probability measure satisfying $\delta_{\theta}(\{\theta\}) = 1$. Then, $\tilde{\pi}_{\theta, u} \in \mathcal{P}^{\rho}$, and we have
u D ρ ( π ˜ θ , u ) | u = 0 = u x Tr S π ˜ θ , u ( x ) ( log S π ˜ θ , u ( x ) log ( p π ˜ θ , u ( x ) ρ ( x ) ) ) | u = 0 = u x Tr ( u S θ ( x ) + ( 1 u ) S π ^ θ , u ( x ) ) × ( log ( u S θ ( x ) + ( 1 u ) S π ^ ρ ( x ) ) log ( u p ( x θ ) + ( 1 u ) p π ^ ρ ( x ) ) ρ ( x ) | u = 0 = x Tr u ( u S θ ( x ) + ( 1 u ) S π ^ ρ ( x ) ) | u = 0 ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ x ) ) + x Tr S π ^ ρ ( x ) u log ( u S θ ( x ) + ( 1 u ) S π ^ ρ ( x ) ) | u = 0 x Tr S π ^ ρ ( x ) u log ( u p ( x θ ) + ( 1 u ) p π ^ ρ ( x ) ) I + log ρ x | u = 0 = x Tr ( S θ ( x ) S π ^ ρ ( x ) ) ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ ( x ) ) ) + x Tr S θ ( x ) p π ^ ρ ( x ) ρ ( x ) x Tr S π ^ ρ ( x ) p ( x θ ) p π ^ ρ ( x ) p π ^ ρ ( x ) = x Tr S θ ( x ) ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ ( x ) ) ) x Tr S π ^ ρ ( x ) ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ ( x ) ) ) 0 .
Thus, if $\theta \in \Theta_{\rho}$,
$$R(\theta, \sigma_{\hat{\pi}_{\rho}}(x)) = \sum_{x} \mathrm{Tr}\, S_{\theta}(x) \bigl( \log \sigma_{\theta, x} - \log \sigma_{\hat{\pi}_{\rho}}(x) \bigr) \leq \sum_{x} \mathrm{Tr}\, S_{\theta}(x) \bigl( \log \sigma_{\theta, x} - \log \rho(x) \bigr) = R(\theta, \rho(x)) < \infty.$$
If $\theta \notin \Theta_{\rho}$, $R(\theta, \rho(x)) = \infty$. Therefore, for every $\theta \in \Theta$, the inequality $R(\theta, \sigma_{\hat{\pi}_{\rho}}(x)) \leq R(\theta, \rho(x))$ holds.
(2) We note that $\Theta_{\rho}$ and $\mathcal{P}^{\rho}$ are compact subsets of Θ and $\mathcal{P}$, respectively.
If $\Theta_{\rho} = \emptyset$, the assertion is obvious, because $R(\theta, \rho(x)) = \infty$ for every $\theta \notin \Theta_{\rho}$. Therefore, we assume $\Theta_{\rho} \neq \emptyset$ in the following. Let $\mathcal{X}_{\rho} := \{x \in \mathcal{X} \mid \exists\theta \in \Theta_{\rho},\ p(x \mid \theta) > 0\}$ and let $\mu_{\rho}$ be a probability measure on $\Theta_{\rho}$ such that $p_{\mu_{\rho}}(x) := \int p(x \mid \theta) \, d\mu_{\rho}(\theta) > 0$ for every $x \in \mathcal{X}_{\rho}$.
Because $D_{\rho}(\pi)$ is continuous as a function of $\pi \in \mathcal{P}^{\rho}$, there exists $\pi_{n} \in \mathcal{P}^{\rho}_{\mu_{\rho}/n} := \{(1/n)\mu_{\rho} + (1 - 1/n)\pi \mid \pi \in \mathcal{P}^{\rho}\}$ such that $D_{\rho}(\pi_{n}) = \inf_{\pi \in \mathcal{P}^{\rho}_{\mu_{\rho}/n}} D_{\rho}(\pi)$. From Lemma 3, there exists a convergent subsequence $\{\pi_{m}\}_{m=1}^{\infty}$ of $\{\pi_{n}\}_{n=1}^{\infty}$ such that $D_{\rho}(\pi_{\infty}) = \inf_{\pi \in \mathcal{P}^{\rho}} D_{\rho}(\pi)$, where $\pi_{\infty} = \lim_{m} \pi_{m}$.
Let $n_{m}$ be the integer satisfying $\pi_{m} = \pi_{n_{m}}$. We can take the subsequence $\{\pi_{m}\}_{m=1}^{\infty}$ to satisfy $0 < n_{m}/(n_{m+1} - n_{m}) < c$ for some positive constant c.
Since
n m n m + 1 π m + 1 n m n m + 1 δ θ = n m n m + 1 π n m + 1 n m n m + 1 δ θ P μ ρ / n m + 1 ρ
for every θ Θ , we have
π ˜ m , θ , u : = u n m n m + 1 π m + 1 n m n m + 1 δ θ + ( 1 u ) π m + 1 P μ ρ / n m + 1 ρ
for every θ Θ ρ and 0 u 1 . Thus,
u D ( π ˜ m , θ , u ) | u = 0 = u x Tr p π ˜ m , θ , u ( x ) ( log S π ˜ m , θ , u ( x ) log ( p π ˜ m , θ , u ( x ) ρ ( x ) ) ) ( I Q x ρ ) | u = 0 = x Tr { u S π ˜ m , θ , u ( x ) | u = 0 } ( log S π ˜ m , θ , u ( x ) log ( p π ˜ m , θ , u ( x ) ρ ( x ) ) ) ( I Q x ρ ) = n m n m + 1 x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) + n m + 1 n m n m + 1 x Tr S θ ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( X ) ρ ( x ) ) ) ( I Q x ρ ) 0 .
Hence,
x Tr S θ ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) n m + 1 n m + 1 n m x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) n m n m + 1 n m x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) = n m + 1 n m + 1 n m x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) + n m n m + 1 n m { x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) Q x π ( I Q x ρ ) } n m + 1 n m + 1 n m x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) + n m n m + 1 n m { x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) + x Tr S π m ( x ) log ρ ( x ) Q x π ( I Q x ρ ) } ,
where Q x π is the orthogonal projection matrix onto the eigenspace of θ π ( θ ) p ( x θ ) σ θ , x corresponding to the eigenvalue 0. Here, we have
lim m x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) = x Tr S π ( x ) ( log S π ( x ) log ( p π ( x ) ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) ,
and
lim m x Tr S π m ( x ) log ρ ( x ) Q x π ( I Q x ρ ) = 0 = x Tr S π ( x ) ( log S π ( x ) log ( p π ( x ) ρ ( x ) ) ) Q x π ( I Q x ρ ) .
Therefore, from (A1)–(A3) and 0 < n m / ( n m + 1 n m ) < c for every θ Θ ρ ,
lim   inf m x Tr S θ ( x ) ( log S π m ( x ) log ( p π m ( x ) ρ ( x ) ) ) ( I Q x ρ ) x Tr S π ( x ) ( log S π ( x ) log ( p π ( x ) ρ ( x ) ) ) ( I Q x ρ ) 0 .
By taking an appropriate subsequence { π k } of { π m } , we can make the subsequence of density operators { σ π k , x } k = 1 converge for all x X ρ because p π m ( x ) > 0 ( x X ρ ) and 0 S π m / p π m ( x ) I .
Then, from (A4), if θ Θ ρ ,
R ( θ , lim k σ π k ( x ) ) = x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) = x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) ( I Q x ρ ) x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) ( I Q x ρ ) = x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) = R ( θ , ρ ( x ) ) < .
If θ Θ ρ , R ( θ , ρ ) = because x S θ ( x ) log ρ ( x ) Q x ρ = .
Hence, the risk of the predictive density operator defined by
lim k σ π k ( x ) , x X ρ , τ x , x X ρ ,
where τ x is an arbitrary predictive density, is not greater than that of ρ ( x ) for every θ Θ .
Therefore, by taking a sequence { ε n ( 0 , 1 ) } n = 1 that converges rapidly enough to 0, we can construct a predictive density operator
lim k σ ε k μ ¯ + ( 1 ε k ) π k ( x ) = lim k σ π k ( x ) , x X ρ , σ μ ¯ ( x ) , x X ρ ,
as a limit of Bayesian predictive density operators based on priors { ε k μ ¯ + ( 1 ε k ) π k } , where μ ¯ is a measure on Θ such that p μ ¯ ( x ) > 0 for every x X .
Hence, the risk of the predictive density operator (A5) is not greater than that of ρ ( x ) for every θ Θ . ☐
Proof of Theorem 2.
(1) Define π ˜ θ ¯ , u : = u δ θ ¯ + ( 1 u ) π ^ for all θ Θ and u [ 0 , 1 ] . Then,
u I θ , σ x ( π ˜ θ ¯ , u ) | u = 0 = u ( x Tr S θ ( x ) log S θ ( x ) d π ˜ θ ¯ , u ( θ ) x S π ˜ θ ¯ , u ( x ) log S π ˜ θ ¯ , u ( x ) x p ( x θ ) log p ( x θ ) d π ˜ θ ¯ , u + x p π ˜ θ ¯ , u ( x ) log p π ˜ θ ¯ , u ( x ) ) | u = 0 = x Tr S θ ¯ ( x ) ( log S θ ¯ ( x ) log p θ ¯ ( x ) I ) x Tr S θ ¯ ( x ) ( log S π ^ ( x ) log p π ^ ( x ) I ) x Tr S θ ( x ) ( log S θ ( x ) log p ( x θ ) I ) d π ^ ( θ ) + x Tr S π ^ ( x ) ( log S π ^ ( x ) log p π ^ ( x ) I ) 0 .
Since p π ^ ( x ) > 0 for every x X and Tr   p ( x θ ) σ θ , x log σ θ , x = 0 if p ( x | θ ) = 0 , we have
x Tr S θ ¯ ( x ) ( log σ θ ¯ , x log σ π ^ ( x ) ) x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) d π ^ ( θ )
for every θ Θ .
On the other hand, we have
x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) d π ^ ( θ ) = inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ^ ( θ ) sup π P inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ^ ( θ ) inf ρ sup π P x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ^ ( θ ) = inf ρ sup θ Θ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) sup θ Θ x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) .
Here, the first equality is from the fact [4] that the Bayes risk with respect to π ^ P
R ( θ ; ρ ( x ) ) d π ^ ( θ ) = x p ( x θ ) Tr σ θ , x ( log σ θ , x log ρ ( x ) ) d π ^ ( θ )
is minimized when
ρ ( x ) = σ π ^ ( x ) : = p ( x θ ) σ θ , x d π ^ ( θ ) p ( x θ ) d π ^ ( θ ) .
From (A6) and (A7), we have
inf ρ sup θ Θ x Tr S θ ( x ) ( log σ θ , x ρ ( x ) ) = sup θ Θ x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) .
Therefore, the predictive density operator σ π ^ ( x ) is minimax.
(2) Let μ be a probability measure on Θ such that p μ ( x ) : = p ( x θ ) d μ ( θ ) > 0 for every x X , and let π n P μ / n : = { μ / n + ( 1 1 / n ) π π P } be a prior satisfying I θ , σ | x ( π n ) = sup π P μ / n I θ , σ | x ( π ) . From Lemma 3, there exists a convergent subsequence { π m } of { π n } and I θ , σ | x ( π ) = sup π P I θ , σ x ( π ) where π m π . Let n m be the integer satisfying π m = π n m . As in the proof of Theorem 1, we can make the subsequence { π m } satisfy 0 < n m / ( n m + 1 n m ) < c for some positive constant c.
Then, for every θ ¯ Θ ,
π ˜ m , θ ¯ , u : = u n m n m + 1 π m + ( 1 n m n m + 1 ) δ θ ¯ + ( 1 u ) π m + 1
belongs to P μ / n m + 1 for 0 u 1 because ( n m / n m + 1 ) π m + ( 1 n m / n m + 1 ) δ θ ¯ P μ / n m + 1 and π m + 1 P μ / n m + 1 .
Thus,
u I θ , ρ x ( π ˜ m , θ ¯ , u ) | u = 0 = u ( x Tr S θ ( x ) log S θ ( x ) d π ˜ m , θ ¯ , u ( θ ) x Tr S π ˜ m , θ ¯ , u ( x ) log S π ˜ m , θ ¯ , u ( x ) x p ( x θ ) log p ( x θ ) d π ˜ m , θ ¯ , u ( θ ) + x p π ˜ m , θ ¯ , u ( x ) log p π ˜ m , θ ¯ , u ) | u = 0 = n m n m + 1 x Tr S θ ( x ) log S θ ( x ) d π m ( θ ) + ( 1 n m n m + 1 ) x Tr S θ ¯ ( x ) log S θ ¯ ( x ) x Tr S θ ( x ) log S θ ( x ) d π m + 1 ( θ ) x Tr u S π ˜ m , θ ¯ , u ( x ) | u = 0 log S π m + 1 ( x ) n m n m + 1 x p ( x θ ) log p ( x θ ) d π m + 1 ( θ ) ( 1 n m n m + 1 ) x p θ ¯ ( x ) log p θ ¯ ( x ) + x p ( x θ ) log p ( x θ ) d π m + 1 ( θ ) + x u p π ˜ m , θ ¯ , u ( x ) | u = 0 log p π m + 1 ( x ) = ( 1 n m n m + 1 ) x Tr S θ ¯ ( x ) ( log S θ ¯ ( x ) log p ( x θ ¯ ) I ) ( 1 n m n m + 1 ) x Tr S θ ¯ ( x ) ( log S π m + 1 ( x ) log p π m + 1 ( x ) I ) + n m n m + 1 x Tr S θ ( x ) ( log S θ ( x ) log p ( x θ ) I ) d π m ( θ ) x Tr S θ ( x ) ( log S θ ( x ) log p ( x θ ) ) d π m + 1 ( θ ) n m n m + 1 x Tr S π m ( x ) ( log S π m + 1 ( x ) log p π m + 1 ( x ) I ) + x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log p π m + 1 ( x ) I ) 0 .
Since p π ^ m ( x ) > 0 for every m and p ( x θ ) σ θ , x log σ θ , x = 0 if p ( x θ ) = 0 , we have
1 n m n m + 1 x Tr S θ ¯ ( x ) ( log σ θ ¯ , x log σ π m + 1 ( x ) ) + n m n m + 1 x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m ( θ ) x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m + 1 ( θ ) 0 .
Hence,
x Tr S θ ¯ ( x ) ( log σ θ ¯ ( x ) log σ π m + 1 ( x ) ) n m n m + 1 n m { x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) ( 1 Q x π ) d π m ( θ ) + x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) Q x π d π m ( θ ) } + n m + 1 n m + 1 n m x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m + 1 ( θ ) n m n m + 1 n m { x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) ( 1 Q x π ) d π m ( θ ) + x Tr S θ ( x ) log σ θ , x Q x π d π m ( θ ) } + n m + 1 n m + 1 n m x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m + 1 ( θ ) ,
where Q x π is the orthogonal projection matrix onto the eigenspace of S π ( x ) corresponding to the eigenvalue 0. Here, we used two equalities
lim m x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) ( 1 Q x π ) d π m ( θ ) = x Tr S θ ( x ) ( log ( p π ( x ) σ θ , x ) log S π ( x ) ) d π ( θ )
and
lim m x Tr S θ ( x ) log σ θ , x Q x π d π m ( θ ) = x Tr S θ ( x ) log σ θ , x Q x π d π ( θ ) = x Tr S θ ( x ) ( log ( p π ( x ) ) σ θ , x ) log S π , x ) Q x π d π ( θ ) = 0 ,
since Tr S θ ( x ) log σ θ , x is a bounded continuous function of θ .
From (A8)–(A11), and 0 < n m / ( n m + 1 n m ) < c , we have, for every θ ¯ Θ ,
lim   sup m x Tr S θ ¯ ( x ) ( log σ θ ¯ ( x ) log σ π m ( x ) ) x Tr S θ ( x ) ( log ( p π ( x ) σ θ ( x ) ) log S π ( x ) ) d π ( θ ) .
By taking an appropriate subsequence { π k } of { π m } , we can make { σ π k ( x ) } k = 1 converge for every x. Then, for every θ ¯ Θ ,
x Tr S θ ( x ) ( log σ θ ¯ , x log lim k σ π k ( x ) )
x S θ ( x ) ( log ( σ θ , x log lim k σ π k ( x ) ) d π ( θ ) ,
since lim k σ π k ( x ) = σ π ( x ) for x with p π ( x ) > 0 .
On the other hand, we have
x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) d π ( θ ) = inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ( θ ) sup π P inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ( θ ) inf ρ sup π P x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ( θ ) = inf ρ sup θ Θ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) sup θ Θ x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) .
Here, the first equality is from the fact [4] that the Bayes risk
R ( θ ; ρ ) d π ( θ ) = x Tr p ( x θ ) σ θ , x ( log σ θ , x log ρ ( x ) ) d π ( θ )
is minimized when ρ ( x ) = σ π ( x ) . Although p π ( x ) is not uniquely determined for x with p π ( x ) = 0 , the Bayes risk does not depend on the choice of σ π ( x ) for such x.
From (A12) and (A13),
inf ρ sup θ Θ x Tr p ( x θ ) σ θ , x ( log σ θ , x ρ ( x ) ) = sup θ Θ x Tr p ( x θ ) σ θ , x ( log σ θ , x log lim k σ π k ( x ) ) .
Therefore, the predictive density operator lim k σ π k ( x ) is minimax. ☐

References

  1. Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E. On quantum statistical inference. J. R. Stat. Soc. B 2003, 65, 775–804. [Google Scholar] [CrossRef]
  2. Holevo, A.S. Probabilistic and Statistical Aspects of Quantum Theory; Elsevier: Amsterdam, The Netherlands, 1982. [Google Scholar]
  3. Paris, M.; Rehacek, J. Quantum State Estimation; Springer: Berlin, Germany, 2004. [Google Scholar]
  4. Tanaka, F.; Komaki, F. Bayesian predictive density operators for exchangeable quantum-statistical models. Phys. Rev. A 2005, 71, 052323. [Google Scholar] [CrossRef]
  5. Tanaka, F. Bayesian estimation of the wave function. Phys. Lett. A 2012, 376, 2471–2476. [Google Scholar] [CrossRef]
  6. Tanaka, F. Noninformative prior in the quantum statistical model of pure states. Phys. Rev. A 2012, 85, 062305. [Google Scholar] [CrossRef]
  7. Geisser, S. Predictive Inference: An Introduction; Chapman & Hall: London, UK, 1993. [Google Scholar]
  8. Csiszar, I. Axiomatic characterizations of information measures. Entropy 2008, 10, 261–273. [Google Scholar] [CrossRef]
  9. Aitchison, J. Goodness of prediction fit. Biometrika 1975, 62, 547–554. [Google Scholar] [CrossRef]
  10. Komaki, F. Bayesian predictive densities based on latent information priors. J. Stat. Plan. Inference 2011, 141, 3705–3715. [Google Scholar] [CrossRef]
  11. Bernardo, J.M. Reference posterior distributions for Bayesian inference. J. R. Stat. Soc. B 1979, 41, 113–147. [Google Scholar]
  12. Petz, D. Quantum Information and Quantum Statistics; Springer: New York, NY, USA, 2008. [Google Scholar]
  13. Holevo, A.S. Quantum Systems, Channels, Information: A Mathematical Introduction; Walter de Gruyter: Berlin, Germany, 2013. [Google Scholar]
  14. Appleby, D.M. SIC-POVMs and the extended Clifford group. J. Math. Phys. 2004, 46, 547–554. [Google Scholar]
  15. Renes, J.M.; Blume-Kohout, R.; Scott, A.J.; Caves, C.M. Symmetric informationally complete quantum measurements. J. Math. Phys. 2004, 45, 2171–2180. [Google Scholar] [CrossRef]
  16. Ferrie, C.; Blume-Kohout, R. Minimax quantum tomography: Estimators and relative entropy bounds. Phys. Rev. Lett. 2016, 116, 090407. [Google Scholar] [CrossRef] [PubMed]
  17. Tanaka, F. Quantum minimax theorem. arXiv 2014, arXiv:1410.3639. [Google Scholar]
  18. Billingsley, P. Convergence of Probability Measures; Wiley: New York, NY, USA, 1999. [Google Scholar]
  19. Hiai, F.; Petz, D. Introduction to Matrix Analysis and Applications; Springer: New York, NY, USA, 2014. [Google Scholar]
  20. Haapasalo, E.; Heinosaari, T.; Pellonpää, J.P. Quantum measurements on finite dimensional systems: Relabeling and mixing. Quantum Inf. Process. 2012, 11, 1751–1763. [Google Scholar] [CrossRef]
Figure 1. Risk functions of predictive density operators. Solid line: g(r); dashed line: g_0(r).
