Article

Minimax Estimation of Quantum States Based on the Latent Information Priors

Takayuki Koyama, Takeru Matsuda and Fumiyasu Komaki
1 FANUC Corporation, 3580 Furubaba Shibokusa Oshino-mura, Yamanashi 401-0597, Japan
2 Department of Mathematical Informatics, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
3 RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan
* Author to whom correspondence should be addressed.
Entropy 2017, 19(11), 618; https://doi.org/10.3390/e19110618
Submission received: 13 September 2017 / Revised: 12 November 2017 / Accepted: 13 November 2017 / Published: 16 November 2017
(This article belongs to the Special Issue Transfer Entropy II)

Abstract

We develop priors for Bayes estimation of quantum states that provide minimax state estimation. The relative entropy from the true density operator to a predictive density operator is adopted as a loss function. The proposed prior maximizes the conditional Holevo mutual information, and it is a quantum version of the latent information prior in classical statistics. For a one-qubit system, we provide a class of measurements that is optimal from the viewpoint of minimax state estimation.

1. Introduction

In quantum mechanics, the outcome of a measurement is subject to a probability distribution determined by the quantum state of the measured system and the measurement performed. The task of estimating the quantum state from measurement outcomes is called quantum state estimation, and it is a fundamental problem in quantum statistics [1,2,3]. Tanaka and Komaki [4] and Tanaka [5] discussed quantum estimation within the framework of statistical decision theory and showed that Bayesian methods provide better estimation than the maximum likelihood method. In Bayesian methods, we need to specify a prior distribution on the unknown parameters of the quantum states. However, the problem of prior selection has not been fully discussed for quantum estimation [6].
The quantum state estimation problem is related to the predictive density estimation problem in classical statistics [7]. This is the problem of predicting the distribution of an unobserved variable $y$ based on an observed variable $x$. Suppose $(x, y) \sim p(x, y \mid \theta)$, where $\theta$ denotes an unknown parameter. Based on the observed $x$, we predict the distribution $p(y \mid x, \theta)$ of $y$ using a predictive density $\hat{p}(y \mid x)$. The plug-in predictive density is defined as $\hat{p}_{\text{plug-in}}(y \mid x) = p(y \mid x, \hat{\theta}(x))$, where $\hat{\theta}(x)$ is some estimate of $\theta$ from $x$. The Bayesian predictive density with respect to a prior distribution $d\pi(\theta)$ is defined as
$$\hat{p}_{\pi}(y \mid x) = \int p(y \mid x, \theta) \, d\pi(\theta \mid x) = \frac{\int p(y \mid x, \theta)\, p(x \mid \theta) \, d\pi(\theta)}{\int p(x \mid \theta) \, d\pi(\theta)},$$
where $d\pi(\theta \mid x)$ is the posterior distribution. We compare predictive densities using the framework of statistical decision theory. Specifically, a loss function $L(q, p)$ is introduced that evaluates the difference between the true density $q$ and the predictive density $p$. Then, the risk function $R(\theta, \hat{p})$ is defined as the average loss when the true value of the parameter is $\theta$:
$$R(\theta, \hat{p}) = \int L\bigl(p(y \mid x, \theta), \hat{p}(y \mid x)\bigr)\, p(x \mid \theta) \, dx.$$
A predictive density $\hat{p}^{*}$ is called minimax if it minimizes the maximum risk among all predictive densities:
$$\max_{\theta} R(\theta, \hat{p}^{*}) = \min_{\hat{p}} \max_{\theta} R(\theta, \hat{p}).$$
We adopt the Kullback–Leibler divergence
$$L(q, p) = \int q(x) \log \frac{q(x)}{p(x)} \, dx$$
as a loss function, since it satisfies many desirable properties compared to other loss functions such as the Hellinger distance and the total variation distance [8]. Under this setting, Aitchison [9] proved
$$R(\pi, \hat{p}_{\pi}) = \min_{\hat{p}} R(\pi, \hat{p}),$$
where
$$R(\pi, \hat{p}) = \int R(\theta, \hat{p})\, \pi(\theta)\, d\theta$$
is called the Bayes risk. Namely, the Bayesian predictive density p ^ π ( y x ) minimizes the Bayes risk. We provide the proof of Equation (4) in the Appendix A. Therefore, it is sufficient to consider only Bayesian predictive densities from the viewpoint of Kullback–Leibler risk, and the selection of the prior π becomes important.
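As a concrete illustration of the Bayesian predictive density above, the following sketch evaluates it for a finite parameter set and a discrete prior. The joint distributions and prior weights are hypothetical toy values chosen only for illustration, and Python with numpy is assumed rather than anything used by the authors.

```python
import numpy as np

# Toy model: theta indexes three candidate joint distributions p(x, y | theta)
# over x in {0, 1} and y in {0, 1}; each 2x2 table sums to one.
p_xy_given_theta = np.array([
    [[0.40, 0.10], [0.10, 0.40]],   # theta = 0
    [[0.25, 0.25], [0.25, 0.25]],   # theta = 1
    [[0.10, 0.40], [0.40, 0.10]],   # theta = 2
])
prior = np.array([0.3, 0.4, 0.3])   # discrete prior pi(theta)

def bayes_predictive(x):
    """p_hat_pi(y | x) = sum_theta p(y | x, theta) * pi(theta | x)."""
    p_x_given_theta = p_xy_given_theta[:, x, :].sum(axis=1)            # p(x | theta)
    posterior = prior * p_x_given_theta
    posterior /= posterior.sum()                                        # pi(theta | x)
    p_y_given_x_theta = p_xy_given_theta[:, x, :] / p_x_given_theta[:, None]
    return posterior @ p_y_given_x_theta                                # mixture over theta

print(bayes_predictive(0))   # predictive density over y after observing x = 0
```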
For the predictive density estimation problem above, Komaki [10] developed a class of priors called latent information priors. The latent information prior $\pi_{\mathrm{LIP}}$ is defined as a prior that maximizes the conditional mutual information $I_{\theta, y \mid x}(\pi)$ between the parameter $\theta$ and the unobserved variable $y$ given the observed variable $x$. Namely,
$$I_{\theta, y \mid x}(\pi_{\mathrm{LIP}}) = \max_{\pi} I_{\theta, y \mid x}(\pi),$$
where
$$I_{\theta, y \mid x}(\pi) = \int \sum_{x, y} p(x, y \mid \theta) \log p(x, y \mid \theta) \, d\pi(\theta) - \sum_{x, y} p_{\pi}(x, y) \log p_{\pi}(x, y) - \int \sum_{x} p(x \mid \theta) \log p(x \mid \theta) \, d\pi(\theta) + \sum_{x} p_{\pi}(x) \log p_{\pi}(x)$$
is the conditional mutual information between $y$ and $\theta$ given $x$. Here,
$$p_{\pi}(x, y) = \int p(x, y \mid \theta) \, d\pi(\theta), \qquad p_{\pi}(x) = \int p(x \mid \theta) \, d\pi(\theta)$$
are marginal densities. The Bayesian predictive densities based on the latent information priors are minimax under the Kullback–Leibler risk:
$$\max_{\theta} R(\theta, \hat{p}_{\pi_{\mathrm{LIP}}}) = \min_{\hat{p}} \max_{\theta} R(\theta, \hat{p}).$$
The latent information prior is a generalization of the reference prior [11], which is a prior maximizing the unconditional mutual information $I_{\theta, y}(\pi)$ between $\theta$ and $y$.
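To make the definition above concrete, the next sketch evaluates the conditional mutual information between θ and y given x for the same kind of finite toy model; maximizing this quantity over priors (for instance by a grid search over the probability simplex) would approximate the latent information prior. The array shapes follow the sketch above and are assumptions, not values from the paper.

```python
import numpy as np

def conditional_mutual_information(prior, p_xy_given_theta):
    """I_{theta, y | x}(pi) for a finite model; array indices are [theta, x, y]."""
    eps = 1e-300                                            # so 0 * log(0) evaluates to 0
    p_x_given_theta = p_xy_given_theta.sum(axis=2)          # p(x | theta)
    p_xy = np.einsum('t,txy->xy', prior, p_xy_given_theta)  # marginal p_pi(x, y)
    p_x = p_xy.sum(axis=1)                                   # marginal p_pi(x)
    term1 = np.sum(prior[:, None, None] * p_xy_given_theta * np.log(p_xy_given_theta + eps))
    term2 = np.sum(p_xy * np.log(p_xy + eps))
    term3 = np.sum(prior[:, None] * p_x_given_theta * np.log(p_x_given_theta + eps))
    term4 = np.sum(p_x * np.log(p_x + eps))
    return term1 - term2 - term3 + term4

# Example (reusing the toy arrays defined above):
# conditional_mutual_information(np.array([0.3, 0.4, 0.3]), p_xy_given_theta)
```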
Now, we consider the problem of estimating the quantum state of a system Y based on the outcome of a measurement on a system X. Suppose that the quantum state of the composite system (X, Y) is $\sigma_{\theta}^{XY}$, where $\theta$ denotes an unknown parameter. We perform a measurement on the system X and obtain the outcome $x$. Based on the measurement outcome $x$, we estimate the state of the system Y by a predictive density operator $\rho(x)$. Similarly to the Bayesian predictive density (1), the Bayesian predictive density operator with respect to the prior $d\pi(\theta)$ is defined as
$$\sigma_{\pi}^{Y}(x) = \int \sigma_{\theta, x}^{Y} \, d\pi(\theta \mid x) = \frac{\int \sigma_{\theta, x}^{Y}\, p(x \mid \theta) \, d\pi(\theta)}{\int p(x \mid \theta) \, d\pi(\theta)},$$
where $d\pi(\theta \mid x)$ is the posterior distribution. As in the predictive density estimation problem discussed above, we compare predictive density operators using the framework of statistical decision theory. There are several possibilities for the loss function $L(\sigma, \rho)$ in quantum estimation, such as the fidelity and the trace norm [12]. In this paper, we adopt the quantum relative entropy
$$L(\sigma, \rho) = \mathrm{Tr}\, \sigma (\log \sigma - \log \rho)$$
as a loss function, since it is a quantum analogue of the Kullback–Leibler divergence (3). Note that the fidelity and the trace norm correspond to the Hellinger distance and the total variation distance in classical statistics, respectively. Under this setting, Tanaka and Komaki [4] proved that the Bayesian predictive density operators minimize the Bayes risk:
$$\int R(\theta, \sigma_{\pi}^{Y}) \, d\pi(\theta) = \min_{\rho} \int R(\theta, \rho) \, d\pi(\theta).$$
This is a quantum version of Equation (4).
By the result of Tanaka and Komaki [4], the selection of the prior is important also in quantum estimation. However, this problem has not been fully discussed [6]. In this paper, we provide a quantum version of the latent information priors and prove that they provide minimax predictive density operators. Whereas the latent information prior in the classical case maximizes the conditional Shannon mutual information, the proposed prior maximizes the conditional Holevo mutual information. The Holevo mutual information, which is a quantum version of the Shannon mutual information, is a fundamental quantity in classical-quantum communication [13]. Our result shows that the conditional Holevo mutual information also has a natural meaning in terms of quantum estimation.
Unlike in classical statistics, the measurement is not unique in quantum statistics. Therefore, the selection of the measurement also becomes important. From the viewpoint of minimax state estimation, measurements that minimize the minimax risk are considered optimal. We provide a class of optimal measurements for the one-qubit system. This class includes the symmetric informationally complete measurement [14,15]. These measurements and latent information priors provide robust quantum estimation.

2. Preliminaries

2.1. Quantum States and Measurements

We briefly summarize notation for quantum states and measurements. Let H be a separable Hilbert space of a quantum system. A Hermitian operator ρ on H is called a density operator if it satisfies
$$\mathrm{Tr}\, \rho = 1, \qquad \rho \geq 0.$$
The state of a quantum system is described by a density operator. We denote the set of all density operators on H as S ( H ) .
Denote the set of all linear operators on a Hilbert space H by $L(H)$ and the set of all positive linear operators by $L_{+}(H) \subset L(H)$. Let Ω be a measurable space of all possible outcomes of a measurement and $\mathcal{B}(\Omega)$ be a σ-algebra on Ω. A map $E : \mathcal{B}(\Omega) \to L_{+}(H)$ is called a positive operator-valued measure (POVM) if it satisfies $E(\emptyset) = O$, $E(\Omega) = I$, and $E(\cup_{i} B_{i}) = \sum_{i} E(B_{i})$ whenever $B_{i} \cap B_{j} = \emptyset$ $(i \neq j)$, $B_{i} \in \mathcal{B}(\Omega)$. Any quantum measurement is represented by a POVM on Ω. In this paper, we mainly assume that Ω is finite. In that case, we denote $\Omega = \mathcal{X} = \{1, \ldots, N\}$ and any POVM is represented by a set of positive Hermitian operators $E = \{E_{x} \mid x \in \mathcal{X}\}$ such that $\sum_{x \in \mathcal{X}} E_{x} = I$.
The outcome of a measurement E on a quantum system in the state $\rho \in \mathcal{S}(H)$ is distributed according to the probability measure
$$\Pr(B) = \mathrm{Tr}\, E(B) \rho, \qquad B \in \mathcal{B}(\Omega).$$
Let X, Y be quantum systems with Hilbert spaces $H^{X}$ and $H^{Y}$. The Hilbert space of the composite system (X, Y) is given by the tensor product $H^{X} \otimes H^{Y}$. Suppose the state of this composite system is $\sigma^{XY}$. Then, the states of the two subsystems are obtained by the partial trace:
$$\sigma^{X} = \mathrm{Tr}_{Y}\, \sigma^{XY}, \qquad \sigma^{Y} = \mathrm{Tr}_{X}\, \sigma^{XY}.$$
If a measurement $E = \{E_{x} \mid x \in \mathcal{X}\}$ is performed on the system X and the measurement outcome is x, then the state of the system Y becomes
$$\sigma_{x}^{Y} = \frac{1}{p_{x}} \mathrm{Tr}_{X} \bigl[ (E_{x} \otimes I^{Y})\, \sigma^{XY} \bigr],$$
where the normalization constant
$$p_{x} = \mathrm{Tr}\bigl[ (E_{x} \otimes I^{Y})\, \sigma^{XY} \bigr]$$
is the probability of the outcome x. Here, $I^{Y}$ is the identity operator on the space $H^{Y}$. We call the operator $\sigma_{x}^{Y}$ the conditional density operator.
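As a small numeric illustration of this update, the following sketch computes the conditional density operator and the outcome probability for a two-qubit composite state; the maximally entangled state and the projective POVM element are illustrative choices, not taken from the paper.

```python
import numpy as np

def conditional_state(sigma_XY, E_x, dX, dY):
    """sigma_x^Y = Tr_X[(E_x (x) I_Y) sigma^XY] / p_x, with p_x the outcome probability."""
    op = np.kron(E_x, np.eye(dY)) @ sigma_XY
    op = op.reshape(dX, dY, dX, dY)
    S = np.trace(op, axis1=0, axis2=2)      # partial trace over the X system
    p_x = np.trace(S).real                  # probability of the outcome x
    return S / p_x, p_x

# Example: maximally entangled two-qubit state, measuring X with the projector |0><0|
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)
sigma_XY = np.outer(psi, psi.conj())
E0 = np.diag([1.0, 0.0])
sigma_Y, p0 = conditional_state(sigma_XY, E0, 2, 2)
print(p0)        # 0.5
print(sigma_Y)   # the state of Y collapses to |0><0|
```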

2.2. Quantum State Estimation

We formulate the quantum state estimation problem using the framework of statistical decision theory. Let X and Y be quantum systems with finite-dimensional Hilbert spaces $H^{X}$ and $H^{Y}$, where $\dim H^{X} = d_{X}$ and $\dim H^{Y} = d_{Y}$.
Suppose that the state of the composite system (X, Y) is $\sigma_{\theta}^{XY}$, where $\theta \in \Theta$ denotes an unknown parameter. We perform a measurement $E = \{E_{x} \mid x \in \mathcal{X}\}$ on X, observe the outcome $x \in \mathcal{X}$, and estimate the conditional density operator $\sigma_{\theta, x}^{Y}$ of Y by a predictive density operator $\rho(x)$. As discussed in the introduction (1) and (7), the Bayesian predictive density operator based on a prior $\pi(\theta)$ is defined by
$$\sigma_{\pi}^{Y}(x) = \int \sigma_{\theta, x}^{Y} \, d\pi(\theta \mid x) = \frac{\int \sigma_{\theta, x}^{Y}\, p(x \mid \theta) \, d\pi(\theta)}{\int p(x \mid \theta) \, d\pi(\theta)},$$
where $d\pi(\theta \mid x)$ is the posterior distribution.
To evaluate predictive density operators, we introduce a loss function L ( σ , ρ ) that evaluates the difference between the true conditional density operator σ and the predictive density operator ρ . In this paper, we adopt the quantum relative entropy (8) since it is a quantum analogue of the Kullback–Leibler divergence (3). Then, the risk function R ( θ , ρ ) of a predictive density operator ρ is defined by
$$R(\theta, \rho) = \sum_{x \in \mathcal{X}} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x}^{Y} \bigl( \log \sigma_{\theta, x}^{Y} - \log \rho(x) \bigr),$$
where
$$p(x \mid \theta) = \mathrm{Tr}\bigl[ (E_{x} \otimes I^{Y})\, \sigma_{\theta}^{XY} \bigr] = \mathrm{Tr}\, E_{x} \sigma_{\theta}^{X}$$
is the probability of the outcome x. Similarly to the classical case (2), a predictive density operator $\rho^{*}$ is called minimax if it minimizes the maximum risk among all predictive density operators [16,17]:
$$\max_{\theta} R(\theta, \rho^{*}) = \min_{\rho} \max_{\theta} R(\theta, \rho).$$
Tanaka and Komaki [4] showed
$$R(\pi, \sigma_{\pi}^{Y}) = \min_{\rho} R(\pi, \rho),$$
where
$$R(\pi, \rho) = \int R(\theta, \rho) \, d\pi(\theta)$$
is called the Bayes risk. Namely, the Bayesian predictive density operator minimizes the Bayes risk. This result is a quantum version of Equation (4). Although Tanaka and Komaki [4] considered separable models ($\sigma_{\theta}^{XY} = \sigma_{\theta}^{X} \otimes \sigma_{\theta}^{Y}$), the relation (9) also holds for non-separable models, as shown in Appendix A. Therefore, it is sufficient to consider only Bayesian predictive density operators, and the problem of prior selection becomes crucial.
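For a finite prior, the Bayesian predictive density operator is simply a posterior-weighted mixture of the conditional density operators. The following sketch shows this computation; the two-point prior, the outcome probabilities, and the conditional states are hypothetical toy inputs, not taken from the paper.

```python
import numpy as np

def bayes_predictive_operator(x, prior, p_x_given_theta, sigma_theta_x):
    """sigma_pi^Y(x) = sum_theta sigma_{theta,x}^Y p(x|theta) pi(theta) / p_pi(x)."""
    w = prior * p_x_given_theta[:, x]              # unnormalized posterior weights
    w = w / w.sum()
    return np.einsum('t,tij->ij', w, sigma_theta_x[:, x])

# Toy inputs: two parameter values, two outcomes, qubit states for Y
sigma = np.zeros((2, 2, 2, 2), dtype=complex)      # indices [theta, x, row, column]
sigma[0] = np.array([[[1, 0], [0, 0]], [[0.5, 0.5], [0.5, 0.5]]])
sigma[1] = np.array([[[0, 0], [0, 1]], [[0.5, -0.5], [-0.5, 0.5]]])
p = np.array([[0.7, 0.3], [0.4, 0.6]])             # p(x | theta)
print(bayes_predictive_operator(0, np.array([0.5, 0.5]), p, sigma))
```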

2.3. Notations

For a quantum state family $\{\sigma_{\theta}^{XY} \mid \theta \in \Theta\}$, we define another quantum state family
$$\mathcal{M} = \Bigl\{ \bigoplus_{x} p(x \mid \theta)\, \sigma_{\theta, x}^{Y} \;\Big|\; \theta \in \Theta \Bigr\},$$
where
$$\bigoplus_{x} p(x \mid \theta)\, \sigma_{\theta, x}^{Y} = \begin{pmatrix} p(1 \mid \theta)\, \sigma_{\theta, 1}^{Y} & O & \cdots & O \\ O & p(2 \mid \theta)\, \sigma_{\theta, 2}^{Y} & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & p(N \mid \theta)\, \sigma_{\theta, N}^{Y} \end{pmatrix}$$
is a density operator on $\mathbb{C}^{N} \otimes H^{Y}$. Since $\dim \mathbb{C}^{N} \otimes H^{Y} = N d_{Y}$, the state family $\mathcal{M}$ can be regarded as a subset of the Euclidean space $\mathbb{R}^{N^{2} d_{Y}^{2} - 1}$. By identifying Θ with $\mathcal{M}$, the parameter space Θ is endowed with the induced topology as a subset of $\mathbb{R}^{N^{2} d_{Y}^{2} - 1}$.
Any measurement on the system X is represented by a projective measurement $\{e_{xx} = |x\rangle\langle x| \mid x = 1, \ldots, N\}$, where $\{|1\rangle, \ldots, |N\rangle\}$ is an orthonormal basis of $\mathbb{C}^{N}$. For every $x \in \mathcal{X}$, we define $S_{\theta}(x) \in L_{+}(H^{Y})$ as
$$S_{\theta}(x) := \mathrm{Tr}_{\mathbb{C}^{N}} \Bigl[ (e_{xx} \otimes I^{Y}) \Bigl( \bigoplus_{x'} p(x' \mid \theta)\, \sigma_{\theta, x'}^{Y} \Bigr) \Bigr] = p(x \mid \theta)\, \sigma_{\theta, x},$$
which is the unnormalized state of Y conditional on the measurement outcome x. We also define
$$S_{\pi}(x) = \int S_{\theta}(x) \, d\pi(\theta), \qquad p_{\pi}(x) = \mathrm{Tr}\, S_{\pi}(x), \qquad \sigma_{\pi}(x) = \frac{S_{\pi}(x)}{p_{\pi}(x)}.$$

3. Minimax Estimation of Quantum States

In this section, we develop the latent information prior for quantum state estimation and show that this prior provides a minimax predictive density operator.
In the following, we assume the following conditions:
  • Θ is compact.
  • For every $x \in \mathcal{X}$, $E_{x} \neq O$.
  • For every $x \in \mathcal{X}$, there exists $\theta \in \Theta$ such that $p(x \mid \theta) = \mathrm{Tr}\, E_{x} \sigma_{\theta}^{X} > 0$.
The third assumption can be satisfied by adopting a sufficiently small Hilbert space. Namely, if there exists $x \in \mathcal{X}$ such that $p(x \mid \theta) = \mathrm{Tr}\, E_{x} \sigma_{\theta}^{X} = 0$ for every $\theta \in \Theta$, then we redefine the state space H as the orthogonal complement of $\mathrm{Ker}\, E_{x}$.
Let P be the set of all probability measures on Θ endowed with the weak convergence topology and the corresponding Borel algebra. By the Prohorov theorem [18] and the first assumption, P is compact.
When x is fixed, the function $\theta \in \Theta \mapsto S_{\theta}(x)$ is bounded and continuous. Thus, for every fixed $x \in \mathcal{X}$, the function
$$\pi \in \mathcal{P} \mapsto S_{\pi}(x) = \int S_{\theta}(x) \, d\pi(\theta)$$
is continuous because $\mathcal{P}$ is endowed with the weak convergence topology and $\dim H^{Y} < \infty$. Let $\{\lambda_{x,i}\}_{i}$ and $\{|\phi_{x,i}\rangle\}_{i}$ be the eigenvalues and the normalized eigenvectors of the predictive density operator ρ(x). For every predictive density operator ρ, consider the function from $\mathcal{P}$ to $[0, \infty]$ defined by
$$D_{\rho}(\pi) = \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \bigl( \log S_{\pi}(x) - \log (p_{\pi}(x) \rho(x)) \bigr) = \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \bigl( \log S_{\pi}(x) - (\log p_{\pi}(x)) I - \log \rho(x) \bigr) = \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \log S_{\pi}(x) - \sum_{x} p_{\pi}(x) \log p_{\pi}(x) + \sum_{x} \sum_{i:\, \lambda_{x,i} \neq 0} p_{\pi}(x) \langle \phi_{x,i} | \sigma_{\pi}(x) | \phi_{x,i} \rangle (-\log \lambda_{x,i}) + \sum_{x} \sum_{i:\, \lambda_{x,i} = 0} p_{\pi}(x) \langle \phi_{x,i} | \sigma_{\pi}(x) | \phi_{x,i} \rangle (-\log \lambda_{x,i}).$$
The last term in (10) is lower semicontinuous under the definition 0 log 0 = 0 [10], since each summand takes either zero or infinity, and so the set of $\pi \in \mathcal{P}$ on which this term vanishes is closed. In addition, the other terms in (10) are continuous since the von Neumann entropy is continuous [12]. Therefore, the function $D_{\rho}(\pi)$ in (10) is lower semicontinuous.
Now, we prove that the class of predictive density operators that are limits of Bayesian predictive density operators is an essentially complete class. We prepare three lemmas. Lemma 1 is useful for differentiation of quantum relative entropy (see Hiai and Petz [19]). Lemmas 2 and 3 are from Komaki [10].
Lemma 1.
Let A, B be n-dimensional self-adjoint matrices and let $t_{0}$ be a real number. Assume that $f : (\alpha, \beta) \to \mathbb{R}$ is a continuously differentiable function defined on an interval, and assume that the eigenvalues of $A + tB$ are in $(\alpha, \beta)$ if t is sufficiently close to $t_{0}$. Then,
$$\frac{d}{dt} \mathrm{Tr}\, f(A + tB) \Big|_{t = t_{0}} = \mathrm{Tr}\bigl( B f'(A + t_{0} B) \bigr).$$
Lemma 2
([10]). Let μ be a probability measure on Θ. Then,
$$\mathcal{P}_{\epsilon\mu} = \{ \epsilon\mu + (1 - \epsilon)\pi \mid \pi \in \mathcal{P} \}$$
is a closed subset of $\mathcal{P}$ for $0 \leq \epsilon \leq 1$.
Lemma 3
([10]). Let $f : \mathcal{P} \to [0, \infty]$ be continuous, and let μ be a probability measure on Θ such that $p_{\mu}(x) := \int p(x \mid \theta) \, d\mu(\theta) > 0$ for every $x \in \mathcal{X}$. Then, there is a probability measure $\pi_{n}$ in
$$\mathcal{P}_{\mu/n} = \Bigl\{ \tfrac{1}{n}\mu + \bigl(1 - \tfrac{1}{n}\bigr)\pi \;\Big|\; \pi \in \mathcal{P} \Bigr\}$$
for every n, such that $f(\pi_{n}) = \inf_{\pi \in \mathcal{P}_{\mu/n}} f(\pi)$. Furthermore, there exists a convergent subsequence $\{\pi_{m}\}_{m=1}^{\infty}$ of $\{\pi_{n}\}_{n=1}^{\infty}$ and the equality $f(\pi_{\infty}) = \inf_{\pi \in \mathcal{P}} f(\pi)$ holds, where $\pi_{\infty} = \lim_{m \to \infty} \pi_{m}$.
By using these results, we obtain the following theorem, which is a quantum version of Theorem 1 of Komaki [10].
Theorem 1.
(1) 
Let ρ(x) be a predictive density operator. If there exists a prior $\hat{\pi}_{\rho} \in \mathcal{P}$ such that $D_{\rho}(\hat{\pi}_{\rho}) = \inf_{\pi \in \mathcal{P}} D_{\rho}(\pi)$ and $p_{\hat{\pi}_{\rho}}(x) > 0$ for every $x \in \mathcal{X}$, then $R(\theta, \sigma_{\hat{\pi}_{\rho}}(x)) \leq R(\theta, \rho(x))$ for every $\theta \in \Theta$.
(2) 
For every predictive density operator ρ, there exists a convergent prior sequence $\{\pi_{n}^{\rho}\}_{n=1}^{\infty}$ such that $D_{\rho}(\lim_{n} \pi_{n}^{\rho}) = \inf_{\pi \in \mathcal{P}} D_{\rho}(\pi)$, $\lim_{n} \sigma_{\pi_{n}^{\rho}}(x)$ exists, and $R(\theta, \lim_{n} \sigma_{\pi_{n}^{\rho}}(x)) \leq R(\theta, \rho)$ for every $\theta \in \Theta$.
Next, we develop priors that provide minimax predictive density operators. Let x be the random variable representing the outcome of the measurement, i.e., $x \sim p(\cdot \mid \theta)$. Then, as a quantum analogue of the conditional mutual information (5), we define the conditional Holevo mutual information [13] between the quantum state $\sigma_{x}^{Y}$ of Y and the parameter θ given the measurement outcome x as
$$I_{\theta, \sigma \mid x}(\pi) = \int \sum_{x} \mathrm{Tr}\, S_{\theta}(x) \log S_{\theta}(x) \, d\pi(\theta) - \sum_{x} \mathrm{Tr}\, S_{\pi}(x) \log S_{\pi}(x) - \int \sum_{x} p(x \mid \theta) \log p(x \mid \theta) \, d\pi(\theta) + \sum_{x} p_{\pi}(x) \log p_{\pi}(x) = \int \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x} \bigl( \log \sigma_{\theta, x} - \log \sigma_{\pi}(x) \bigr) d\pi(\theta),$$
which is a function of $\pi \in \mathcal{P}$. Here, we used
$$\sum_{x} \mathrm{Tr}\, S_{\theta}(x) \log S_{\theta}(x) = \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x} \bigl( (\log p(x \mid \theta)) I + \log \sigma_{\theta, x} \bigr) = \sum_{x} p(x \mid \theta) \log p(x \mid \theta) + \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x} \log \sigma_{\theta, x}$$
and
$$\sum_{x} \mathrm{Tr}\, S_{\pi}(x) \log S_{\pi}(x) = \sum_{x} p_{\pi}(x)\, \mathrm{Tr}\, \sigma_{\pi}(x) \bigl( (\log p_{\pi}(x)) I + \log \sigma_{\pi}(x) \bigr) = \sum_{x} p_{\pi}(x) \log p_{\pi}(x) + \sum_{x} p_{\pi}(x)\, \mathrm{Tr}\, \sigma_{\pi}(x) \log \sigma_{\pi}(x).$$
The conditional Holevo mutual information provides an upper bound on the conditional mutual information as follows.
Proposition 1.
Let $\sigma_{\theta}^{XY}$ be the state of the composite system (X, Y). Suppose that a measurement is performed on X with measurement outcome x and then another measurement is performed on Y with measurement outcome y. Then,
$$I_{\theta, \sigma \mid x}(\pi) \geq I_{\theta, y \mid x}(\pi).$$
Proof. 
Since any measurement is a trace-preserving completely positive map, inequality (12) follows from the monotonicity of the quantum relative entropy [13]. ☐
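Numerically, the conditional Holevo mutual information can be evaluated as the Bayes-averaged quantum relative entropy appearing in the last expression of its definition. A minimal sketch for a finite prior follows; it assumes arrays shaped as in the earlier toy examples, and the small regularization added before taking matrix logarithms hides the genuinely infinite contributions that arise when supports do not match.

```python
import numpy as np
from scipy.linalg import logm

def qre(sigma, rho, eps=1e-12):
    """Quantum relative entropy Tr sigma (log sigma - log rho), with regularization."""
    d = len(sigma)
    return np.trace(sigma @ (logm(sigma + eps * np.eye(d)) - logm(rho + eps * np.eye(d)))).real

def holevo_conditional_mi(prior, p_x_given_theta, sigma_theta_x):
    """I_{theta, sigma | x}(pi) = sum_x E_pi[ p(x|theta) D(sigma_{theta,x} || sigma_pi(x)) ]."""
    total = 0.0
    for x in range(p_x_given_theta.shape[1]):
        w = prior * p_x_given_theta[:, x]
        p_pi_x = w.sum()
        if p_pi_x <= 0:
            continue
        sigma_pi_x = np.einsum('t,tij->ij', w / p_pi_x, sigma_theta_x[:, x])
        for t, pi_t in enumerate(prior):
            total += pi_t * p_x_given_theta[t, x] * qre(sigma_theta_x[t, x], sigma_pi_x)
    return total
```

A latent information prior could then be approximated by maximizing this function over a discretized family of priors.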
Analogously to the latent information priors [10] in classical statistics, we define latent information priors as priors that maximize the conditional Holevo mutual information. It is expected that the Bayesian predictive density operator $\sigma_{\hat{\pi}}(x)$ based on a latent information prior is a minimax predictive density operator. This is confirmed by the following theorem, which is a quantum version of Theorem 2 of Komaki [10].
Theorem 2.
(1) 
Let $\hat{\pi} \in \mathcal{P}$ be a prior maximizing $I_{\theta, \sigma \mid x}(\pi)$. If $p_{\hat{\pi}}(x) > 0$ for all $x \in \mathcal{X}$, then $\sigma_{\hat{\pi}}(x)$ is a minimax predictive density operator.
(2) 
There exists a convergent prior sequence $\{\pi_{n}\}_{n=1}^{\infty}$ such that $\lim_{n} \sigma_{\pi_{n}}(x)$ is a minimax predictive density operator and the equality $I_{\theta, \sigma \mid x}(\pi_{\infty}) = \sup_{\pi \in \mathcal{P}} I_{\theta, \sigma \mid x}(\pi)$ holds, where $\pi_{\infty} = \lim_{n} \pi_{n}$.
The proofs of Theorems 1 and 2 are deferred to Appendix A.
We note that the minimax risk $\inf_{\rho} \sup_{\theta} R_{E}(\theta, \rho)$ depends on the measurement E on X. Therefore, the measurement E with the minimum minimax risk is desirable from the viewpoint of minimaxity. We define a POVM $E^{*}$ to be a minimax POVM if it satisfies
$$\inf_{\rho} \sup_{\theta} R_{E^{*}}(\theta, \rho) = \inf_{E} \inf_{\rho} \sup_{\theta} R_{E}(\theta, \rho).$$
In the next section, we provide a class of minimax POVMs for the one-qubit system.

4. One Qubit System

In this section, we consider the one-qubit system and derive a class of minimax POVMs satisfying (13).
A qubit is a quantum system with a two-dimensional Hilbert space. It is the fundamental system in quantum information theory. A general state of the one-qubit system is described by a density matrix
$$\sigma_{\theta} = \frac{1}{2} \begin{pmatrix} 1 + \theta_{z} & \theta_{x} - i\theta_{y} \\ \theta_{x} + i\theta_{y} & 1 - \theta_{z} \end{pmatrix},$$
where $\theta = (\theta_{x}, \theta_{y}, \theta_{z}) \in \Theta = \{(\theta_{x}, \theta_{y}, \theta_{z}) \in \mathbb{R}^{3} \mid \|\theta\|^{2} \leq 1\}$. The subset $\{(\theta_{x}, \theta_{y}, \theta_{z}) \in \mathbb{R}^{3} \mid \|\theta\|^{2} = 1\}$ of parameters corresponding to pure states is called the Bloch sphere.
Let $\sigma_{\theta}^{XY} = \sigma_{\theta} \otimes \sigma_{\theta}$ be a separable state. We consider the estimation of $\sigma_{\theta}^{Y} = \sigma_{\theta}$ from the outcome of a measurement on $\sigma_{\theta}^{X} = \sigma_{\theta}$. Here, we assume that the state $\sigma_{\theta}^{XY}$ is separable, since the state of Y changes according to the outcome of the measurement on X, and so the estimation problem is not well defined if the state $\sigma_{\theta}^{XY}$ is not separable.
Let $\Omega := \{(x, y, z) \in \mathbb{R}^{3} \mid x^{2} + y^{2} + z^{2} = 1\}$ and let $\mathcal{B} = \mathcal{B}(\Omega)$ be its Borel sets. From Haapasalo et al. [20], it is sufficient to consider POVMs on Ω. For every probability measure μ on $(\Omega, \mathcal{B})$ that satisfies
$$\int_{\Omega} x \, d\mu(\omega) = \int_{\Omega} y \, d\mu(\omega) = \int_{\Omega} z \, d\mu(\omega) = 0,$$
we define a POVM $E : \mathcal{B} \to L_{+}$ by
$$E(B) = \int_{B} \begin{pmatrix} 1 + z & x - iy \\ x + iy & 1 - z \end{pmatrix} d\mu(\omega).$$
In the following, we identify E with μ .
Let $\mathcal{E}_{\mathrm{1qubit}}$ be the class of POVMs on Ω represented by measures μ that satisfy the conditions
$$E_{\mu}[x] = E_{\mu}[y] = E_{\mu}[z] = 0, \qquad E_{\mu}[xy] = E_{\mu}[yz] = E_{\mu}[zx] = 0, \qquad E_{\mu}[x^{2}] = E_{\mu}[y^{2}] = E_{\mu}[z^{2}] = \frac{1}{3},$$
where $E_{\mu}$ denotes the expectation with respect to the measure μ. We provide two examples of POVMs in $\mathcal{E}_{\mathrm{1qubit}}$.
Proposition 2.
The POVM corresponding to
$$\mu(d\omega) = \frac{1}{4\pi} \, d\omega,$$
where $d\omega$ is the surface element on Ω, is in $\mathcal{E}_{\mathrm{1qubit}}$.
Proof. 
From the symmetry of μ , E μ [ x ] = E μ [ y ] = E μ [ z ] = E μ [ x y ] = E μ [ y z ] = E μ [ z x ] = 0 . Moreover, from E μ [ 1 ] = E μ [ x 2 + y 2 + z 2 ] = 1 and the symmetry of μ , E μ [ x 2 ] = E μ [ y 2 ] = E μ [ z 2 ] = 1 / 3 . ☐
Proposition 3.
Suppose that $\omega_{i} \in \Omega$ $(i = 1, 2, 3, 4)$ satisfy $\|\omega_{i}\|^{2} = 1$ and $\omega_{i} \cdot \omega_{j} = -1/3$ $(i \neq j)$. Let μ be the four-point discrete measure on Ω defined by
$$\mu(\{\omega_{1}\}) = \mu(\{\omega_{2}\}) = \mu(\{\omega_{3}\}) = \mu(\{\omega_{4}\}) = \frac{1}{4}.$$
Then, the POVM corresponding to μ belongs to $\mathcal{E}_{\mathrm{1qubit}}$.
Proof. 
Let $P = (\omega_{1}, \omega_{2}, \omega_{3}, \omega_{4}) \in \mathbb{R}^{3 \times 4}$ and $\mathbf{1} = (1, 1, 1, 1)^{\top}$. From the assumption on $\omega_{i}$ $(i = 1, 2, 3, 4)$,
$$P^{\top} P = \frac{4}{3} I_{4} - \frac{1}{3} J_{4},$$
where $I_{4} \in \mathbb{R}^{4 \times 4}$ is the identity matrix and $J_{4} = \mathbf{1}\mathbf{1}^{\top} \in \mathbb{R}^{4 \times 4}$ is the matrix whose elements are all one. From (16), we have $\mathbf{1}^{\top} P^{\top} P \mathbf{1} = \|P\mathbf{1}\|^{2} = 0$. Therefore, $P\mathbf{1} = 0$, and it implies $E_{\mu}[x] = E_{\mu}[y] = E_{\mu}[z] = 0$.
In addition, from (16),
$$P^{\top} P P^{\top} P = \Bigl( \tfrac{4}{3} I_{4} - \tfrac{1}{3} J_{4} \Bigr) \Bigl( \tfrac{4}{3} I_{4} - \tfrac{1}{3} J_{4} \Bigr) = \tfrac{4}{3} \Bigl( \tfrac{4}{3} I_{4} - \tfrac{1}{3} J_{4} \Bigr) = \tfrac{4}{3} P^{\top} P.$$
Therefore, $P^{\top} \bigl( P P^{\top} - \tfrac{4}{3} I_{3} \bigr) P = 0$. Since $\mathrm{rank}\, P = 3$, it implies $P P^{\top} = \tfrac{4}{3} I_{3}$. Then, $E_{\mu}[xy] = E_{\mu}[yz] = E_{\mu}[zx] = 0$ and $E_{\mu}[x^{2}] = E_{\mu}[y^{2}] = E_{\mu}[z^{2}] = 1/3$. ☐
We note that the POVM (15) is a special case of the SIC-POVM (symmetric, informationally complete, positive operator valued measure) [14,15].
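As a quick numeric sanity check of Proposition 3, the sketch below builds four tetrahedral Bloch vectors with pairwise inner products −1/3 (one standard choice of coordinates, used here only for illustration), verifies the moment conditions defining the class of measurements above, and confirms that the corresponding POVM elements sum to the identity.

```python
import numpy as np

# Four unit vectors with pairwise inner products -1/3 (a regular tetrahedron)
w = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

print(w @ w.T)                                        # 1 on the diagonal, -1/3 off it
print(w.mean(axis=0))                                 # first moments: all zero
print((w[:, :, None] * w[:, None, :]).mean(axis=0))   # second moments: identity / 3

# POVM elements (1/4)(I + w_i . sigma_vec) corresponding to the four-point measure
paulis = np.array([[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]])
E = np.array([(np.eye(2) + np.einsum('k,kij->ij', wi, paulis)) / 4 for wi in w])
print(E.sum(axis=0).round(10))                        # identity matrix
```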
Let $\mathcal{P}_{\mathrm{1qubit}}$ be the class of priors on Θ that satisfy the conditions
$$E_{\pi}[\theta_{x}] = E_{\pi}[\theta_{y}] = E_{\pi}[\theta_{z}] = 0, \qquad E_{\pi}[\theta_{x}\theta_{y}] = E_{\pi}[\theta_{y}\theta_{z}] = E_{\pi}[\theta_{z}\theta_{x}] = 0, \qquad E_{\pi}[\theta_{x}^{2}] = E_{\pi}[\theta_{y}^{2}] = E_{\pi}[\theta_{z}^{2}] = \frac{1}{3},$$
where E π is the expectation with respect to a prior π .
Proposition 4.
The uniform prior
$$\pi(d\theta) = \frac{1}{4\pi} \, d\theta,$$
where $d\theta$ is the surface element on the Bloch sphere, belongs to $\mathcal{P}_{\mathrm{1qubit}}$.
Proof. 
Same as Proposition 2. ☐
Proposition 5.
Suppose that $\theta_{i} \in \Theta$ $(i = 1, 2, 3, 4)$ satisfy $\|\theta_{i}\|^{2} = 1$ and $\theta_{i} \cdot \theta_{j} = -1/3$ $(i \neq j)$. Then, the four-point discrete prior
$$\pi(\{\theta_{1}\}) = \pi(\{\theta_{2}\}) = \pi(\{\theta_{3}\}) = \pi(\{\theta_{4}\}) = \frac{1}{4}$$
belongs to $\mathcal{P}_{\mathrm{1qubit}}$.
Proof. 
Same as Proposition 3. ☐
We obtain the following result.
Lemma 4.
Suppose $\pi \in \mathcal{P}_{\mathrm{1qubit}}$. Then, for a general measurement E, the risk function of the Bayesian predictive density operator $\sigma_{\pi}$ is
$$R_{E}(\theta, \sigma_{\pi}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{2}\left( \theta_{x}^{2} E_{\mu}[x^{2}] + \theta_{y}^{2} E_{\mu}[y^{2}] + \theta_{z}^{2} E_{\mu}[z^{2}] + 2\theta_{x}\theta_{y} E_{\mu}[xy] + 2\theta_{y}\theta_{z} E_{\mu}[yz] + 2\theta_{z}\theta_{x} E_{\mu}[zx] \right).$$
Proof. 
The distribution of the measurement outcome $\omega = (x, y, z)$ is
$$p(B \mid \theta) = \mathrm{Tr}\, \sigma_{\theta} E(B) = \int_{B} (1 + x\theta_{x} + y\theta_{y} + z\theta_{z}) \, d\mu(\omega).$$
Then, since $\pi \in \mathcal{P}_{\mathrm{1qubit}}$, the marginal distribution of the measurement outcome is
$$p(B) = \int_{\Theta} p(B \mid \theta) \, d\pi(\theta) = \int_{\Theta} \int_{B} (1 + x\theta_{x} + y\theta_{y} + z\theta_{z}) \, d\mu(\omega) \, d\pi(\theta) = \mu(B).$$
Therefore, the posterior distribution of θ is
$$d\pi(\theta \mid \omega) = (1 + x\theta_{x} + y\theta_{y} + z\theta_{z}) \, d\pi(\theta).$$
The posterior means of $\theta_{x}$, $\theta_{y}$, and $\theta_{z}$ are x/3, y/3, and z/3, respectively.
Thus, the Bayesian predictive density operator based on the prior π is
$$\sigma_{\pi}(\omega) = \int \sigma_{\theta} \, d\pi(\theta \mid \omega) = \frac{1}{2} \begin{pmatrix} 1 + z/3 & (x - iy)/3 \\ (x + iy)/3 & 1 - z/3 \end{pmatrix},$$
and we have
$$\log \sigma_{\pi}(\omega) = \left(\log\frac{1}{3}\right) \frac{1}{2}\begin{pmatrix} 1 - z & -(x - iy) \\ -(x + iy) & 1 + z \end{pmatrix} + \left(\log\frac{2}{3}\right) \frac{1}{2}\begin{pmatrix} 1 + z & x - iy \\ x + iy & 1 - z \end{pmatrix}.$$
Therefore, the quantum relative entropy loss is
$$D(\sigma_{\theta}, \sigma_{\pi}(\omega)) = \mathrm{Tr}\, \sigma_{\theta} \bigl( \log \sigma_{\theta} - \log \sigma_{\pi}(\omega) \bigr) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{x\theta_{x} + y\theta_{y} + z\theta_{z}}{2}\log 2.$$
Hence, the risk function is
$$R_{E}(\theta, \sigma_{\pi}) = \int_{\Omega} D(\sigma_{\theta}, \sigma_{\pi}(\omega)) \, dp(\omega \mid \theta) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{2}\left( \theta_{x}^{2} E_{\mu}[x^{2}] + \theta_{y}^{2} E_{\mu}[y^{2}] + \theta_{z}^{2} E_{\mu}[z^{2}] + 2\theta_{x}\theta_{y} E_{\mu}[xy] + 2\theta_{y}\theta_{z} E_{\mu}[yz] + 2\theta_{z}\theta_{x} E_{\mu}[zx] \right).$$
 ☐
Theorem 3.
For a measurement $E \in \mathcal{E}_{\mathrm{1qubit}}$, every $\pi \in \mathcal{P}_{\mathrm{1qubit}}$ is a latent information prior:
$$\max_{\theta} R(\theta, \sigma_{\pi}) = \min_{\rho} \max_{\theta} R(\theta, \rho).$$
In addition, the risk of the Bayesian predictive density operator based on π is
$$R(\theta, \sigma_{\pi}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{6}\|\theta\|^{2},$$
where h is the binary entropy function $h(p) = -p\log p - (1 - p)\log(1 - p)$.
Proof. 
From Lemma 4 and $E \in \mathcal{E}_{\mathrm{1qubit}}$,
$$R_{E}(\theta, \sigma_{\pi}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{6}\bigl(\theta_{x}^{2} + \theta_{y}^{2} + \theta_{z}^{2}\bigr).$$
Therefore, the risk depends only on $r = \|\theta\|$ and we have
$$R_{E}(\theta, \sigma_{\pi}) = g(r) = -h\!\left(\frac{1 + r}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{6} r^{2}.$$
Since
$$g'(r) = \frac{1}{2}\log\frac{1 + r}{1 - r} - \frac{\log 2}{3} r,$$
$$g''(r) = \frac{1}{1 - r^{2}} - \frac{\log 2}{3} \geq 1 - \frac{\log 2}{3} \geq 0,$$
the function g(r) is convex. In addition, we have $g(1) = \log 3 - \frac{2}{3}\log 2 > g(0) = \log 3 - \frac{3}{2}\log 2$. Therefore, g(r) takes its maximum at r = 1.
In other words, $R_{E}(\theta, \sigma_{\pi})$ takes its maximum on the Bloch sphere. In addition, since $\int (\theta_{x}^{2} + \theta_{y}^{2} + \theta_{z}^{2}) \, d\pi(\theta) = 1/3 + 1/3 + 1/3 = 1$, the support of π is included in the Bloch sphere $\|\theta\|^{2} = 1$. Therefore, $\int R_{E}(\theta, \sigma_{\pi}) \, d\pi(\theta) = \sup_{\theta} R_{E}(\theta, \sigma_{\pi})$, and it implies that π is a latent information prior. ☐
We note that the Bayesian predictive density operator is identical for every $\pi \in \mathcal{P}_{\mathrm{1qubit}}$. In fact, every $\pi \in \mathcal{P}_{\mathrm{1qubit}}$ also provides the minimax estimate of the density operator $\sigma_{\theta}^{Y}$ when there is no observation system X. Figure 1 shows the risk function g(r) in (17) and also the minimax risk function $g_{0}(r)$ when there is no observation:
$$g_{0}(r) = \mathrm{Tr} \begin{pmatrix} (1 + r)/2 & 0 \\ 0 & (1 - r)/2 \end{pmatrix} \left[ \log \begin{pmatrix} (1 + r)/2 & 0 \\ 0 & (1 - r)/2 \end{pmatrix} - \log \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} \right] = -h\!\left(\frac{1 + r}{2}\right) + \log 2.$$
Whereas $g(r) < g_{0}(r)$ around r = 1, we can see that $g(r) > g_{0}(r)$ around r = 0. Both risk functions take their maximum at r = 1 and
$$g(1) = \log 3 - (2/3)\log 2 < g_{0}(1) = \log 2.$$
The decrease $g_{0}(1) - g(1) > 0$ in the maximum risk corresponds to the gain from the observation of X.
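The comparison above is easy to reproduce numerically: the following sketch evaluates g(r) and g_0(r) on a grid and confirms that both attain their maxima at r = 1, with g(1) = log 3 − (2/3) log 2 < g_0(1) = log 2.

```python
import numpy as np

def h(p):
    """Binary entropy with the convention 0 log 0 = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def g(r):    # minimax risk with the observation system X
    return -h((1 + r) / 2) + 0.5 * np.log(9 / 2) - (np.log(2) / 6) * r**2

def g0(r):   # minimax risk with no observation
    return -h((1 + r) / 2) + np.log(2)

r = np.linspace(0, 1, 1001)
print(r[np.argmax(g(r))], g(1.0), np.log(3) - 2 / 3 * np.log(2))   # maximum at r = 1
print(r[np.argmax(g0(r))], g0(1.0), np.log(2))
```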
Now, we consider the selection of the measurement E. As discussed in the previous section, we define a POVM $E^{*}$ to be a minimax POVM if it satisfies (13). We provide a sufficient condition for a POVM to be minimax. Let $\rho_{E}$ be a minimax predictive density operator for the measurement E.
Lemma 5.
Suppose $\pi^{*}$ is a latent information prior for the measurement $E^{*}$. If
$$\int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) = \inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta),$$
then $E^{*}$ is a minimax POVM.
Proof. 
For every pair (E, ρ), we have
$$\sup_{\theta} R_{E}(\theta, \rho) \geq \inf_{\rho'} \sup_{\theta} R_{E}(\theta, \rho') = \sup_{\theta} R_{E}(\theta, \rho_{E}) \geq \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \geq \inf_{E'} \int R_{E'}(\theta, \rho_{E'}) \, d\pi^{*}(\theta) = \int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) = \sup_{\theta} R_{E^{*}}(\theta, \sigma_{\pi^{*}}).$$
The last equality follows from the minimaxity of $\sigma_{\pi^{*}}$. Therefore, $E^{*}$ is a minimax POVM. ☐
Theorem 4.
Every $E \in \mathcal{E}_{\mathrm{1qubit}}$ is a minimax POVM.
Proof. 
Let $E^{*} \in \mathcal{E}_{\mathrm{1qubit}}$ and $\pi^{*} \in \mathcal{P}_{\mathrm{1qubit}}$. From Theorem 3, $\pi^{*}$ is a latent information prior for $E^{*}$.
For a general measurement E, from Lemma 4, the risk function of the Bayesian predictive density operator $\sigma_{\pi^{*}}$ is
$$R_{E}(\theta, \sigma_{\pi^{*}}) = -h\!\left(\frac{1 + \|\theta\|}{2}\right) + \frac{1}{2}\log\frac{9}{2} - \frac{\log 2}{2}\left( \theta_{x}^{2} E_{\mu}[x^{2}] + \theta_{y}^{2} E_{\mu}[y^{2}] + \theta_{z}^{2} E_{\mu}[z^{2}] + 2\theta_{x}\theta_{y} E_{\mu}[xy] + 2\theta_{y}\theta_{z} E_{\mu}[yz] + 2\theta_{z}\theta_{x} E_{\mu}[zx] \right).$$
Hence, the Bayes risk of $\sigma_{\pi^{*}}$ with respect to $\pi^{*}$ is
$$\int R_{E}(\theta, \sigma_{\pi^{*}}) \, d\pi^{*}(\theta) = \log 3 - \frac{2}{3}\log 2.$$
Now, since the Bayesian predictive density operator $\sigma_{\pi^{*}}$ minimizes the Bayes risk with respect to $\pi^{*}$ among all predictive density operators [4],
$$\int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \geq \int R_{E}(\theta, \sigma_{\pi^{*}}) \, d\pi^{*}(\theta) = \log 3 - \frac{2}{3}\log 2$$
for every E. Therefore,
$$\inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \geq \log 3 - \frac{2}{3}\log 2.$$
On the other hand, since $\rho_{E^{*}}$ and $\sigma_{\pi^{*}}$ are both minimax for $E^{*}$,
$$\inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) \leq \int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) \leq \sup_{\theta} R_{E^{*}}(\theta, \sigma_{\pi^{*}}) = \log 3 - \frac{2}{3}\log 2.$$
Hence,
$$\int R_{E^{*}}(\theta, \rho_{E^{*}}) \, d\pi^{*}(\theta) = \inf_{E} \int R_{E}(\theta, \rho_{E}) \, d\pi^{*}(\theta) = \log 3 - \frac{2}{3}\log 2.$$
From Lemma 5, $E^{*}$ is minimax. ☐
Whereas Theorems 1 and 2 are valid even when $\sigma_{\theta}^{XY}$ is not separable, Theorems 3 and 4 assume the separability $\sigma_{\theta}^{XY} = \sigma_{\theta}^{X} \otimes \sigma_{\theta}^{Y}$.
From Theorem 4, the POVM (15) is a minimax POVM. Since this POVM is identical to the SIC-POVM [14,15], it is an interesting problem whether the SIC-POVM is a minimax POVM also in higher dimensions. This is left for future work.

Acknowledgments

We thank the referees for many helpful comments. This work was supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers 26280005 and 14J09148.

Author Contributions

All authors contributed significantly to the study and approved the final version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Proof of (4).
From the definition of $\hat{p}_{\pi}$ in (1),
$$\int p(x, y \mid \theta) \, d\pi(\theta) = p_{\pi}(x)\, \hat{p}_{\pi}(y \mid x),$$
where
$$p_{\pi}(x) = \int p(x \mid \theta) \, d\pi(\theta).$$
Therefore, for an arbitrary $\hat{p}$,
$$R(\pi, \hat{p}) - R(\pi, \hat{p}_{\pi}) = \iiint p(x, y \mid \theta) \bigl( \log \hat{p}_{\pi}(y \mid x) - \log \hat{p}(y \mid x) \bigr) \, d\pi(\theta) \, dx \, dy = \iint p_{\pi}(x)\, \hat{p}_{\pi}(y \mid x) \bigl( \log \hat{p}_{\pi}(y \mid x) - \log \hat{p}(y \mid x) \bigr) \, dx \, dy = \int p_{\pi}(x)\, L\bigl(\hat{p}_{\pi}(y \mid x), \hat{p}(y \mid x)\bigr) \, dx,$$
which is nonnegative since the Kullback–Leibler divergence L ( q , p ) in (3) is always nonnegative. ☐
Proof of (9).
From the definition of $\sigma_{\pi}^{Y}(x)$ in (7),
$$\int p(x \mid \theta)\, \sigma_{\theta, x}^{Y} \, d\pi(\theta) = p_{\pi}(x)\, \sigma_{\pi}^{Y}(x),$$
where
$$p_{\pi}(x) = \int p(x \mid \theta) \, d\pi(\theta).$$
Therefore, for an arbitrary ρ,
$$R(\pi, \rho) - R(\pi, \sigma_{\pi}^{Y}) = \iint p(x \mid \theta)\, \mathrm{Tr}\, \sigma_{\theta, x}^{Y} \bigl( \log \sigma_{\pi}^{Y}(x) - \log \rho(x) \bigr) \, d\pi(\theta) \, dx = \int p_{\pi}(x)\, \mathrm{Tr}\, \sigma_{\pi}^{Y}(x) \bigl( \log \sigma_{\pi}^{Y}(x) - \log \rho(x) \bigr) \, dx = \int p_{\pi}(x)\, L\bigl(\sigma_{\pi}^{Y}(x), \rho(x)\bigr) \, dx,$$
which is nonnegative since the quantum relative entropy L ( σ , ρ ) in (8) is always nonnegative. ☐
Proof of Theorem 1.
(1) Let $Q_{x}^{\rho}$ be the orthogonal projection matrix onto the eigenspace of ρ(x) corresponding to the eigenvalue 0, let $\Theta_{\rho} = \{\theta \in \Theta \mid \sum_{x} p(x \mid \theta)\, \mathrm{Tr}\, Q_{x}^{\rho} \sigma_{\theta, x} = 0\}$, and let $\mathcal{P}^{\rho}$ be the set of all probability measures on $\Theta_{\rho}$.
If $\Theta_{\rho} = \emptyset$, the assertion is obvious because $R(\theta, \rho) = \infty$ for $\theta \notin \Theta_{\rho}$. Therefore, we assume $\Theta_{\rho} \neq \emptyset$ in the following. In this case, $D_{\rho}(\hat{\pi}_{\rho}) < \infty$. Since $\pi \in \mathcal{P}^{\rho}$ if and only if $D_{\rho}(\pi) < \infty$, we have $\hat{\pi}_{\rho} \in \mathcal{P}^{\rho}$.
Define
$$\tilde{\pi}_{\theta, u} := u\,\delta_{\theta} + (1 - u)\,\hat{\pi}_{\rho}$$
for $\theta \in \Theta_{\rho}$ and $0 \leq u \leq 1$, where $\delta_{\theta}$ is the probability measure satisfying $\delta_{\theta}(\{\theta\}) = 1$. Then, $\tilde{\pi}_{\theta, u} \in \mathcal{P}^{\rho}$, and we have
u D ρ ( π ˜ θ , u ) | u = 0 = u x Tr S π ˜ θ , u ( x ) ( log S π ˜ θ , u ( x ) log ( p π ˜ θ , u ( x ) ρ ( x ) ) ) | u = 0 = u x Tr ( u S θ ( x ) + ( 1 u ) S π ^ θ , u ( x ) ) × ( log ( u S θ ( x ) + ( 1 u ) S π ^ ρ ( x ) ) log ( u p ( x θ ) + ( 1 u ) p π ^ ρ ( x ) ) ρ ( x ) | u = 0 = x Tr u ( u S θ ( x ) + ( 1 u ) S π ^ ρ ( x ) ) | u = 0 ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ x ) ) + x Tr S π ^ ρ ( x ) u log ( u S θ ( x ) + ( 1 u ) S π ^ ρ ( x ) ) | u = 0 x Tr S π ^ ρ ( x ) u log ( u p ( x θ ) + ( 1 u ) p π ^ ρ ( x ) ) I + log ρ x | u = 0 = x Tr ( S θ ( x ) S π ^ ρ ( x ) ) ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ ( x ) ) ) + x Tr S θ ( x ) p π ^ ρ ( x ) ρ ( x ) x Tr S π ^ ρ ( x ) p ( x θ ) p π ^ ρ ( x ) p π ^ ρ ( x ) = x Tr S θ ( x ) ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ ( x ) ) ) x Tr S π ^ ρ ( x ) ( log S π ^ ρ ( x ) log ( p π ^ ρ ( x ) ρ ( x ) ) ) 0 .
Thus, if $\theta \in \Theta_{\rho}$,
$$R(\theta, \sigma_{\hat{\pi}_{\rho}}(x)) = \sum_{x} \mathrm{Tr}\, S_{\theta}(x) \bigl( \log \sigma_{\theta, x} - \log \sigma_{\hat{\pi}_{\rho}}(x) \bigr) \leq \sum_{x} \mathrm{Tr}\, S_{\theta}(x) \bigl( \log \sigma_{\theta, x} - \log \rho(x) \bigr) = R(\theta, \rho(x)) < \infty.$$
If $\theta \notin \Theta_{\rho}$, $R(\theta, \rho(x)) = \infty$. Therefore, for every $\theta \in \Theta$, the inequality $R(\theta, \sigma_{\hat{\pi}_{\rho}}(x)) \leq R(\theta, \rho(x))$ holds.
(2) We note that $\Theta_{\rho}$ and $\mathcal{P}^{\rho}$ are compact subsets of Θ and $\mathcal{P}$, respectively.
If $\Theta_{\rho} = \emptyset$, the assertion is obvious, because $R(\theta, \rho(x)) = \infty$ for every $\theta \notin \Theta_{\rho}$. Therefore, we assume $\Theta_{\rho} \neq \emptyset$ in the following. Let $\mathcal{X}_{\rho} := \{x \in \mathcal{X} \mid \exists\theta \in \Theta_{\rho},\ p(x \mid \theta) > 0\}$ and let $\mu_{\rho}$ be a probability measure on $\Theta_{\rho}$ such that $p_{\mu_{\rho}}(x) := \int p(x \mid \theta) \, d\mu_{\rho}(\theta) > 0$ for every $x \in \mathcal{X}_{\rho}$.
Because $D_{\rho}(\pi)$ is continuous as a function of $\pi \in \mathcal{P}^{\rho}$, there exists $\pi_{n} \in \mathcal{P}^{\rho}_{\mu_{\rho}/n} := \{(1/n)\mu_{\rho} + (1 - 1/n)\pi \mid \pi \in \mathcal{P}^{\rho}\}$ such that $D_{\rho}(\pi_{n}) = \inf_{\pi \in \mathcal{P}^{\rho}_{\mu_{\rho}/n}} D_{\rho}(\pi)$. From Lemma 3, there exists a convergent subsequence $\{\pi_{m}\}_{m=1}^{\infty}$ of $\{\pi_{n}\}_{n=1}^{\infty}$ such that $D_{\rho}(\pi_{\infty}) = \inf_{\pi \in \mathcal{P}^{\rho}} D_{\rho}(\pi)$, where $\pi_{\infty} = \lim_{m} \pi_{m}$.
Let $n_{m}$ be the integer satisfying $\pi_{m} = \pi_{n_{m}}$. We can take the subsequence $\{\pi_{m}\}_{m=1}^{\infty}$ to satisfy $0 < n_{m}/(n_{m+1} - n_{m}) < c$ for some positive constant c.
Since
n m n m + 1 π m + 1 n m n m + 1 δ θ = n m n m + 1 π n m + 1 n m n m + 1 δ θ P μ ρ / n m + 1 ρ
for every θ Θ , we have
π ˜ m , θ , u : = u n m n m + 1 π m + 1 n m n m + 1 δ θ + ( 1 u ) π m + 1 P μ ρ / n m + 1 ρ
for every θ Θ ρ and 0 u 1 . Thus,
u D ( π ˜ m , θ , u ) | u = 0 = u x Tr p π ˜ m , θ , u ( x ) ( log S π ˜ m , θ , u ( x ) log ( p π ˜ m , θ , u ( x ) ρ ( x ) ) ) ( I Q x ρ ) | u = 0 = x Tr { u S π ˜ m , θ , u ( x ) | u = 0 } ( log S π ˜ m , θ , u ( x ) log ( p π ˜ m , θ , u ( x ) ρ ( x ) ) ) ( I Q x ρ ) = n m n m + 1 x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) + n m + 1 n m n m + 1 x Tr S θ ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( X ) ρ ( x ) ) ) ( I Q x ρ ) 0 .
Hence,
x Tr S θ ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) n m + 1 n m + 1 n m x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) n m n m + 1 n m x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) = n m + 1 n m + 1 n m x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) + n m n m + 1 n m { x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) Q x π ( I Q x ρ ) } n m + 1 n m + 1 n m x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) + n m n m + 1 n m { x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ( x ) ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) + x Tr S π m ( x ) log ρ ( x ) Q x π ( I Q x ρ ) } ,
where Q x π is the orthogonal projection matrix onto the eigenspace of θ π ( θ ) p ( x θ ) σ θ , x corresponding to the eigenvalue 0. Here, we have
lim m x Tr S π m ( x ) ( log S π m + 1 ( x ) log ( p π m + 1 ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) = x Tr S π ( x ) ( log S π ( x ) log ( p π ( x ) ρ ( x ) ) ) ( I Q x ρ ) ( I Q x π ) ,
and
lim m x Tr S π m ( x ) log ρ ( x ) Q x π ( I Q x ρ ) = 0 = x Tr S π ( x ) ( log S π ( x ) log ( p π ( x ) ρ ( x ) ) ) Q x π ( I Q x ρ ) .
Therefore, from (A1)–(A3) and 0 < n m / ( n m + 1 n m ) < c for every θ Θ ρ ,
lim   inf m x Tr S θ ( x ) ( log S π m ( x ) log ( p π m ( x ) ρ ( x ) ) ) ( I Q x ρ ) x Tr S π ( x ) ( log S π ( x ) log ( p π ( x ) ρ ( x ) ) ) ( I Q x ρ ) 0 .
By taking an appropriate subsequence { π k } of { π m } , we can make the subsequence of density operators { σ π k , x } k = 1 converge for all x X ρ because p π m ( x ) > 0 ( x X ρ ) and 0 S π m / p π m ( x ) I .
Then, from (A4), if θ Θ ρ ,
R ( θ , lim k σ π k ( x ) ) = x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) = x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) ( I Q x ρ ) x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) ( I Q x ρ ) = x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) = R ( θ , ρ ( x ) ) < .
If θ Θ ρ , R ( θ , ρ ) = because x S θ ( x ) log ρ ( x ) Q x ρ = .
Hence, the risk of the predictive density operator defined by
lim k σ π k ( x ) , x X ρ , τ x , x X ρ ,
where τ x is an arbitrary predictive density, is not greater than that of ρ ( x ) for every θ Θ .
Therefore, by taking a sequence { ε n ( 0 , 1 ) } n = 1 that converges rapidly enough to 0, we can construct a predictive density operator
lim k σ ε k μ ¯ + ( 1 ε k ) π k ( x ) = lim k σ π k ( x ) , x X ρ , σ μ ¯ ( x ) , x X ρ ,
as a limit of Bayesian predictive density operators based on priors { ε k μ ¯ + ( 1 ε k ) π k } , where μ ¯ is a measure on Θ such that p μ ¯ ( x ) > 0 for every x X .
Hence, the risk of the predictive density operator (A5) is not greater than that of ρ ( x ) for every θ Θ . ☐
Proof of Theorem 2.
(1) Define π ˜ θ ¯ , u : = u δ θ ¯ + ( 1 u ) π ^ for all θ Θ and u [ 0 , 1 ] . Then,
u I θ , σ x ( π ˜ θ ¯ , u ) | u = 0 = u ( x Tr S θ ( x ) log S θ ( x ) d π ˜ θ ¯ , u ( θ ) x S π ˜ θ ¯ , u ( x ) log S π ˜ θ ¯ , u ( x ) x p ( x θ ) log p ( x θ ) d π ˜ θ ¯ , u + x p π ˜ θ ¯ , u ( x ) log p π ˜ θ ¯ , u ( x ) ) | u = 0 = x Tr S θ ¯ ( x ) ( log S θ ¯ ( x ) log p θ ¯ ( x ) I ) x Tr S θ ¯ ( x ) ( log S π ^ ( x ) log p π ^ ( x ) I ) x Tr S θ ( x ) ( log S θ ( x ) log p ( x θ ) I ) d π ^ ( θ ) + x Tr S π ^ ( x ) ( log S π ^ ( x ) log p π ^ ( x ) I ) 0 .
Since p π ^ ( x ) > 0 for every x X and Tr   p ( x θ ) σ θ , x log σ θ , x = 0 if p ( x | θ ) = 0 , we have
x Tr S θ ¯ ( x ) ( log σ θ ¯ , x log σ π ^ ( x ) ) x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) d π ^ ( θ )
for every θ Θ .
On the other hand, we have
x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) d π ^ ( θ ) = inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ^ ( θ ) sup π P inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ^ ( θ ) inf ρ sup π P x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ^ ( θ ) = inf ρ sup θ Θ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) sup θ Θ x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) .
Here, the first equality is from the fact [4] that the Bayes risk with respect to π ^ P
R ( θ ; ρ ( x ) ) d π ^ ( θ ) = x p ( x θ ) Tr σ θ , x ( log σ θ , x log ρ ( x ) ) d π ^ ( θ )
is minimized when
ρ ( x ) = σ π ^ ( x ) : = p ( x θ ) σ θ , x d π ^ ( θ ) p ( x θ ) d π ^ ( θ ) .
From (A6) and (A7), we have
inf ρ sup θ Θ x Tr S θ ( x ) ( log σ θ , x ρ ( x ) ) = sup θ Θ x Tr S θ ( x ) ( log σ θ , x log σ π ^ ( x ) ) .
Therefore, the predictive density operator σ π ^ ( x ) is minimax.
(2) Let μ be a probability measure on Θ such that p μ ( x ) : = p ( x θ ) d μ ( θ ) > 0 for every x X , and let π n P μ / n : = { μ / n + ( 1 1 / n ) π π P } be a prior satisfying I θ , σ | x ( π n ) = sup π P μ / n I θ , σ | x ( π ) . From Lemma 3, there exists a convergent subsequence { π m } of { π n } and I θ , σ | x ( π ) = sup π P I θ , σ x ( π ) where π m π . Let n m be the integer satisfying π m = π n m . As in the proof of Theorem 1, we can make the subsequence { π m } satisfy 0 < n m / ( n m + 1 n m ) < c for some positive constant c.
Then, for every θ ¯ Θ ,
π ˜ m , θ ¯ , u : = u n m n m + 1 π m + ( 1 n m n m + 1 ) δ θ ¯ + ( 1 u ) π m + 1
belongs to P μ / n m + 1 for 0 u 1 because ( n m / n m + 1 ) π m + ( 1 n m / n m + 1 ) δ θ ¯ P μ / n m + 1 and π m + 1 P μ / n m + 1 .
Thus,
u I θ , ρ x ( π ˜ m , θ ¯ , u ) | u = 0 = u ( x Tr S θ ( x ) log S θ ( x ) d π ˜ m , θ ¯ , u ( θ ) x Tr S π ˜ m , θ ¯ , u ( x ) log S π ˜ m , θ ¯ , u ( x ) x p ( x θ ) log p ( x θ ) d π ˜ m , θ ¯ , u ( θ ) + x p π ˜ m , θ ¯ , u ( x ) log p π ˜ m , θ ¯ , u ) | u = 0 = n m n m + 1 x Tr S θ ( x ) log S θ ( x ) d π m ( θ ) + ( 1 n m n m + 1 ) x Tr S θ ¯ ( x ) log S θ ¯ ( x ) x Tr S θ ( x ) log S θ ( x ) d π m + 1 ( θ ) x Tr u S π ˜ m , θ ¯ , u ( x ) | u = 0 log S π m + 1 ( x ) n m n m + 1 x p ( x θ ) log p ( x θ ) d π m + 1 ( θ ) ( 1 n m n m + 1 ) x p θ ¯ ( x ) log p θ ¯ ( x ) + x p ( x θ ) log p ( x θ ) d π m + 1 ( θ ) + x u p π ˜ m , θ ¯ , u ( x ) | u = 0 log p π m + 1 ( x ) = ( 1 n m n m + 1 ) x Tr S θ ¯ ( x ) ( log S θ ¯ ( x ) log p ( x θ ¯ ) I ) ( 1 n m n m + 1 ) x Tr S θ ¯ ( x ) ( log S π m + 1 ( x ) log p π m + 1 ( x ) I ) + n m n m + 1 x Tr S θ ( x ) ( log S θ ( x ) log p ( x θ ) I ) d π m ( θ ) x Tr S θ ( x ) ( log S θ ( x ) log p ( x θ ) ) d π m + 1 ( θ ) n m n m + 1 x Tr S π m ( x ) ( log S π m + 1 ( x ) log p π m + 1 ( x ) I ) + x Tr S π m + 1 ( x ) ( log S π m + 1 ( x ) log p π m + 1 ( x ) I ) 0 .
Since p π ^ m ( x ) > 0 for every m and p ( x θ ) σ θ , x log σ θ , x = 0 if p ( x θ ) = 0 , we have
1 n m n m + 1 x Tr S θ ¯ ( x ) ( log σ θ ¯ , x log σ π m + 1 ( x ) ) + n m n m + 1 x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m ( θ ) x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m + 1 ( θ ) 0 .
Hence,
x Tr S θ ¯ ( x ) ( log σ θ ¯ ( x ) log σ π m + 1 ( x ) ) n m n m + 1 n m { x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) ( 1 Q x π ) d π m ( θ ) + x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) Q x π d π m ( θ ) } + n m + 1 n m + 1 n m x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m + 1 ( θ ) n m n m + 1 n m { x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) ( 1 Q x π ) d π m ( θ ) + x Tr S θ ( x ) log σ θ , x Q x π d π m ( θ ) } + n m + 1 n m + 1 n m x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) d π m + 1 ( θ ) ,
where Q x π is the orthogonal projection matrix onto the eigenspace of S π ( x ) corresponding to the eigenvalue 0. Here, we used two equalities
lim m x Tr S θ ( x ) ( log σ θ , x log σ π m + 1 ( x ) ) ( 1 Q x π ) d π m ( θ ) = x Tr S θ ( x ) ( log ( p π ( x ) σ θ , x ) log S π ( x ) ) d π ( θ )
and
lim m x Tr S θ ( x ) log σ θ , x Q x π d π m ( θ ) = x Tr S θ ( x ) log σ θ , x Q x π d π ( θ ) = x Tr S θ ( x ) ( log ( p π ( x ) ) σ θ , x ) log S π , x ) Q x π d π ( θ ) = 0 ,
since Tr S θ ( x ) log σ θ , x is a bounded continuous function of θ .
From (A8)–(A11), and 0 < n m / ( n m + 1 n m ) < c , we have, for every θ ¯ Θ ,
lim   sup m x Tr S θ ¯ ( x ) ( log σ θ ¯ ( x ) log σ π m ( x ) ) x Tr S θ ( x ) ( log ( p π ( x ) σ θ ( x ) ) log S π ( x ) ) d π ( θ ) .
By taking an appropriate subsequence { π k } of { π m } , we can make { σ π k ( x ) } k = 1 converge for every x. Then, for every θ ¯ Θ ,
x Tr S θ ( x ) ( log σ θ ¯ , x log lim k σ π k ( x ) )
x S θ ( x ) ( log ( σ θ , x log lim k σ π k ( x ) ) d π ( θ ) ,
since lim k σ π k ( x ) = σ π ( x ) for x with p π ( x ) > 0 .
On the other hand, we have
x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) d π ( θ ) = inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ( θ ) sup π P inf ρ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ( θ ) inf ρ sup π P x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) d π ( θ ) = inf ρ sup θ Θ x Tr S θ ( x ) ( log σ θ , x log ρ ( x ) ) sup θ Θ x Tr S θ ( x ) ( log σ θ , x log lim k σ π k ( x ) ) .
Here, the first equality is from the fact [4] that the Bayes risk
R ( θ ; ρ ) d π ( θ ) = x Tr p ( x θ ) σ θ , x ( log σ θ , x log ρ ( x ) ) d π ( θ )
is minimized when ρ ( x ) = σ π ( x ) . Although p π ( x ) is not uniquely determined for x with p π ( x ) = 0 , the Bayes risk does not depend on the choice of σ π ( x ) for such x.
From (A12) and (A13),
inf ρ sup θ Θ x Tr p ( x θ ) σ θ , x ( log σ θ , x ρ ( x ) ) = sup θ Θ x Tr p ( x θ ) σ θ , x ( log σ θ , x log lim k σ π k ( x ) ) .
Therefore, the predictive density operator lim k σ π k ( x ) is minimax. ☐

References

  1. Barndorff-Nielsen, O.E.; Gill, R.D.; Jupp, P.E. On quantum statistical inference. J. R. Stat. Soc. B 2003, 65, 775–804. [Google Scholar] [CrossRef]
  2. Holevo, A.S. Probabilistic and Statistical Aspects of Quantum Theory; Elsevier: Amsterdam, The Netherlands, 1982. [Google Scholar]
  3. Paris, M.; Rehacek, J. Quantum State Estimation; Springer: Berlin, Germany, 2004. [Google Scholar]
  4. Tanaka, F.; Komaki, F. Bayesian predictive density operators for exchangeable quantum-statistical models. Phys. Rev. A 2005, 71, 052323. [Google Scholar] [CrossRef]
  5. Tanaka, F. Bayesian estimation of the wave function. Phys. Lett. A 2012, 376, 2471–2476. [Google Scholar] [CrossRef]
  6. Tanaka, F. Noninformative prior in the quantum statistical model of pure states. Phys. Rev. A 2012, 85, 062305. [Google Scholar] [CrossRef]
  7. Geisser, S. Predictive Inference: An Introduction; Chapman & Hall: London, UK, 1993. [Google Scholar]
  8. Csiszar, I. Axiomatic characterizations of information measures. Entropy 2008, 10, 261–273. [Google Scholar] [CrossRef]
  9. Aitchison, J. Goodness of prediction fit. Biometrika 1975, 62, 547–554. [Google Scholar] [CrossRef]
  10. Komaki, F. Bayesian predictive densities based on latent information priors. J. Stat. Plan. Inference 2011, 141, 3705–3715. [Google Scholar] [CrossRef]
  11. Bernardo, J.M. Reference posterior distributions for Bayesian inference. J. R. Stat. Soc. B 1979, 41, 113–147. [Google Scholar]
  12. Petz, D. Quantum Information and Quantum Statistics; Springer: New York, NY, USA, 2008. [Google Scholar]
  13. Holevo, A.S. Quantum Systems, Channels, Information: A Mathematical Introduction; Walter de Gruyter: Berlin, Germany, 2013. [Google Scholar]
  14. Appleby, D.M. SIC-POVMs and the extended Clifford group. J. Math. Phys. 2004, 46, 547–554. [Google Scholar]
  15. Renes, J.M.; Blume-Kohout, R.; Scott, A.J.; Caves, C.M. Symmetric informationally complete quantum measurements. J. Math. Phys. 2004, 45, 2171–2180. [Google Scholar] [CrossRef]
  16. Ferrie, C.; Blume-Kohout, R. Minimax quantum tomography: Estimators and relative entropy bounds. Phys. Rev. Lett. 2016, 116, 090407. [Google Scholar] [CrossRef] [PubMed]
  17. Tanaka, F. Quantum minimax theorem. arXiv 2014, arXiv:1410.3639. [Google Scholar]
  18. Billingsley, P. Convergence of Probability Measures; Wiley: New York, NY, USA, 1999. [Google Scholar]
  19. Hiai, F.; Petz, D. Introduction to Matrix Analysis and Applications; Springer: New York, NY, USA, 2014. [Google Scholar]
  20. Haapasalo, E.; Heinosaari, T.; Pellonpää, J.P. Quantum measurements on finite dimensional systems: Relabeling and mixing. Quantum Inf. Process. 2012, 11, 1751–1763. [Google Scholar] [CrossRef]
Figure 1. Risk functions of predictive density operators. Solid line: g(r); dashed line: g_0(r).
