Article

First- and Second-Order Hypothesis Testing for Mixed Memoryless Sources

1 National Institute of Information and Communications Technology (NICT), Tokyo 184-8795, Japan
2 School of Network and Information, Senshu University, Kanagawa 214-8580, Japan
* Author to whom correspondence should be addressed.
Entropy 2018, 20(3), 174; https://doi.org/10.3390/e20030174
Submission received: 23 January 2018 / Revised: 2 March 2018 / Accepted: 5 March 2018 / Published: 6 March 2018
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: The first- and second-order optimum achievable exponents in the simple hypothesis testing problem are investigated. The optimum achievable exponent for the type II error probability, under the constraint that the type I error probability is allowed asymptotically up to $\varepsilon$, is called the $\varepsilon$-optimum exponent. In this paper, we first give the second-order $\varepsilon$-optimum exponent in the case where the null hypothesis is a mixed memoryless source and the alternative hypothesis is a stationary memoryless source. We then generalize this setting to the case where the alternative hypothesis is also a mixed memoryless source, and address the first-order $\varepsilon$-optimum exponent in this setting. In addition, an extension of our results to more general settings, such as hypothesis testing with mixed general sources, and a relationship with the general compound hypothesis testing problem are also discussed.

1. Introduction

Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ be two general sources (cf. Han [1]), where the term general source denotes a sequence of random variables $X^n$ (respectively, $\overline{X}^n$) indexed by block length $n$, each component of which takes values in the alphabet $\mathcal{X}$ and may vary depending on $n$.
We consider the hypothesis testing problem with null hypothesis $\mathbf{X}$, alternative hypothesis $\overline{\mathbf{X}}$, and acceptance region $\mathcal{A}_n \subseteq \mathcal{X}^n$. The probabilities of type I error and type II error are defined, respectively, as
$$\mu_n := \Pr\left\{X^n \notin \mathcal{A}_n\right\}, \qquad \lambda_n := \Pr\left\{\overline{X}^n \in \mathcal{A}_n\right\}. \tag{1}$$
We focus mainly on how to determine the $\varepsilon$-optimum exponent, defined as the supremum of achievable exponents $R$ for the type II error probability $\lambda_n \simeq e^{-nR}$ under the constraint that the type I error probability is allowed asymptotically up to a constant $\varepsilon$ $(0 \le \varepsilon < 1)$. The classical but fundamental result in this setting is the so-called Stein's lemma [2], which gives the $\varepsilon$-optimum exponent in the case where both the null and alternative hypotheses are stationary memoryless sources. The lemma shows that the $\varepsilon$-optimum exponent is given by $D(P_X \| P_{\overline{X}})$, the divergence between the stationary memoryless sources $\mathbf{X}$ and $\overline{\mathbf{X}}$. Chen [3] has generalized this lemma to the case where both $\mathbf{X}$ and $\overline{\mathbf{X}}$ are general sources, and established a general formula for the $\varepsilon$-optimum exponent in terms of divergence spectra. The $\varepsilon$-optimum exponent derived in [3] is called the first-order $\varepsilon$-optimum exponent in this paper.
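As a purely numerical illustration (not part of the original analysis), the following Python sketch checks the Stein regime for two hypothetical Bernoulli sources: a likelihood-ratio threshold test with any threshold $t$ below $D(P_X \| P_{\overline{X}})$ has vanishing type I error, while the change-of-measure bound (Lemma 1 in Section 2) guarantees the type II error bound $e^{-nt}$. All parameters ($p$, $q$, $n$, the number of trials) are illustrative assumptions.

```python
# Sketch: numerical check of the Stein regime for two hypothetical
# Bernoulli sources; all parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
p, q = 0.3, 0.6                        # null and alternative Bernoulli parameters

def kl(a, b):                          # D(Ber(a)||Ber(b)) in nats
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

n, trials = 500, 100_000
t = kl(p, q) - 0.05                    # any threshold below D(P_X||P_Xbar)
x = rng.random((trials, n)) < p        # blocks drawn from the null source
# normalized log-likelihood ratio (1/n) log P_{X^n}(x) / P_{Xbar^n}(x)
llr = np.where(x, np.log(p / q), np.log((1 - p) / (1 - q))).sum(axis=1) / n
mu = np.mean(llr < t)                  # type I error of the threshold test
print("type I error mu_n  ~", mu)               # small, -> 0 as n grows
print("type II bound e^{-nt}:", np.exp(-n * t)) # change-of-measure guarantee
print("Stein exponent D(P||Q):", kl(p, q))
```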
On the other hand, second-order asymptotics have also been investigated in several contexts of information theory [4,5,6,7,8,9] to analyze the finer asymptotic behavior of the form $\lambda_n \simeq e^{-nR - \sqrt{n}S}$.
Strassen [4] first introduced the notion of the second-order $\varepsilon$-optimum achievable exponent in the hypothesis testing problem in the case where both $\mathbf{X}$ and $\overline{\mathbf{X}}$ are stationary memoryless sources. The results in [4] also reveal that the asymptotic normality of the divergence density rate (or likelihood ratio rate) plays an important role in computing the second-order $\varepsilon$-optimum exponent.
In this paper, on the other hand, we investigate hypothesis testing for mixed memoryless sources. The class of mixed sources is quite important, because all stationary sources can be regarded as mixtures of stationary ergodic sources. The analysis for mixed sources is therefore primitive but fundamental, and we first focus on the case where the null hypothesis is a mixed memoryless source and the alternative hypothesis is a stationary memoryless source. In this direction, Han [1] first derived the single-letter formula for the first-order $\varepsilon$-optimum exponent with mixed memoryless source $\mathbf{X}$ and stationary memoryless source $\overline{\mathbf{X}}$. The first main result of this paper establishes the single-letter second-order $\varepsilon$-optimum exponent in the same setting by invoking the relevant asymptotic normality; it substantially generalizes the result of Strassen [4]. Second, we generalize this setting to the case where both the null and alternative hypotheses are mixed memoryless sources $\mathbf{X}, \overline{\mathbf{X}}$, and establish the single-letter first-order $\varepsilon$-optimum exponent.
It should be emphasized that our results are valid for mixed memoryless sources with general mixture, in the sense that the mixing weight for the component sources may be an arbitrary probability measure. For the case of mixed general sources with finite discrete mixture, we reveal a deep relationship with the compound hypothesis testing problem, which is important from both theoretical and practical points of view. We show that the first-order 0-optimum (respectively, exponentially r-optimum) exponent for mixed general hypothesis testing coincides with the 0-optimum (respectively, exponentially r-optimum) exponent for compound general hypothesis testing.
The present paper is organized as follows. In Section 2, we fix the problem setting and review the general formula (Theorem 1) for the first-order $\varepsilon$-optimum exponent, which is used to prove Theorem 5, a first-order single-letter formula for the case where both the null and alternative hypotheses are mixed memoryless. Moreover, we give the general formula (Theorem 2) for the second-order $\varepsilon$-optimum exponent, which is used to prove Theorem 4, a second-order single-letter formula for the case where the null hypothesis is mixed memoryless and the alternative hypothesis is stationary memoryless. In Section 3, we establish the single-letter second-order $\varepsilon$-optimum exponent for mixed memoryless source $\mathbf{X}$ and stationary memoryless source $\overline{\mathbf{X}}$ (cf. Theorem 4). Furthermore, in Section 4, we consider the case where both the null and alternative hypotheses are mixed memoryless sources and derive the single-letter first-order $\varepsilon$-optimum exponent (cf. Theorem 5). Section 5 is devoted to an extension from mixed memoryless sources to mixed general sources. In Section 6, we define the optimum exponent for the compound general hypothesis testing problem and discuss its relationship with hypothesis testing for mixed general sources. We conclude the paper in Section 7.

2. General Formulas for ε-Hypothesis Testing

In this section, we first review the first-order general formula and then give the second-order general formula. Throughout this paper, the following lemmas play an important role, where $P_Z$ denotes the probability distribution of a random variable $Z$.
Lemma 1
([1] (Lemma 4.1.1)). For any $t > 0$, define the acceptance region
$$\mathcal{A}_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \ge t \right\}; \tag{2}$$
then, it holds that
$$\Pr\left\{\overline{X}^n \in \mathcal{A}_n\right\} \le e^{-nt}. \tag{3}$$
Lemma 2
([1] (Lemma 4.1.2)). For any $t > 0$ and any $\mathcal{A}_n$, it holds that
$$\mu_n + e^{nt} \lambda_n \ge \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le t \right\}. \tag{4}$$
Proofs of these lemmas are found in [1].
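Since both lemmas involve only finitely many types on a binary alphabet, their content can be checked exactly by enumeration. The following Python sketch verifies the inequality of Lemma 2 for two hypothetical Bernoulli hypotheses and an arbitrarily chosen threshold-type acceptance region; all parameters are assumptions for illustration only.

```python
# Sketch: exact check of Lemma 2 on a binary alphabet, where every
# probability depends only on the number of ones k (type enumeration).
# The sources (p, q), block length n, and thresholds are assumptions.
import numpy as np
from scipy.stats import binom

p, q, n, t = 0.3, 0.6, 40, 0.1
k = np.arange(n + 1)
# (1/n) log P_{X^n}(x)/P_{Xbar^n}(x) for a sequence with k ones
llr = (k * np.log(p / q) + (n - k) * np.log((1 - p) / (1 - q))) / n
accept = llr >= 0.05                        # an arbitrary acceptance region A_n
mu = binom.pmf(k, n, p)[~accept].sum()      # type I error  Pr{X^n not in A_n}
lam = binom.pmf(k, n, q)[accept].sum()      # type II error Pr{Xbar^n in A_n}
lhs = mu + np.exp(n * t) * lam
rhs = binom.pmf(k, n, p)[llr <= t].sum()    # Pr{(1/n) log LR <= t}
print(f"{lhs:.6f} >= {rhs:.6f} : {lhs >= rhs}")  # Lemma 2 inequality
```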
We define the first- and second-order ε-optimum exponents as follows.
Definition 1.
Rate $R$ is said to be ε-achievable if there exists an acceptance region $\mathcal{A}_n$ such that
$$\limsup_{n\to\infty} \mu_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \tag{5}$$
Definition 2
(First-order ε-optimum exponent).
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) := \sup\{R \mid R \text{ is } \varepsilon\text{-achievable}\}. \tag{6}$$
The right-hand side of Equation (5) specifies the asymptotic behavior of the form $\lambda_n \simeq e^{-nR}$. Chen [3] has derived the following general limiting formula for $B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$, which is utilized to establish Theorem 5 in Section 4.
Theorem 1
(Chen [3] (Theorem 1)).
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) = \sup\{R \mid K(R) \le \varepsilon\} \quad (0 \le \varepsilon < 1), \tag{7}$$
where
$$K(R) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\}. \tag{8}$$
Moreover, we consider the second-order $(\varepsilon, R)$-optimum exponent as follows.
Definition 3.
Rate $S$ is said to be $(\varepsilon, R)$-achievable if there exists an acceptance region $\mathcal{A}_n$ such that
$$\limsup_{n\to\infty} \mu_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{\sqrt{n}} \log \frac{1}{\lambda_n e^{nR}} \ge S. \tag{9}$$
Definition 4
(Second-order $(\varepsilon, R)$-optimum exponent).
$$B_\varepsilon(R \mid \mathbf{X} \| \overline{\mathbf{X}}) := \sup\{S \mid S \text{ is } (\varepsilon, R)\text{-achievable}\}. \tag{10}$$
The right-hand side of Equation (9) specifies the asymptotic behavior of the form $\lambda_n \simeq e^{-nR - \sqrt{n}S}$. The general limiting formula for $B_\varepsilon(R \mid \mathbf{X} \| \overline{\mathbf{X}})$, the second-order counterpart of Theorem 1, is given as follows; it is utilized in Section 3.2 to establish Theorem 4, a second-order single-letter formula for the case where the null hypothesis is mixed memoryless and the alternative hypothesis is stationary memoryless.
Theorem 2.
$$B_\varepsilon(R \mid \mathbf{X} \| \overline{\mathbf{X}}) = \sup\{S \mid K(R, S) \le \varepsilon\} \quad (0 \le \varepsilon < 1), \tag{11}$$
where
$$K(R, S) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\}. \tag{12}$$
Proof. 
See Appendix A. □

3. Mixed Memoryless Sources

3.1. First-Order ε-Optimum Exponent

In the previous section, we have demonstrated the “limiting” formulas for general hypothesis testing. In this and subsequent sections, we consider special but insightful cases and compute the optimum exponents in single-letter forms.
Let $\Theta$ be an arbitrary probability space with a general probability measure $w(\theta)$ $(\theta \in \Theta)$. Then, the hypothesis testing problem to be considered in this section is stated as follows:
  • The null hypothesis is a mixed stationary memoryless source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$; that is, for $\mathbf{x} = (x_1, \dots, x_n) \in \mathcal{X}^n$,
    $$P_{X^n}(\mathbf{x}) = \int_\Theta P_{X_\theta^n}(\mathbf{x})\, dw(\theta), \tag{13}$$
    where $X_\theta^n$ is a stationary memoryless source for each $\theta \in \Theta$ and
    $$P_{X_\theta^n}(\mathbf{x}) = \prod_{i=1}^n P_{X_\theta}(x_i), \tag{14}$$
    with generic random variable $X_\theta$ $(\theta \in \Theta)$ taking values in $\mathcal{X}$.
  • The alternative hypothesis is a stationary memoryless source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ with generic random variable $\overline{X}$ taking values in $\mathcal{X}$; that is,
    $$P_{\overline{X}^n}(\mathbf{x}) = \prod_{i=1}^n P_{\overline{X}}(x_i). \tag{15}$$
We assume hereafter that $\mathcal{X}$ is a finite alphabet.
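For a finite discrete mixture, the defining Equations (13) and (14) say that $\theta$ is drawn once according to $w$ and the whole block is then i.i.d. from $P_{X_\theta}$. A minimal Python sketch of this two-stage sampling, with assumed weights and component distributions, reads:

```python
# Sketch: drawing one block x^n from a mixed memoryless source with a
# finite mixture (Equation (13) with discrete w); all parameters assumed.
import numpy as np

rng = np.random.default_rng(1)
w = np.array([0.5, 0.3, 0.2])                   # mixing weights w(theta)
components = np.array([[0.7, 0.2, 0.1],         # P_{X_theta} for each theta
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4]])
n = 1000
theta = rng.choice(len(w), p=w)                 # theta is drawn once per block,
x = rng.choice(3, size=n, p=components[theta])  # then the block is i.i.d.
print("component:", theta, "empirical type:", np.bincount(x, minlength=3) / n)
```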
To investigate this special case, we first introduce an expurgated parameter set on the basis of types, where the type $T$ of a sequence $\mathbf{x} \in \mathcal{X}^n$ is the empirical distribution of $\mathbf{x}$; that is, $T = (N(x|\mathbf{x})/n)_{x \in \mathcal{X}}$, where $N(x|\mathbf{x})$ denotes the number of indices $i$ such that $x_i = x$ $(i = 1, 2, \dots, n)$.
Let $T_1, T_2, \dots, T_{N_n}$ denote all possible types of sequences of length $n$. Then, it is well known that
$$N_n \le (n+1)^{|\mathcal{X}|}. \tag{16}$$
Now, for each $\mathbf{x} \in \mathcal{X}^n$, we define the set
$$\Theta(\mathbf{x}) := \left\{ \theta \in \Theta \,\middle|\, P_{X_\theta^n}(\mathbf{x}) \le e^{\sqrt[4]{n}}\, P_{X^n}(\mathbf{x}) \right\}. \tag{17}$$
Since $P_{X_\theta^n}$ is an i.i.d. source for each $\theta \in \Theta$, the set $\Theta(\mathbf{x})$ depends only on the type $T_k$ of the sequence $\mathbf{x}$, and therefore we may write $\Theta(T_k)$ instead of $\Theta(\mathbf{x})$. Moreover, we define the "expurgated" set
$$\Theta_n^* := \bigcap_{k=1}^{N_n} \Theta(T_k). \tag{18}$$
Then, we have the following lemma:
Lemma 3
(Han [1]). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ denote the mixed memoryless source defined in Equation (13); then we have
$$\int_{\Theta_n^*} dw(\theta) \ge 1 - (n+1)^{|\mathcal{X}|}\, e^{-\sqrt[4]{n}}. \tag{19}$$
Next, we introduce two basic "decomposition" lemmas.
Lemma 4
(Upper Decomposition Lemma). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be a mixed memoryless source and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ be an arbitrary general source. Then, for any $\theta \in \Theta_n^*$ and any real $z_n$, it holds that
$$\Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n \right\} \le \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n + \frac{1}{\sqrt[4]{n^3}} \right\}. \tag{20}$$
Proof. 
See Appendix B. □
Lemma 5
(Lower Decomposition Lemma). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be a mixed memoryless source and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ be an arbitrary general source. Then, for any $\theta \in \Theta$, any real $z_n$, and any $\gamma > 0$, it holds that
$$\Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n \right\} \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n - \frac{\gamma}{\sqrt{n}} \right\} - e^{-\sqrt{n}\gamma}. \tag{21}$$
Proof. 
See Appendix C. □
Lemmas 3–5 are used later to establish Theorems 3–5. First, Theorem 3, concerning the first-order ε-optimum exponent for mixed memoryless sources, was given earlier as follows:
Theorem 3
(First-order ε-optimum exponent: Han [1]). For $0 \le \varepsilon < 1$,
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) = \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) \le \varepsilon \right\}, \tag{22}$$
where $D(P_X \| P_{\overline{X}})$ denotes the Kullback–Leibler divergence between $P_X$ and $P_{\overline{X}}$.
Remark 1.
If $\Theta$ is a singleton, the above formula reduces to
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) = D(P_X \| P_{\overline{X}}) \quad (0 \le \varepsilon < 1), \tag{23}$$
which is nothing but Stein's lemma [2].
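To see Theorem 3 at work numerically, the following Monte Carlo sketch estimates the divergence spectrum $\Pr\{(1/n)\log(\cdot) \le R\}$ for a hypothetical two-component Bernoulli mixture: for large $n$, the spectrum increases by the weight $w(\theta)$ at each component divergence $D(P_{X_\theta} \| P_{\overline{X}})$, which is exactly the mass appearing in Equation (22). The components, weights, and alternative are assumed for illustration; by Lemmas 4 and 5, the per-component likelihood ratio used below is a faithful proxy for the mixed spectrum.

```python
# Sketch: Monte Carlo view of Theorem 3 for a hypothetical two-component
# Bernoulli mixture against a memoryless alternative; parameters assumed.
import numpy as np

rng = np.random.default_rng(2)
w = np.array([0.4, 0.6])               # mixing weights w(theta)
ps = np.array([0.2, 0.45])             # component parameters P_{X_theta}
q = 0.7                                # alternative parameter P_{Xbar}

def kl(a, b):
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

n, trials = 4000, 20_000
theta = rng.choice(2, size=trials, p=w)            # one theta per block
x = rng.random((trials, n)) < ps[theta, None]
# per-block divergence density rate against the alternative; a proxy for
# the mixed spectrum K(R) via the decomposition lemmas
llr = np.where(x, np.log(ps[theta, None] / q),
               np.log((1 - ps[theta, None]) / (1 - q))).sum(axis=1) / n
for R in (kl(0.45, 0.7) + 0.05, kl(0.2, 0.7) + 0.05):
    print(f"K({R:.3f}) ~ {np.mean(llr <= R):.3f}")  # ~ 0.6, then ~ 1.0
```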
Remark 2.
$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$ can also be expressed as
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) = \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) \le R\}} dw(\theta) \le \varepsilon \right\}. \tag{24}$$
This can be verified as follows. Set
$$\beta_\varepsilon := \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) \le \varepsilon \right\}, \tag{25}$$
$$\tilde{\beta}_\varepsilon := \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) \le R\}} dw(\theta) \le \varepsilon \right\}. \tag{26}$$
Then, clearly $\tilde{\beta}_\varepsilon \le \beta_\varepsilon$. Here, we assume that $\tilde{\beta}_\varepsilon < \beta_\varepsilon$ to derive a contradiction. From the assumption, there exists a constant $\gamma > 0$ satisfying $\tilde{\beta}_\varepsilon + 2\gamma < \beta_\varepsilon$. On the other hand, from the definition of $\beta_\varepsilon$, for any $\eta > 0$,
$$\varepsilon \ge \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < \beta_\varepsilon - \eta\}} dw(\theta) \tag{27}$$
holds. Thus, setting $\eta < \gamma$ leads to
$$\varepsilon \ge \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < \beta_\varepsilon - \eta\}} dw(\theta) \ge \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < \beta_\varepsilon - \gamma\}} dw(\theta) \ge \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) \le \tilde{\beta}_\varepsilon + \gamma\}} dw(\theta) > \varepsilon, \tag{28}$$
which is a contradiction, where the last inequality is due to the definition of $\tilde{\beta}_\varepsilon$.

3.2. Second-Order ε-Optimum Exponent

Next, we establish the second-order ε-optimum exponent for mixed sources, which is the first main result in this paper.
Theorem 4
(Second-order ε-optimum exponent). For $0 \le \varepsilon < 1$,
$$B_\varepsilon(R \mid \mathbf{X} \| \overline{\mathbf{X}}) = \sup\left\{ S \,\middle|\, \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) + \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) = R\}} \Phi_\theta(S)\, dw(\theta) \le \varepsilon \right\}, \tag{29}$$
where
$$\Phi_\theta(S) := G\!\left(\frac{S}{\sqrt{V_\theta}}\right), \tag{30}$$
$$G(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt, \tag{31}$$
$$V_\theta := \sum_{x \in \mathcal{X}} P_{X_\theta}(x) \left( \log \frac{P_{X_\theta}(x)}{P_{\overline{X}}(x)} - D(P_{X_\theta} \| P_{\overline{X}}) \right)^2. \tag{32}$$
Proof. 
See Appendix D. □
Remark 3.
If $\Theta$ is a singleton $(\Theta = \{\theta_0\})$, Theorem 4 reduces to $B_\varepsilon(R \mid \mathbf{X} \| \overline{\mathbf{X}}) = \Phi_{\theta_0}^{-1}(\varepsilon) = \sqrt{V_{\theta_0}}\, G^{-1}(\varepsilon)$ for $R = B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$, which is originally due to Strassen [4].
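In the singleton case of Remark 3, the second-order term is a simple closed form. The following Python sketch evaluates $\sqrt{V_{\theta_0}}\, G^{-1}(\varepsilon)$ for two hypothetical Bernoulli sources, with scipy's norm.ppf serving as $G^{-1}$; all parameters are assumptions.

```python
# Sketch: the singleton second-order term sqrt(V) * G^{-1}(eps) of Remark 3
# for two hypothetical Bernoulli sources; norm.ppf plays the role of G^{-1}.
import numpy as np
from scipy.stats import norm

p, q, eps = 0.3, 0.6, 0.1              # assumed parameters
z = np.array([np.log(p / q), np.log((1 - p) / (1 - q))])  # divergence density
w = np.array([p, 1 - p])
D = w @ z                              # first-order exponent D(P_X||P_Xbar)
V = w @ (z - D) ** 2                   # divergence variance V_{theta_0}
S = np.sqrt(V) * norm.ppf(eps)         # second-order term (negative for eps<1/2)
print(f"D = {D:.4f}, V = {V:.4f}, S(eps) = {S:.4f}")
```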
Remark 4.
From Theorem 3 with $R = B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$, it is not difficult to verify that
$$\int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) \le \varepsilon \tag{33}$$
and
$$\int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) \le R\}} dw(\theta) \ge \varepsilon. \tag{34}$$
Here, let us consider the following canonical equation for $S$:
$$\int_\Theta dw(\theta) \lim_{n\to\infty} \Phi_\theta\!\left( \sqrt{n}\left( B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) - D(P_{X_\theta} \| P_{\overline{X}}) \right) + S \right) = \varepsilon. \tag{35}$$
In view of Equations (33) and (34), this equation always has a solution $S = S(\varepsilon)$. It should be noted that if $\int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) = B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})\}} dw(\theta) = 0$ holds, the solution is not unique, and so we set $S(\varepsilon) = +\infty$. By using the solution $S(\varepsilon)$, it is not difficult to check that Theorem 4 with $R = B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$ can be expressed as
$$B_\varepsilon(R \mid \mathbf{X} \| \overline{\mathbf{X}}) = S(\varepsilon). \tag{36}$$
The canonical equation is a useful expression for the second-order ε-optimum rate [7,10,11,12]; Equation (35) is the hypothesis testing counterpart of these results.
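For a finite discrete mixture, the limit inside Equation (35) equals 1, $\Phi_\theta(S)$, or 0 according as $D(P_{X_\theta} \| P_{\overline{X}})$ is below, at, or above $B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$, so the canonical equation reduces to a monotone scalar equation in $S$ that can be solved by bisection. A sketch under assumed weights and variances (hypothetical values, chosen only so that a root exists):

```python
# Sketch: solving the canonical Equation (35) for a hypothetical finite
# mixture with two components sitting exactly on R = B_eps(X||Xbar).
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

eps = 0.4
w_below = 0.1                 # total weight of {theta : D_theta < B_eps}, assumed
w_on = np.array([0.2, 0.3])   # weights of components with D_theta = B_eps
V_on = np.array([0.5, 2.0])   # their divergence variances V_theta

def F(S):                     # left-hand side of the canonical equation minus eps
    return w_below + (w_on * norm.cdf(S / np.sqrt(V_on))).sum() - eps

S_eps = brentq(F, -50.0, 50.0)   # B_eps(R|X||Xbar) = S(eps) by Remark 4
print("second-order exponent S(eps) =", S_eps)
```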

4. Mixed Memoryless Alternative Hypothesis

In this section, we consider the case where not only the null hypothesis but also the alternative hypothesis is a mixed memoryless source, and we establish the single-letter formula for the first-order ε-optimum exponent, thereby generalizing Theorem 3.
Let $\{P_{\overline{X}_\sigma}\}_{\sigma \in \Sigma}$ be a family of probability distributions on $\mathcal{X}$, where $\Sigma$ is a probability space with probability measure $v(\sigma)$. We assume here that $\Sigma$ is a compact space and that $P_{\overline{X}_\sigma}$ is continuous as a function of $\sigma \in \Sigma$.
The hypothesis testing problem considered in this section is stated as follows:
  • The null hypothesis is a mixed memoryless source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ as defined by Equations (13) and (14) in Section 3.1.
  • The alternative hypothesis is another mixed memoryless source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$; that is, for $\mathbf{x} \in \mathcal{X}^n$,
    $$P_{\overline{X}^n}(\mathbf{x}) = \int_\Sigma P_{\overline{X}_\sigma^n}(\mathbf{x})\, dv(\sigma), \tag{37}$$
    where
    $$P_{\overline{X}_\sigma^n}(\mathbf{x}) = \prod_{i=1}^n P_{\overline{X}_\sigma}(x_i). \tag{38}$$
Let us now consider, for each $P \in \mathcal{P}(\mathcal{X})$ (the set of probability distributions on $\mathcal{X}$), the following equation with respect to $\sigma \in \Sigma$:
$$D(P \| \overline{P}_\sigma) = v\text{-}\mathrm{ess.inf}\, D(P \| \overline{P}_\sigma) \quad (\text{for each } P \in \mathcal{P}(\mathcal{X})), \tag{39}$$
where $v\text{-}\mathrm{ess.inf}\, f_\sigma := \sup\{\beta \mid \Pr\{f_\sigma < \beta\} = 0\}$ (the essential infimum of $f_\sigma$ with respect to $v(\sigma)$), and $\Pr$ is measured with respect to the probability measure $v(\sigma)$.
Since the solution $\sigma$ of this equation depends on $P$, we write $\sigma = \sigma(P)$ $(\sigma(\cdot): \mathcal{P}(\mathcal{X}) \to \Sigma)$. Notice here that $D(P \| \overline{P}_\sigma)$ is continuous in $(P, \overline{P}_\sigma)$; since we have assumed that $\Sigma$ is compact and $\overline{P}_\sigma$ is continuous in $\sigma$, such a function $\sigma(P)$ indeed exists. To avoid technical subtleties, we assume here that the function $\sigma(P)$ can be chosen to be continuous. For example, if $\Sigma$ is a closed convex subset of $\mathcal{P}(\mathcal{X})$, then it is not difficult to verify that the function $\sigma(P)$ is uniquely determined and continuous (or even differentiable), which follows from the strict convexity of $D(P \| \overline{P})$ in $(P, \overline{P})$. Another simple example is the case where $\Sigma$ is a countable set.
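When $\Sigma$ is a finite (hence countable) set, the $v$-essential infimum in Equation (39) is an ordinary minimum and $\sigma(P)$ can be computed directly. A small Python sketch, with assumed candidate distributions:

```python
# Sketch: computing sigma(P) of Equation (39) when Sigma is a finite set,
# so that the v-essential infimum is an ordinary minimum; data assumed.
import numpy as np

def kl(P, Q):
    return float(np.sum(P * np.log(P / Q)))

Sigma = [np.array([0.6, 0.3, 0.1]),    # candidate alternatives P_sigma
         np.array([0.2, 0.5, 0.3]),
         np.array([0.1, 0.1, 0.8])]

def sigma_of(P):                       # sigma(P) = argmin_sigma D(P||P_sigma)
    divs = [kl(P, Q) for Q in Sigma]
    i = int(np.argmin(divs))
    return i, divs[i]

P = np.array([0.5, 0.4, 0.1])
idx, d = sigma_of(P)
print("sigma(P) =", idx, " D(P||P_sigma(P)) =", round(d, 4))
```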
Hereafter, for simplicity, we write $P_\theta, P_\theta^n$ (respectively, $\overline{P}_\sigma, \overline{P}_\sigma^n$) instead of $P_{X_\theta}, P_{X_\theta^n}$ (respectively, $P_{\overline{X}_\sigma}, P_{\overline{X}_\sigma^n}$). Then, we have the second main result in this paper:
Theorem 5
(First-order ε-optimum exponent). For $0 \le \varepsilon < 1$,
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) = \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R\}} dw(\theta) \le \varepsilon \right\}. \tag{40}$$
Remark 5.
In the case that $\Sigma$ is a singleton, the above theorem coincides with Theorem 3; hence, this theorem is a direct generalization of Theorem 3. Moreover, if both $\Theta$ and $\Sigma$ are singletons, the theorem coincides with Stein's lemma (see Remark 1).
Remark 6.
Remark 2 is also valid for this theorem. That is, $B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}})$ can also be expressed as
$$B_\varepsilon(\mathbf{X} \| \overline{\mathbf{X}}) = \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) \le R\}} dw(\theta) \le \varepsilon \right\}. \tag{41}$$
Proof of Theorem 5.
To show the theorem, let $T_{\theta,\nu}^n \subseteq \mathcal{X}^n$ be the set of ν-typical sequences with respect to $P_{X_\theta}$; that is, let $T_{\theta,\nu}^n$ be the set of all $\mathbf{x} = (x_1, x_2, \dots, x_n) \in \mathcal{X}^n$ such that
$$\left| N(x|\mathbf{x})/n - P_{X_\theta}(x) \right| \le \nu\, P_{X_\theta}(x) \quad (\forall x \in \mathcal{X}), \tag{42}$$
where $N(x|\mathbf{x})$ is the number of indices $i$ such that $x_i = x$, and $\nu > 0$ is an arbitrary constant. Then, it is well known that
$$\Pr\left\{ X_\theta^n \in T_{\theta,\nu}^n \right\} \to 1 \quad (n \to \infty). \tag{43}$$
In the sequel, we use upper and lower bounds on the probability
$$P_{\overline{X}^n}(\mathbf{x}) = \int_\Sigma \overline{P}_\sigma^n(\mathbf{x})\, dv(\sigma) \tag{44}$$
of the form
$$\frac{1}{n} \log \frac{1}{P_{\overline{X}^n}(\mathbf{x})} \ge \frac{1}{n} \log \frac{1}{\overline{P}_{\sigma(P_\theta)}^n(\mathbf{x})} - \delta_\theta(\nu), \tag{45}$$
$$\frac{1}{n} \log \frac{1}{P_{\overline{X}^n}(\mathbf{x})} \le \frac{1}{n} \log \frac{1}{\overline{P}_{\sigma(P_\theta)}^n(\mathbf{x})} + \frac{1}{n} \log \frac{1}{c_{\tau_m}(P_\theta)} + (\tau - \delta_\theta(\nu)), \tag{46}$$
for each $\mathbf{x} \in T_{\theta,\nu}^n$, where $\delta_\theta(\nu)$ satisfies $\delta_\theta(\nu) \to 0$ as $\nu \to 0$, and $\tau > 0$ and $c_{\tau_m}(P_\theta) > 0$ are constants independent of $n$. Proofs of Equations (45) and (46) appear in Appendix E.
We then prove the theorem by using Equations (45) and (46). In view of Theorem 1 and Remark 6, it suffices to show the two inequalities:
$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} \le \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) \le R\}} dw(\theta), \tag{47}$$
$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} \ge \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R\}} dw(\theta). \tag{48}$$
  • Proof of Equation (47):
Similarly to the derivation of Equation (A23) with Lemma 4, we have
$$\begin{aligned} \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} &= \limsup_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R \right\} \\ &\le \int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}} \right\}. \end{aligned} \tag{49}$$
From the definition of the ν-typical set and Equation (45), we also have
$$\begin{aligned} &\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}} \right\} \\ &\quad\le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}},\ X_\theta^n \in T_{\theta,\nu}^n \right\} + \limsup_{n\to\infty} \Pr\left\{ X_\theta^n \notin T_{\theta,\nu}^n \right\} \\ &\quad\le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}} + \delta_\theta(\nu),\ X_\theta^n \in T_{\theta,\nu}^n \right\} \\ &\quad\le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}} + \delta_\theta(\nu) \right\} \end{aligned} \tag{50}$$
for any $\theta \in \Theta$, where the second inequality uses Equation (43). Here, we define two sets:
$$\Theta_1 := \left\{ \theta \in \Theta \,\middle|\, D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) \le R \right\}, \tag{51}$$
$$\Theta_2 := \left\{ \theta \in \Theta \,\middle|\, D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) > R \right\}. \tag{52}$$
Then, from the definition of $\Theta_2$, there exists a small constant $\gamma > 0$ satisfying
$$D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) \ge R + 3\gamma \tag{53}$$
for $\theta \in \Theta_2$. Thus, it holds that
$$\begin{aligned} \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}} + \delta_\theta(\nu) \right\} &\le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) + \frac{1}{\sqrt[4]{n^3}} + \delta_\theta(\nu) - 3\gamma \right\} \\ &\le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) - \gamma \right\}, \end{aligned} \tag{54}$$
where we have used the relations $\frac{1}{\sqrt[4]{n^3}} < \gamma$ and $\delta_\theta(\nu) < \gamma$ for sufficiently large $n$ and sufficiently small $\nu > 0$.
Therefore, note that, with $X_\theta^n = (X_{\theta,1}, X_{\theta,2}, \dots, X_{\theta,n})$,
$$\frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} = \frac{1}{n}\sum_{i=1}^n \log\frac{P_{X_\theta}(X_{\theta,i})}{\overline{P}_{\sigma(P_\theta)}(X_{\theta,i})} \tag{55}$$
is the arithmetic average of $n$ i.i.d. random variables with expectation
$$E\left[ \frac{1}{n}\sum_{i=1}^n \log\frac{P_{X_\theta}(X_{\theta,i})}{\overline{P}_{\sigma(P_\theta)}(X_{\theta,i})} \right] = D(P_\theta \| \overline{P}_{\sigma(P_\theta)}). \tag{56}$$
Then, the weak law of large numbers yields that, for $\theta \in \Theta_2$,
$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) - \gamma \right\} = 0. \tag{57}$$
Thus, from Equations (54) and (57), the right-hand side of Equation (49) is upper bounded as
$$\begin{aligned} &\int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R + \frac{1}{\sqrt[4]{n^3}} + \delta_\theta(\nu) \right\} \\ &\quad\le \int_{\Theta_1} dw(\theta) + \int_{\Theta_2} dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) - \gamma \right\} \\ &\quad= \int_{\Theta_1} dw(\theta) = \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) \le R\}} dw(\theta), \end{aligned} \tag{58}$$
which completes the proof of Equation (47).
  • Proof of Equation (48):
Similarly to the derivation of Equation (A32) with Lemma 5, we have
$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} \ge \int_\Theta dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} \right\}. \tag{59}$$
From the definition of the ν-typical set and Equation (46), we also have
$$\begin{aligned} \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} \right\} &\ge \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}},\ X_\theta^n \in T_{\theta,\nu}^n \right\} \\ &\ge \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} - \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} - \tau + \delta_\theta(\nu),\ X_\theta^n \in T_{\theta,\nu}^n \right\} \\ &\ge \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} - \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} - \tau + \delta_\theta(\nu) \right\} - \limsup_{n\to\infty} \Pr\left\{ X_\theta^n \notin T_{\theta,\nu}^n \right\} \\ &= \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} - \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} - \tau + \delta_\theta(\nu) \right\} \end{aligned} \tag{60}$$
for any $\theta \in \Theta$.
We also partition the parameter space $\Theta$ into two sets:
$$\Theta_1 := \left\{ \theta \in \Theta \,\middle|\, D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R \right\}, \tag{61}$$
$$\Theta_2 := \left\{ \theta \in \Theta \,\middle|\, D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) \ge R \right\}. \tag{62}$$
Then, for $\theta \in \Theta_1$, if we take $\nu > 0$ and $\tau > 0$ sufficiently small, there exists a constant $\eta > 0$ satisfying
$$R - \frac{\gamma}{\sqrt{n}} - \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} - \tau + \delta_\theta(\nu) > D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) + \eta \quad (\forall n > n_0). \tag{63}$$
Thus, again by invoking the weak law of large numbers, we have, for $\theta \in \Theta_1$,
$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} - \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} - \tau + \delta_\theta(\nu) \right\} \ge \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) + \eta \right\} = 1. \tag{64}$$
Summarizing, we obtain
$$\begin{aligned} \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} &\ge \int_\Theta dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le R - \frac{\gamma}{\sqrt{n}} - \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} - \tau + \delta_\theta(\nu) \right\} \\ &\ge \int_{\Theta_1} dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{\overline{P}_{\sigma(P_\theta)}^n(X_\theta^n)} \le D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) + \eta \right\} \\ &= \int_{\Theta_1} dw(\theta) = \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R\}} dw(\theta). \end{aligned} \tag{65}$$
This completes the proof of Equation (48). □
Remark 7.
Theorem 3 is a special case of Theorem 5 when Σ is a singleton.
To illustrate the significance of Theorem 5, let us now consider the special case of ε = 0. Then, by virtue of Theorem 5, we have the following simplified result:
Corollary 1.
In the special case of ε = 0, we have
$$B_0(\mathbf{X} \| \overline{\mathbf{X}}) = w\text{-}\mathop{\mathrm{ess.inf}}_{\theta \in \Theta}\; v\text{-}\mathop{\mathrm{ess.inf}}_{\sigma \in \Sigma}\; D(P_\theta \| \overline{P}_\sigma). \tag{66}$$
Proof. 
Formula (40) can be written in this case as
$$B_0(\mathbf{X} \| \overline{\mathbf{X}}) = \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R\}} dw(\theta) = 0 \right\}. \tag{67}$$
Let
$$R_1 < \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R\}} dw(\theta) = 0 \right\}; \tag{68}$$
then this means that
$$R_1 \le w\text{-}\mathop{\mathrm{ess.inf}}_{\theta \in \Theta}\, D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) = w\text{-}\mathop{\mathrm{ess.inf}}_{\theta \in \Theta}\; v\text{-}\mathop{\mathrm{ess.inf}}_{\sigma \in \Sigma}\; D(P_\theta \| \overline{P}_\sigma). \tag{69}$$
Conversely, let
$$R_2 > \sup\left\{ R \,\middle|\, \int_{\{\theta \mid D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) < R\}} dw(\theta) = 0 \right\}; \tag{70}$$
then this means that
$$R_2 \ge w\text{-}\mathop{\mathrm{ess.inf}}_{\theta \in \Theta}\, D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) = w\text{-}\mathop{\mathrm{ess.inf}}_{\theta \in \Theta}\; v\text{-}\mathop{\mathrm{ess.inf}}_{\sigma \in \Sigma}\; D(P_\theta \| \overline{P}_\sigma). \tag{71}$$
As a consequence, Equation (66) follows from Equations (67), (69), and (71). □
Remark 8.
One may wonder whether it is possible to deal with the second-order ε-optimum problem using the arguments developed above for the first-order ε-optimum problem with mixed memoryless sources $\mathbf{X}$ and $\overline{\mathbf{X}}$. To do so, however, it seems that we need some novel techniques, which remain to be studied.

5. Hypothesis Testing with Mixed General Sources

We have so far investigated ε-hypothesis testing for mixed memoryless sources. In this section, we deal with a more general setting, namely hypothesis testing with mixed general sources, which inherits the crux of the analysis for mixed memoryless sources (cf. Theorem 5). This leads us to a primitive but insightful "general" observation.
To do so, we consider the case where both the null hypothesis $\mathbf{X}$ and the alternative hypothesis $\overline{\mathbf{X}}$ are finite mixtures of general sources:
  • The null hypothesis is a mixed general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ consisting of $K$ general (not necessarily memoryless) sources $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ $(i = 1, 2, \dots, K)$; that is, for $\mathbf{x} \in \mathcal{X}^n$,
    $$P_{X^n}(\mathbf{x}) = \sum_{i=1}^K \alpha_i P_{X_i^n}(\mathbf{x}), \tag{72}$$
    where $\alpha_i > 0$ $(i = 1, 2, \dots, K)$ and $\sum_{i=1}^K \alpha_i = 1$.
  • The alternative hypothesis is another mixed general source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ consisting of $L$ general (not necessarily memoryless) sources $\overline{\mathbf{X}}_j = \{\overline{X}_j^n\}_{n=1}^{\infty}$ $(j = 1, 2, \dots, L)$; that is, for $\mathbf{x} \in \mathcal{X}^n$,
    $$P_{\overline{X}^n}(\mathbf{x}) = \sum_{j=1}^L \beta_j P_{\overline{X}_j^n}(\mathbf{x}), \tag{73}$$
    where $\beta_j > 0$ $(j = 1, 2, \dots, L)$ and $\sum_{j=1}^L \beta_j = 1$.
In this general setting, it is hard to derive a compact formula for the first-order ε -optimum exponent (with 0 < ε < 1 ). Instead, we can obtain the following theorem in the special case of ε = 0 .
Theorem 6.
$$B_0(\mathbf{X} \| \overline{\mathbf{X}}) = \min_{1 \le i \le K,\, 1 \le j \le L} B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{74}$$
In particular, if the $\mathbf{X}_i$ and $\overline{\mathbf{X}}_j$ are all stationary memoryless sources specified by $X_i$ $(i = 1, 2, \dots, K)$ and $\overline{X}_j$ $(j = 1, 2, \dots, L)$, respectively, then
$$B_0(\mathbf{X} \| \overline{\mathbf{X}}) = \min_{1 \le i \le K,\, 1 \le j \le L} D(P_{X_i} \| P_{\overline{X}_j}), \tag{75}$$
which is a special case of Corollary 1.
Proof. 
See Appendix F. □
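For stationary memoryless components, Equation (75) is a finite minimization of pairwise divergences. A Python sketch with assumed component distributions on a ternary alphabet:

```python
# Sketch: evaluating Equation (75) for hypothetical finite mixtures of
# stationary memoryless sources; all distributions below are assumed.
import numpy as np

def kl(P, Q):
    return float(np.sum(P * np.log(P / Q)))

nulls = [np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.4, 0.2])]   # P_{X_i}
alts  = [np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.6, 0.3])]   # P_{Xbar_j}
B0 = min(kl(P, Q) for P in nulls for Q in alts)
print("B_0(X||Xbar) =", round(B0, 4))   # the worst (i, j) pair dominates
```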
Furthermore, we can also consider the following exponentially r-optimum exponent for hypothesis testing with the two mixed general sources $\mathbf{X}$ and $\overline{\mathbf{X}}$ defined above.
Definition 5.
Let $r > 0$ be an arbitrary fixed constant. Rate $R$ is said to be exponentially r-achievable if there exists an acceptance region $\mathcal{A}_n$ such that
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\mu_n} \ge r, \qquad \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R. \tag{76}$$
Definition 6
(First-order exponentially r-optimum exponent).
$$B_e(r \mid \mathbf{X} \| \overline{\mathbf{X}}) := \sup\{R \mid R \text{ is exponentially } r\text{-achievable}\}. \tag{77}$$
Then, it is not difficult to verify that a result analogous to Theorem 6 holds, which is a generalization of [1] (Remark 4.4.3):
Theorem 7.
$$B_e(r \mid \mathbf{X} \| \overline{\mathbf{X}}) = \min_{1 \le i \le K,\, 1 \le j \le L} B_e(r \mid \mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{78}$$
In particular, if the null and alternative hypotheses consist of stationary memoryless sources $X_i$ $(i = 1, 2, \dots, K)$ and $\overline{X}_j$ $(j = 1, 2, \dots, L)$, respectively, then
$$B_e(r \mid \mathbf{X} \| \overline{\mathbf{X}}) = \min_{1 \le i \le K,\, 1 \le j \le L}\; \inf_{P:\, D(P \| P_{X_i}) < r} D(P \| P_{\overline{X}_j}), \tag{79}$$
by virtue of Hoeffding's theorem.
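On a binary alphabet, the inner infimum in Equation (79) runs over a one-parameter family of distributions and can be approximated by a simple grid search. A sketch with assumed sources $X_i$, $\overline{X}_j$ and constraint $r$:

```python
# Sketch: approximating the Hoeffding exponent of Equation (79) on a
# binary alphabet by grid search; the sources and r are assumptions.
import numpy as np

def kl(a, b):
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

pi, pj, r = 0.3, 0.7, 0.05             # P_{X_i}, P_{Xbar_j}, type I constraint
grid = np.linspace(1e-4, 1 - 1e-4, 100_000)
feasible = kl(grid, pi) < r            # the set {P : D(P||P_{X_i}) < r}
exponent = kl(grid[feasible], pj).min()
print("B_e(r|X_i||Xbar_j) ~", round(float(exponent), 4))
```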

6. Hypothesis Testing with Compound General Sources

In this section, let us consider the compound hypothesis testing problem with finitely many null hypotheses $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ $(i = 1, 2, \dots, K)$ and finitely many alternative hypotheses $\overline{\mathbf{X}}_j = \{\overline{X}_j^n\}_{n=1}^{\infty}$ $(j = 1, 2, \dots, L)$, where $\mathbf{X}_i$ and $\overline{\mathbf{X}}_j$ are general sources. As is well known, this problem is expected to have a primitive but "general" relationship, at the structural level, to hypothesis testing with mixed sources.
Specifically, compound hypothesis testing is the problem in which a pair of general sources $(\mathbf{X}_i, \overline{\mathbf{X}}_j)$ occurs as a (null hypothesis, alternative hypothesis) pair, and the tester does not know which pair $(\mathbf{X}_i, \overline{\mathbf{X}}_j)$ is actually in effect. This means that the acceptance region $\mathcal{A}_n$ cannot depend on $i$ or $j$. The type I error probabilities of the compound hypothesis testing are given by
$$\mu_n^{(i)} := \Pr\left\{ X_i^n \notin \mathcal{A}_n \right\} \tag{80}$$
for each general null hypothesis $\mathbf{X}_i$. The type II error probabilities are likewise given by
$$\lambda_n^{(j)} := \Pr\left\{ \overline{X}_j^n \in \mathcal{A}_n \right\} \tag{81}$$
for each general alternative hypothesis $\overline{\mathbf{X}}_j$. Then, the following achievability is of our interest.
Definition 7.
Rate $R$ is said to be 0-achievable for the compound hypothesis testing if there exists an acceptance region $\mathcal{A}_n$ such that
$$\lim_{n\to\infty} \mu_n^{(i)} = 0 \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(j)}} \ge R, \tag{82}$$
for all $i = 1, 2, \dots, K$ and $j = 1, 2, \dots, L$.
Definition 8
(First-order 0-optimum exponent).
$$B(\{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) := \sup\{R \mid R \text{ is } 0\text{-achievable}\}. \tag{83}$$
Now, we have
Theorem 8.
Assuming that $\alpha_i > 0$ and $\beta_j > 0$ hold for all $i = 1, 2, \dots, K$ and $j = 1, 2, \dots, L$, it holds that
$$B(\{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) = B(\{\alpha_i, \mathbf{X}_i\}_{i=1}^K \| \{\beta_j, \overline{\mathbf{X}}_j\}_{j=1}^L), \tag{84}$$
where, with the sources of Equations (72) and (73), we use the notation
$$B(\{\alpha_i, \mathbf{X}_i\}_{i=1}^K \| \{\beta_j, \overline{\mathbf{X}}_j\}_{j=1}^L) \tag{85}$$
to denote $B_0(\mathbf{X} \| \overline{\mathbf{X}})$, making the dependence on $\alpha_i, \beta_j$ explicit.
Proof. 
See Appendix G. □
From Theorems 6 and 8, we immediately obtain the first-order 0-optimum exponent for the compound hypothesis testing:
Corollary 2.
Assuming that $\alpha_i > 0$ and $\beta_j > 0$ hold for all $i = 1, 2, \dots, K$ and $j = 1, 2, \dots, L$, we have
$$B(\{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) = \min_{1 \le i \le K,\, 1 \le j \le L} B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{86}$$
In particular, if the $\mathbf{X}_i$ and $\overline{\mathbf{X}}_j$ are all stationary memoryless sources specified by $X_i$ and $\overline{X}_j$, respectively, Equation (86) reduces to
$$B(\{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) = \min_{1 \le i \le K,\, 1 \le j \le L} D(P_{X_i} \| P_{\overline{X}_j}). \tag{87}$$
Remark 9.
Similarly to Definition 5, we can define the exponentially r-optimum exponent for the compound hypothesis testing problem as follows.
Definition 9.
Let $r > 0$ be an arbitrary fixed constant. Rate $R$ is said to be exponentially r-achievable for the compound hypothesis testing if there exists an acceptance region $\mathcal{A}_n$ such that
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\mu_n^{(i)}} \ge r, \tag{88}$$
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(j)}} \ge R, \tag{89}$$
for all $i = 1, 2, \dots, K$ and $j = 1, 2, \dots, L$.
Definition 10
(First-order exponentially r-optimum exponent).
$$B_e(r \mid \{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) := \sup\{R \mid R \text{ is exponentially } r\text{-achievable}\}. \tag{90}$$
Then, using an argument similar to the proof of Theorem 8, the following theorem can be shown:
Theorem 9.
Let $\alpha_i > 0$ and $\beta_j > 0$ hold for all $i = 1, 2, \dots, K$ and $j = 1, 2, \dots, L$; then it holds that
$$B_e(r \mid \{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) = B_e(r \mid \{\alpha_i, \mathbf{X}_i\}_{i=1}^K \| \{\beta_j, \overline{\mathbf{X}}_j\}_{j=1}^L), \tag{91}$$
where, with the sources of Equations (72) and (73), we use the notation
$$B_e(r \mid \{\alpha_i, \mathbf{X}_i\}_{i=1}^K \| \{\beta_j, \overline{\mathbf{X}}_j\}_{j=1}^L) \tag{92}$$
to denote $B_e(r \mid \mathbf{X} \| \overline{\mathbf{X}})$ (cf. Definitions 5 and 6).
Combining Theorems 7 and 9, we immediately obtain the following corollary:
Corollary 3.
Let $\alpha_i > 0$ and $\beta_j > 0$ hold for all $i = 1, 2, \dots, K$ and $j = 1, 2, \dots, L$; then it holds that
$$B_e(r \mid \{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) = \min_{1 \le i \le K,\, 1 \le j \le L} B_e(r \mid \mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{93}$$
In particular, if the null and alternative hypotheses consist of stationary memoryless sources specified by $X_i$ $(i = 1, 2, \dots, K)$ and $\overline{X}_j$ $(j = 1, 2, \dots, L)$, respectively, as in Theorem 7, then
$$B_e(r \mid \{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) = \min_{1 \le i \le K,\, 1 \le j \le L}\; \inf_{P:\, D(P \| P_{X_i}) < r} D(P \| P_{\overline{X}_j}), \tag{94}$$
which corresponds to Equation (79).

7. Concluding Remarks

Thus far, we have investigated the first- and second-order ε-optimum exponents in the hypothesis testing problem. First, we studied the second-order ε-optimum problem with a mixed memoryless null hypothesis and a stationary memoryless alternative hypothesis. As shown in the analysis of the second-order ε-optimum exponent, we use, as a key property, the asymptotic normality of the divergence density rate for each of the component sources. We also observe that the canonical representation, first introduced in [11], is still effective for expressing the second-order ε-optimum exponent for mixed memoryless sources in the hypothesis testing problem.
The first-order ε-optimum exponent in the case with mixed memoryless null and alternative hypotheses has also been established. One may wonder whether the same approach can be applied to derive the second-order ε-optimum exponent in this setting. Notice that one of our key techniques for deriving the first-order ε-optimum exponent is an expansion of $P_{\mathbf{x}}$ around $P_\theta$; a more careful evaluation of this expansion would be needed to compute the second-order ε-optimum exponent. This remains future work. Our final goal is the problem of hypothesis testing in which both the null and alternative hypotheses are general stationary sources. This paper characterizes the first- and second-order performance of hypothesis testing for mixed memoryless sources as a simple but crucial step toward this goal.
Finally, the relationship between the first-order 0-optimum (respectively, exponentially r-optimum) exponent in hypothesis testing with mixed general sources and the 0-optimum (respectively, exponentially r-optimum) exponent in compound hypothesis testing has also been demonstrated.

Acknowledgments

We are grateful to the reviewers for useful comments. In particular, we greatly appreciate the unusually thorough and insightful comments by Reviewer 3, which indeed helped us enhance the quality of the paper. The second author of this work was supported in part by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP26420371.

Author Contributions

Te Sun Han first presented Theorem 4, part of the paper, at IEEE Information Theory Workshop, Jeju, 2015, and subsequently, Te Sun Han and Ryo Nomura discussed together and collaborated to establish Theorem 5.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 2

The proof consists of two parts.
(1)
Direct Part:
Set $S_0 = \sup\{S \mid K(R, S) \le \varepsilon\}$. Then, we show that $S = S_0 - \gamma$ is $(\varepsilon, R)$-achievable for any $\gamma > 0$.
Define the acceptance region $\mathcal{A}_n$ as
$$\mathcal{A}_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > R + \frac{S}{\sqrt{n}} \right\}. \tag{A1}$$
Then, from Lemma 1 with $t = R + \frac{S}{\sqrt{n}}$, we have the upper bound on the type II error probability $\lambda_n$:
$$\lambda_n = \Pr\left\{ \overline{X}^n \in \mathcal{A}_n \right\} \le e^{-nR - \sqrt{n}S}, \tag{A2}$$
from which it follows that
$$\liminf_{n\to\infty} \frac{1}{\sqrt{n}} \log \frac{1}{\lambda_n e^{nR}} \ge S. \tag{A3}$$
We next evaluate the type I error probability $\mu_n$. Noting that
$$\mu_n = \Pr\left\{ X^n \notin \mathcal{A}_n \right\} = \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\}, \tag{A4}$$
we have
$$\limsup_{n\to\infty} \mu_n = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\} \le \varepsilon, \tag{A5}$$
because $S = S_0 - \gamma < S_0$ by definition. Hence, from Equations (A3) and (A5), $S = S_0 - \gamma$ is $(\varepsilon, R)$-achievable. Since $\gamma > 0$ is arbitrary, the direct part has been proved.
(2)
Converse Part:
Suppose that $S$ is $(\varepsilon, R)$-achievable. Then, there exists an acceptance region $\mathcal{A}_n$ such that
$$\limsup_{n\to\infty} \mu_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{\sqrt{n}} \log \frac{1}{\lambda_n e^{nR}} \ge S. \tag{A6}$$
We fix this acceptance region $\mathcal{A}_n$. The second inequality means that, for any $\gamma > 0$,
$$\lambda_n \le e^{-nR - \sqrt{n}(S - \gamma)} \tag{A7}$$
holds for sufficiently large $n$. On the other hand, from Lemma 2 with $t = R + \frac{S - 2\gamma}{\sqrt{n}}$, it holds that
$$\mu_n + e^{nR + \sqrt{n}(S - 2\gamma)} \lambda_n \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \right\}. \tag{A8}$$
Substituting Equation (A7) into this inequality, we have
$$\mu_n + e^{-\sqrt{n}\gamma} \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \right\} \tag{A9}$$
for sufficiently large $n$. Thus, we have
$$\limsup_{n\to\infty} \mu_n \ge \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \right\}. \tag{A10}$$
Here, from Equation (A6), we have
$$\varepsilon \ge \limsup_{n\to\infty} \mu_n \ge \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S - 2\gamma}{\sqrt{n}} \right\}, \tag{A11}$$
which means that
$$S - 2\gamma \le \sup\{S' \mid K(R, S') \le \varepsilon\}. \tag{A12}$$
Since $\gamma > 0$ is arbitrary, the proof of the converse part has been completed. □

Appendix B. Proof of Lemma 4

Since $P_{X_\theta^n}(\mathbf{x}) \le e^{\sqrt[4]{n}} P_{X^n}(\mathbf{x})$ holds for every $\mathbf{x} \in \mathcal{X}^n$ whenever $\theta \in \Theta_n^*$, we have
$$\Pr\left\{ \frac{1}{n}\log P_{X^n}(X_\theta^n) \le z_n \right\} \le \Pr\left\{ \frac{1}{n}\log P_{X_\theta^n}(X_\theta^n) - \frac{1}{\sqrt[4]{n^3}} \le z_n \right\} = \Pr\left\{ \frac{1}{n}\log P_{X_\theta^n}(X_\theta^n) \le z_n + \frac{1}{\sqrt[4]{n^3}} \right\} \tag{A13}$$
for any $z_n$. Applying this inequality with $z_n + \frac{1}{n}\log P_{\overline{X}^n}(X_\theta^n)$ in place of $z_n$, we have
$$\Pr\left\{ \frac{1}{n}\log \frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n \right\} \le \Pr\left\{ \frac{1}{n}\log \frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n + \frac{1}{\sqrt[4]{n^3}} \right\}, \tag{A14}$$
which completes the proof. □

Appendix C. Proof of Lemma 5

Setting $\gamma > 0$, we define the set
$$D_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log P_{X_\theta^n}(\mathbf{x}) \le \frac{1}{n}\log P_{X^n}(\mathbf{x}) - \frac{\gamma}{\sqrt{n}} \right\} \tag{A15}$$
for $\theta \in \Theta$. Then, it holds that
$$\Pr\left\{ X_\theta^n \in D_n \right\} = \sum_{\mathbf{x} \in D_n} P_{X_\theta^n}(\mathbf{x}) \le \sum_{\mathbf{x} \in D_n} P_{X^n}(\mathbf{x})\, e^{-\sqrt{n}\gamma} \le e^{-\sqrt{n}\gamma}. \tag{A16}$$
Thus, for any real number $z_n$, it holds that
$$\begin{aligned} \Pr\left\{ \frac{1}{n}\log P_{X_\theta^n}(X_\theta^n) \le z_n - \frac{\gamma}{\sqrt{n}} \right\} &= \Pr\left\{ \frac{1}{n}\log P_{X_\theta^n}(X_\theta^n) \le z_n - \frac{\gamma}{\sqrt{n}},\ X_\theta^n \in D_n \right\} + \Pr\left\{ \frac{1}{n}\log P_{X_\theta^n}(X_\theta^n) \le z_n - \frac{\gamma}{\sqrt{n}},\ X_\theta^n \notin D_n \right\} \\ &\le \Pr\left\{ \frac{1}{n}\log P_{X^n}(X_\theta^n) \le z_n \right\} + \Pr\left\{ X_\theta^n \in D_n \right\} \\ &\le \Pr\left\{ \frac{1}{n}\log P_{X^n}(X_\theta^n) \le z_n \right\} + e^{-\sqrt{n}\gamma}. \end{aligned} \tag{A17}$$
Hence, we obtain the inequality
$$\Pr\left\{ \frac{1}{n}\log P_{X^n}(X_\theta^n) \le z_n \right\} \ge \Pr\left\{ \frac{1}{n}\log P_{X_\theta^n}(X_\theta^n) \le z_n - \frac{\gamma}{\sqrt{n}} \right\} - e^{-\sqrt{n}\gamma}, \tag{A18}$$
from which, with $z_n + \frac{1}{n}\log P_{\overline{X}^n}(X_\theta^n)$ in place of $z_n$, it follows that
$$\Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n \right\} \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le z_n - \frac{\gamma}{\sqrt{n}} \right\} - e^{-\sqrt{n}\gamma} \tag{A19}$$
for all $\theta \in \Theta$. This completes the proof. □

Appendix D. Proof of Theorem 4

Setting
$$\overline{B}_\varepsilon(R, S) := \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) + \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) = R\}} \Phi_\theta(S)\, dw(\theta), \tag{A20}$$
it suffices, in view of Theorem 2, to show the two inequalities:
$$\overline{B}_\varepsilon(R, S) \ge \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\}, \tag{A21}$$
$$\overline{B}_\varepsilon(R, S) \le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\}. \tag{A22}$$
  • Proof of Equation (A21):
By the definitions of $\mathbf{X}$ and $\overline{\mathbf{X}}$, it holds that
$$\begin{aligned} \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\} &= \limsup_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} \right\} \\ &\le \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} \right\} + \limsup_{n\to\infty} \int_{\Theta \setminus \Theta_n^*} dw(\theta) \\ &= \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} \right\} \\ &\le \limsup_{n\to\infty} \int_{\Theta_n^*} dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} \\ &\le \limsup_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} \\ &\le \int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\}, \end{aligned} \tag{A23}$$
where the second equality and the second inequality are due to Lemmas 3 and 4, respectively, and the last inequality follows from the reverse Fatou lemma.
Here, we define three sets:
$$\Theta_0 := \left\{ \theta \in \Theta \,\middle|\, D(P_{X_\theta} \| P_{\overline{X}}) = R \right\}, \tag{A24}$$
$$\Theta_1 := \left\{ \theta \in \Theta \,\middle|\, D(P_{X_\theta} \| P_{\overline{X}}) < R \right\}, \tag{A25}$$
$$\Theta_2 := \left\{ \theta \in \Theta \,\middle|\, D(P_{X_\theta} \| P_{\overline{X}}) > R \right\}. \tag{A26}$$
Note that, with $X_\theta^n = (X_{\theta,1}, X_{\theta,2}, \dots, X_{\theta,n})$,
$$\frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} = \frac{1}{n}\sum_{i=1}^n \log\frac{P_{X_\theta}(X_{\theta,i})}{P_{\overline{X}}(X_{\theta,i})} \tag{A27}$$
is the arithmetic average of $n$ i.i.d. random variables with expectation
$$E\left[ \frac{1}{n}\sum_{i=1}^n \log\frac{P_{X_\theta}(X_{\theta,i})}{P_{\overline{X}}(X_{\theta,i})} \right] = D(P_{X_\theta} \| P_{\overline{X}}). \tag{A28}$$
Then, the weak law of large numbers yields that, for $\theta \in \Theta_2$,
$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} = 0. \tag{A29}$$
Moreover, for $\theta \in \Theta_0$, the central limit theorem leads to
$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{\sqrt{n}}\left( \log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} - n D(P_{X_\theta} \| P_{\overline{X}}) \right) \le S + \frac{1}{\sqrt[4]{n}} \right\} = \Phi_\theta(S). \tag{A30}$$
Summarizing these equalities, we obtain
$$\begin{aligned} &\int_\Theta dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} \\ &\quad= \int_{\Theta_1} dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} + \int_{\Theta_0} dw(\theta) \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} + \frac{1}{\sqrt[4]{n^3}} \right\} \\ &\quad\le \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) + \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) = R\}} \Phi_\theta(S)\, dw(\theta). \end{aligned} \tag{A31}$$
Plugging Equation (A31) into Equation (A23) yields Equation (A21).
  • Proof of Equation (A22):
By the definitions of $\mathbf{X}$ and $\overline{\mathbf{X}}$, and Lemma 5 with $z_n = R + \frac{S}{\sqrt{n}}$, it holds that
$$\begin{aligned} \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R + \frac{S}{\sqrt{n}} \right\} &\ge \liminf_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} \right\} \\ &\ge \liminf_{n\to\infty} \int_\Theta dw(\theta) \left( \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S}{\sqrt{n}} - \frac{\gamma}{\sqrt{n}} \right\} - e^{-\sqrt{n}\gamma} \right) \\ &= \liminf_{n\to\infty} \int_\Theta dw(\theta) \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S - \gamma}{\sqrt{n}} \right\} \\ &\ge \int_\Theta dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S - \gamma}{\sqrt{n}} \right\}, \end{aligned} \tag{A32}$$
for any $\gamma > 0$, where the last inequality is due to Fatou's lemma. We also partition the parameter space $\Theta$ into three sets as in Equations (A24)–(A26).
Then, similarly to the derivation of Equations (A29) and (A30), we obtain
$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S - \gamma}{\sqrt{n}} \right\} = \begin{cases} \Phi_\theta(S - \gamma), & \theta \in \Theta_0, \\ 1, & \theta \in \Theta_1. \end{cases} \tag{A33}$$
Thus, the right-hand side of Equation (A32) is bounded as
$$\begin{aligned} \int_\Theta dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S - \gamma}{\sqrt{n}} \right\} &\ge \int_{\Theta_1} dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S - \gamma}{\sqrt{n}} \right\} \\ &\quad + \int_{\Theta_0} dw(\theta) \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X_\theta^n}(X_\theta^n)}{P_{\overline{X}^n}(X_\theta^n)} \le R + \frac{S - \gamma}{\sqrt{n}} \right\} \\ &= \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) < R\}} dw(\theta) + \int_{\{\theta \mid D(P_{X_\theta} \| P_{\overline{X}}) = R\}} \Phi_\theta(S - \gamma)\, dw(\theta). \end{aligned} \tag{A34}$$
Substituting Equation (A34) into Equation (A32) and noting that $\gamma > 0$ is arbitrary, we obtain Equation (A22). □

Appendix E. Proofs of Equations (45) and (46)

(1)
Proof of Equation (45):
To prove Equation (45), we define $a(\mathbf{x})$ as
$$a(\mathbf{x}) := v\text{-}\mathrm{ess.sup}\, \overline{P}_\sigma^n(\mathbf{x}), \tag{A35}$$
where $v\text{-}\mathrm{ess.sup}\, f_\sigma$ denotes the essential supremum of $f_\sigma$ with respect to $v(\sigma)$, i.e., $v\text{-}\mathrm{ess.sup}\, f_\sigma := \inf\{\alpha \mid \Pr\{f_\sigma > \alpha\} = 0\}$. Thus, from the property of the essential supremum, we immediately have
$$a(\mathbf{x}) \ge P_{\overline{X}^n}(\mathbf{x}) \tag{A36}$$
for $n = 1, 2, \dots$.
Let $P_{\mathbf{x}}$ denote the type of $\mathbf{x} \in T_{\theta,\nu}^n$. Then, noting that
$$\overline{P}_\sigma^n(\mathbf{x}) = \prod_{x \in \mathcal{X}} \overline{P}_\sigma(x)^{N(x|\mathbf{x})} = \exp\left( \sum_{x \in \mathcal{X}} N(x|\mathbf{x}) \log \overline{P}_\sigma(x) \right) = \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_\sigma) \right) \right) \tag{A37}$$
holds, $a(\mathbf{x})$ is written as
$$a(\mathbf{x}) = \exp\left( -n\left( H(P_{\mathbf{x}}) + v\text{-}\mathrm{ess.inf}\, D(P_{\mathbf{x}} \| \overline{P}_\sigma) \right) \right) = \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_{\mathbf{x}})}) \right) \right). \tag{A38}$$
Here, it is important to notice that $D(P \| \overline{P}_\sigma)$ is continuous in $(P, \overline{P}_\sigma)$ and hence, owing to our assumption, $D(P \| \overline{P}_{\sigma(Q)})$ is continuous in $Q \in \mathcal{P}(\mathcal{X})$. Thus, expanding $D(P \| \overline{P}_{\sigma(P_{\mathbf{x}})})$ in $P_{\mathbf{x}}$ around $P_\theta$ leads to
$$D(P \| \overline{P}_{\sigma(P_{\mathbf{x}})}) = D(P \| \overline{P}_{\sigma(P_\theta)}) - \delta_\theta(\nu) \quad (\mathbf{x} \in T_{\theta,\nu}^n), \tag{A39}$$
with some $\delta_\theta(\nu)$ such that $\delta_\theta(\nu) \to 0$ as $\nu \to 0$, because $\sum_{x \in \mathcal{X}} |P_\theta(x) - P_{\mathbf{x}}(x)| \le \nu$ holds for $\mathbf{x} \in T_{\theta,\nu}^n$.
Then, with $P_{\mathbf{x}}$ in place of $P$ in Equation (A39), and in view of Equation (A36), for each $\mathbf{x} \in T_{\theta,\nu}^n$ we have the upper bound
$$P_{\overline{X}^n}(\mathbf{x}) \le a(\mathbf{x}) = \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_{\mathbf{x}})}) \right) \right) = \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_\theta)}) - \delta_\theta(\nu) \right) \right) = \overline{P}_{\sigma(P_\theta)}^n(\mathbf{x}) \exp\left[ n\,\delta_\theta(\nu) \right], \tag{A40}$$
from which it follows that, for each $\mathbf{x} \in T_{\theta,\nu}^n$,
$$\frac{1}{n}\log\frac{1}{P_{\overline{X}^n}(\mathbf{x})} \ge \frac{1}{n}\log\frac{1}{\overline{P}_{\sigma(P_\theta)}^n(\mathbf{x})} - \delta_\theta(\nu). \tag{A41}$$
Therefore, the proof of Equation (45) has been completed.
(2)
Proof of Equation (46):
To prove Equation (46), we derive a lower bound on $P_{\overline{X}^n}(\mathbf{x})$. For any $P \in \mathcal{P}(\mathcal{X})$ and any small constant $\tau > 0$, set
$$S_\tau(P) := \left\{ \sigma \in \Sigma \,\middle|\, D(P \| \overline{P}_\sigma) < D(P \| \overline{P}_{\sigma(P)}) + \tau \right\}; \tag{A42}$$
then, by the definition of $v$-ess.inf,
$$c_\tau(P) := \int_{S_\tau(P)} dv(\sigma) > 0 \tag{A43}$$
holds. Our claim is that, for any $\theta \in \Theta$, sufficiently small $\tau > 0$, and some positive constant $c_\theta > 0$,
$$\inf_{\mathbf{x} \in T_{\theta,\nu}^n} c_\tau(P_{\mathbf{x}}) \ge c_\theta. \tag{A44}$$
To see this, consider a sequence $\{\tau_i\}_{i=1}^{\infty}$ such that $0 < \tau_1 < \tau_2 < \cdots < \tau$. Then, there exists a positive integer $m$ such that $c_{\tau_m}(P_\theta) > 0$. Otherwise, the continuity of probability measures would imply that
$$0 = \lim_{i\to\infty} c_{\tau_i}(P_\theta) = c_\tau(P_\theta) > 0, \tag{A45}$$
which is a contradiction. On the other hand, in view of Equation (A42), $\sigma \in S_{\tau_m}(P_\theta)$ is equivalent to
$$D(P_\theta \| \overline{P}_\sigma) < D(P_\theta \| \overline{P}_{\sigma(P_\theta)}) + \tau_m, \tag{A46}$$
which implies
$$D(P_{\mathbf{x}} \| \overline{P}_\sigma) < D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_{\mathbf{x}})}) + \tau_m + \gamma(\nu) \quad (\mathbf{x} \in T_{\theta,\nu}^n), \tag{A47}$$
where Equation (A47) follows from Equation (A46) by expanding $P_\theta$ around $P_{\mathbf{x}}$, with some $\gamma(\nu) > 0$ such that $\gamma(\nu) \to 0$ as $\nu \to 0$. Therefore, all $\sigma \in S_{\tau_m}(P_\theta)$ satisfy Equation (A47). Now we can take $\nu > 0$ so small that $\tau_m + \gamma(\nu) < \tau$ to obtain
$$D(P_{\mathbf{x}} \| \overline{P}_\sigma) < D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_{\mathbf{x}})}) + \tau \quad (\mathbf{x} \in T_{\theta,\nu}^n). \tag{A48}$$
Therefore, $S_{\tau_m}(P_\theta) \subseteq S_\tau(P_{\mathbf{x}})$, and hence we have
$$0 < c_{\tau_m}(P_\theta) \le c_\tau(P_{\mathbf{x}}) \quad (\mathbf{x} \in T_{\theta,\nu}^n), \tag{A49}$$
which is nothing but Equation (A44).
Thus, again for $\mathbf{x} \in T_{\theta,\nu}^n$, we have the lower bound
$$\begin{aligned} P_{\overline{X}^n}(\mathbf{x}) = \int_\Sigma \overline{P}_\sigma^n(\mathbf{x})\, dv(\sigma) &\ge \int_{S_\tau(P_{\mathbf{x}})} \overline{P}_\sigma^n(\mathbf{x})\, dv(\sigma) = \int_{S_\tau(P_{\mathbf{x}})} \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_\sigma) \right) \right) dv(\sigma) \\ &\ge \int_{S_\tau(P_{\mathbf{x}})} \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_{\mathbf{x}})}) + \tau \right) \right) dv(\sigma) \\ &= c_\tau(P_{\mathbf{x}}) \exp\left( -n\left( H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_\theta)}) + \tau - \delta_\theta(\nu) \right) \right) \\ &\ge c_{\tau_m}(P_\theta)\, \overline{P}_{\sigma(P_\theta)}^n(\mathbf{x}) \exp\left( -n\left( \tau - \delta_\theta(\nu) \right) \right), \end{aligned} \tag{A50}$$
where in the second-to-last equality and in the last inequality we have used the continuity of $D(P_{\mathbf{x}} \| \overline{P}_{\sigma(P_{\mathbf{x}})})$ in $P_{\mathbf{x}}$ around $P_\theta$ and Equation (A49), respectively. From Equation (A50), we obtain
$$\frac{1}{n}\log\frac{1}{P_{\overline{X}^n}(\mathbf{x})} \le \frac{1}{n}\log\frac{1}{\overline{P}_{\sigma(P_\theta)}^n(\mathbf{x})} + \frac{1}{n}\log\frac{1}{c_{\tau_m}(P_\theta)} + (\tau - \delta_\theta(\nu)) \tag{A51}$$
for each $\mathbf{x} \in T_{\theta,\nu}^n$, which completes the proof of Equation (46). □

Appendix F. Proof of Theorem 6

First, we prove the inequality
$$B_0(\mathbf{X} \| \overline{\mathbf{X}}) \ge \min_{1 \le i \le K,\, 1 \le j \le L} B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{A52}$$
To do so, we arbitrarily fix $R_{ij}$ for $1 \le i \le K$, $1 \le j \le L$ so that
$$R_{ij} < B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{A53}$$
Then, by the definition of $B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j)$, there exists an acceptance region $\mathcal{A}_n^{(i,j)}$ satisfying
$$\lim_{n\to\infty} \mu_n^{(i,j)} = 0, \tag{A54}$$
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(i,j)}} \ge R_{ij}, \tag{A55}$$
where $\mu_n^{(i,j)}$ and $\lambda_n^{(i,j)}$ are defined respectively as
$$\mu_n^{(i,j)} := \Pr\left\{ X_i^n \notin \mathcal{A}_n^{(i,j)} \right\}, \qquad \lambda_n^{(i,j)} := \Pr\left\{ \overline{X}_j^n \in \mathcal{A}_n^{(i,j)} \right\}. \tag{A56}$$
By using these regions, we define the acceptance region $\mathcal{A}_n$ as
$$\mathcal{A}_n := \bigcup_{i=1}^K \bigcap_{j=1}^L \mathcal{A}_n^{(i,j)}. \tag{A57}$$
Then, we have
$$\mu_n = \Pr\left\{ X^n \notin \mathcal{A}_n \right\} = \sum_{i=1}^K \alpha_i \Pr\left\{ X_i^n \notin \bigcup_{i'=1}^K \bigcap_{j=1}^L \mathcal{A}_n^{(i',j)} \right\} \le \sum_{i=1}^K \alpha_i \Pr\left\{ X_i^n \notin \bigcap_{j=1}^L \mathcal{A}_n^{(i,j)} \right\} \le \sum_{i=1}^K \sum_{j=1}^L \alpha_i \Pr\left\{ X_i^n \notin \mathcal{A}_n^{(i,j)} \right\} = \sum_{i=1}^K \sum_{j=1}^L \alpha_i\, \mu_n^{(i,j)}, \tag{A58}$$
from which, together with Equation (A54), we obtain
$$\lim_{n\to\infty} \mu_n = 0. \tag{A59}$$
Similarly, we have
$$\lambda_n = \Pr\left\{ \overline{X}^n \in \mathcal{A}_n \right\} \le \sum_{j=1}^L \sum_{i=1}^K \beta_j\, \lambda_n^{(i,j)}, \tag{A60}$$
from which, together with Equation (A55), we obtain, for any small $\gamma > 0$,
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge \min_{1 \le i \le K,\, 1 \le j \le L} R_{ij} - \gamma. \tag{A61}$$
Since the $R_{ij}$ are arbitrary as long as Equation (A53) is satisfied, we have Equation (A52).
Next, we prove the inequality
$$B_0(\mathbf{X} \| \overline{\mathbf{X}}) \le \min_{1 \le i \le K,\, 1 \le j \le L} B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{A62}$$
To do so, let $R$ be 0-achievable; then there exists an acceptance region $\mathcal{A}_n$ satisfying
$$\lim_{n\to\infty} \mu_n = 0, \tag{A63}$$
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R. \tag{A64}$$
We fix such an $\mathcal{A}_n$ and consider the hypothesis testing problem with null hypothesis $\mathbf{X}_i$ and alternative hypothesis $\overline{\mathbf{X}}_j$ for arbitrarily fixed $i$ and $j$. Then, the probabilities of type I error and type II error are given by
$$\mu_n^{(i,j)} = \Pr\left\{ X_i^n \notin \mathcal{A}_n \right\}, \tag{A65}$$
$$\lambda_n^{(i,j)} = \Pr\left\{ \overline{X}_j^n \in \mathcal{A}_n \right\}. \tag{A66}$$
Since
$$\mu_n = \sum_{i=1}^K \alpha_i \Pr\left\{ X_i^n \notin \mathcal{A}_n \right\} = \sum_{i=1}^K \alpha_i\, \mu_n^{(i,j)}, \tag{A67}$$
we have
$$\mu_n^{(i,j)} \le \frac{\mu_n}{\alpha_i}. \tag{A68}$$
From this inequality and Equation (A63), we obtain
$$\lim_{n\to\infty} \mu_n^{(i,j)} = 0. \tag{A69}$$
Similarly to the derivation of Equation (A68), we have
$$\lambda_n^{(i,j)} \le \frac{\lambda_n}{\beta_j}. \tag{A70}$$
Hence, from Equation (A64), we obtain
$$R \le \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \le \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(i,j)}} + \limsup_{n\to\infty} \frac{1}{n}\log\frac{1}{\beta_j} = \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(i,j)}}. \tag{A71}$$
From Equations (A69) and (A71), it follows that $R$ is 0-achievable for the hypothesis testing problem with $\mathbf{X}_i$ against $\overline{\mathbf{X}}_j$. Noting that $i$ and $j$ are arbitrary with $1 \le i \le K$ and $1 \le j \le L$, we obtain
$$R \le \min_{1 \le i \le K,\, 1 \le j \le L} B_0(\mathbf{X}_i \| \overline{\mathbf{X}}_j). \tag{A72}$$
This means that Equation (A62) holds, completing the proof of Theorem 6. □

Appendix G. Proof of Theorem 8

It suffices to show the two inequalities:
$$B(\{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) \le B(\{\alpha_i, \mathbf{X}_i\}_{i=1}^K \| \{\beta_j, \overline{\mathbf{X}}_j\}_{j=1}^L), \tag{A73}$$
$$B(\{\mathbf{X}_i\}_{i=1}^K \| \{\overline{\mathbf{X}}_j\}_{j=1}^L) \ge B(\{\alpha_i, \mathbf{X}_i\}_{i=1}^K \| \{\beta_j, \overline{\mathbf{X}}_j\}_{j=1}^L). \tag{A74}$$
  • Proof of Equation (A73):
Suppose that $R$ is 0-achievable for the compound hypothesis testing; that is, there exists an acceptance region $\mathcal{A}_n$ such that
$$\lim_{n\to\infty} \mu_n^{(i)} = 0 \quad (i = 1, 2, \dots, K), \tag{A75}$$
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(j)}} \ge R \quad (j = 1, 2, \dots, L). \tag{A76}$$
Then, the type I error probability $\mu_n$ for the hypothesis testing problem with mixed general sources is evaluated as follows. By the definition of $\mu_n$ and Equation (72), we have
$$\mu_n = \Pr\left\{ X^n \notin \mathcal{A}_n \right\} = \sum_{i=1}^K \alpha_i \Pr\left\{ X_i^n \notin \mathcal{A}_n \right\} = \sum_{i=1}^K \alpha_i\, \mu_n^{(i)}, \tag{A77}$$
from which, together with Equation (A75), we obtain
$$\lim_{n\to\infty} \mu_n = 0. \tag{A78}$$
Similarly, we have
$$\lambda_n = \Pr\left\{ \overline{X}^n \in \mathcal{A}_n \right\} = \sum_{j=1}^L \beta_j \Pr\left\{ \overline{X}_j^n \in \mathcal{A}_n \right\} = \sum_{j=1}^L \beta_j\, \lambda_n^{(j)}. \tag{A79}$$
On the other hand, Equation (A76) implies that
$$\lambda_n^{(j)} \le e^{-n(R - \gamma)} \quad (n \ge n_0) \tag{A80}$$
holds for any $\gamma > 0$ and all $j = 1, 2, \dots, L$. Substituting this inequality into Equation (A79) yields
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R - \gamma. \tag{A81}$$
Since $\gamma > 0$ is arbitrary, from Equations (A78) and (A81) we conclude that Equation (A73) holds.
  • Proof of Equation (A74):
Suppose that $R$ is 0-achievable for the mixed hypothesis testing; that is, there exists an acceptance region $\mathcal{A}_n$ such that
$$\lim_{n\to\infty} \mu_n = 0, \tag{A82}$$
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R. \tag{A83}$$
We fix such an $\mathcal{A}_n$ and set
$$\mu_n^{(i)} = \Pr\left\{ X_i^n \notin \mathcal{A}_n \right\}, \tag{A84}$$
$$\lambda_n^{(j)} = \Pr\left\{ \overline{X}_j^n \in \mathcal{A}_n \right\}. \tag{A85}$$
Then, from Equation (72), we have
$$\mu_n = \sum_{i=1}^K \alpha_i \Pr\left\{ X_i^n \notin \mathcal{A}_n \right\} = \sum_{i=1}^K \alpha_i\, \mu_n^{(i)}, \tag{A86}$$
from which it follows that
$$\mu_n^{(i)} \le \frac{\mu_n}{\alpha_i} \tag{A87}$$
for all $i = 1, 2, \dots, K$. From this inequality and Equation (A82), we obtain
$$\lim_{n\to\infty} \mu_n^{(i)} = 0 \tag{A88}$$
for all $i = 1, 2, \dots, K$. Similarly,
$$\lambda_n = \sum_{j=1}^L \beta_j \Pr\left\{ \overline{X}_j^n \in \mathcal{A}_n \right\} = \sum_{j=1}^L \beta_j\, \lambda_n^{(j)}, \tag{A89}$$
so that we have, for $j = 1, 2, \dots, L$,
$$\lambda_n^{(j)} \le \frac{\lambda_n}{\beta_j}, \tag{A90}$$
which means that
$$\frac{1}{n}\log\frac{1}{\lambda_n^{(j)}} \ge \frac{1}{n}\log\frac{\beta_j}{\lambda_n} = \frac{1}{n}\log\frac{1}{\lambda_n} - \frac{1}{n}\log\frac{1}{\beta_j}. \tag{A91}$$
Noting that the $\beta_j$ $(j = 1, 2, \dots, L)$ are constants, from Equation (A83) we obtain
$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(j)}} \ge R \tag{A92}$$
for all $j = 1, 2, \dots, L$. From Equations (A88) and (A92), we conclude that Equation (A74) holds. □

References

  1. Han, T.S. Information-Spectrum Methods in Information Theory; Springer: New York, NY, USA, 2003.
  2. Dembo, A.; Zeitouni, O. Large Deviations Techniques and Applications; Jones and Bartlett Publishers: Boston, MA, USA, 1993.
  3. Chen, P.N. General formulas for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error bound. IEEE Trans. Inf. Theory 1996, 42, 316–323.
  4. Strassen, V. Asymptotische Abschätzungen in Shannons Informationstheorie. In Proceedings of the Transactions of the 3rd Prague Conference on Information Theory, Prague, Czech Republic, 5–13 June 1962; pp. 687–723.
  5. Hayashi, M. Information Spectrum Approach to Second-Order Coding Rate in Channel Coding. IEEE Trans. Inf. Theory 2009, 55, 4947–4966.
  6. Polyanskiy, Y.; Poor, H.V.; Verdú, S. Channel Coding Rate in the Finite Blocklength Regime. IEEE Trans. Inf. Theory 2010, 56, 2307–2359.
  7. Nomura, R.; Han, T.S. Second-Order Resolvability, Intrinsic Randomness, and Fixed-Length Source Coding for Mixed Sources: Information Spectrum Approach. IEEE Trans. Inf. Theory 2013, 59, 1–16.
  8. Tan, V.; Kosut, O. On the Dispersion of Three Network Information Theory Problems. IEEE Trans. Inf. Theory 2014, 60, 881–903.
  9. Watanabe, S. Second-Order Region for Gray–Wyner Network. IEEE Trans. Inf. Theory 2017, 63, 1006–1018.
  10. Polyanskiy, Y.; Poor, H.V.; Verdú, S. Dispersion of the Gilbert-Elliott Channel. IEEE Trans. Inf. Theory 2011, 57, 1829–1848.
  11. Nomura, R.; Han, T.S. Second-Order Slepian-Wolf Coding Theorems for Non-Mixed and Mixed Sources. IEEE Trans. Inf. Theory 2014, 60, 5553–5572.
  12. Yagi, H.; Han, T.S.; Nomura, R. First- and Second-Order Coding Theorems for Mixed Memoryless Channels with General Mixture. IEEE Trans. Inf. Theory 2016, 62, 4395–4412.
