Article

Average Entropy of Gaussian Mixtures

Department of Computer Science and Mathematics, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
* Author to whom correspondence should be addressed.
Entropy 2024, 26(8), 659; https://doi.org/10.3390/e26080659
Submission received: 17 April 2024 / Revised: 26 June 2024 / Accepted: 20 July 2024 / Published: 1 August 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

We calculate the average differential entropy of a $q$-component Gaussian mixture in $\mathbb{R}^n$. For simplicity, all components have covariance matrix $\sigma^2\mathbf{1}$, while the means $\{\mathbf{W}_i\}_{i=1}^{q}$ are i.i.d. Gaussian vectors with zero mean and covariance $s^2\mathbf{1}$. We obtain a series expansion in $\mu=s^2/\sigma^2$ for the average differential entropy up to order $O(\mu^2)$, and we provide a recipe to calculate higher-order terms. Our result provides an analytic approximation with a quantifiable order of magnitude for the error, which is not achieved in previous literature.

1. Introduction

1.1. Gaussian Mixtures

A Gaussian mixture probability density on $\mathbb{R}^n$ is a function of the following form:
$$f(\mathbf{x})=\sum_{i=1}^{q} p_i\, g_{\mathbf{w}_i,K_i}(\mathbf{x}),\qquad(1)$$
where $q$ is some integer, the $p_i$ are probabilities, and $g_{\mathbf{w},K}$ stands for the Gaussian distribution $g_{\mathbf{w},K}(\mathbf{x})=(2\pi)^{-\frac{n}{2}}(\det K)^{-\frac{1}{2}}\exp[-\frac{1}{2}(\mathbf{x}-\mathbf{w})^{\mathrm T}K^{-1}(\mathbf{x}-\mathbf{w})]$, with $\mathbf{x},\mathbf{w}\in\mathbb{R}^n$. In words, the density $f(\mathbf{x})$ is built as a weighted average of $q$ different $n$-dimensional Gaussian distributions. A mixture like this occurs when one stochastic process, with probability mass function $p_1,\ldots,p_q$, determines which distribution is chosen for $\mathbf{x}$. Gaussian mixtures are widely used in various areas for their simplicity, as well as for their wide range of applicability by virtue of being global (smooth) function approximators. Most recently, in astrophysics, Gaussian mixture models were used to study the kinematics of dark matter [1], as well as to predict the spectrum of quasars [2]. Furthermore, Gaussian mixtures are also widely used at the intersection between statistical physics and machine learning, e.g., in diffusion models [3,4]. Diffusion models became popular for their wide range of applications, including generative tasks and image restoration; see, for example, [5,6] for Gaussian mixture diffusion models. In wireless communications, Gaussian mixtures were recently utilized to estimate the channel trajectories of moving mobile terminals [7], while in cybernetic security, they played a role in defending against Byzantine attacks [8], and earlier for authentication in wireless networks [9]. In bioinformatics, they are used as an alternative to clustering algorithms for analyzing gene expression data [10,11]. They also find application in thermodynamics, e.g., a computationally efficient Monte Carlo method was devised in [12] to study the properties of nonadiabatic systems. In [13], many of the above-mentioned disciplines are combined, where the authors evaluate probabilities in deep generative models.
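To make the two-stage construction concrete, the following short sketch (our own illustration; the dimension, number of components, weights, means, and covariances are arbitrary choices, not taken from the paper) first draws a component index with probabilities $p_i$ and then samples from the selected Gaussian, and it evaluates the mixture density $f(\mathbf{x})$.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, q = 2, 3                                     # dimension and number of components (arbitrary)
p = np.array([0.2, 0.3, 0.5])                   # mixture weights p_i
means = rng.normal(size=(q, n))                 # component means w_i
covs = [s * np.eye(n) for s in (0.5, 1.0, 2.0)] # component covariances K_i

def sample_mixture(size):
    # two-stage process: first choose a component, then draw from that Gaussian
    comp = rng.choice(q, size=size, p=p)
    noise = np.stack([rng.multivariate_normal(np.zeros(n), covs[c]) for c in comp])
    return means[comp] + noise

def mixture_pdf(x):
    # f(x) = sum_i p_i g_{w_i, K_i}(x)
    return sum(p[i] * multivariate_normal(means[i], covs[i]).pdf(x) for i in range(q))

x = sample_mixture(5)
print(mixture_pdf(x))   # mixture density evaluated at the five sampled points
```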

1.2. Related Work

Analytically computing or estimating the differential entropy of a Gaussian mixture is a difficult problem. The differential entropy of a continuous variable $X$ with support $\mathcal{X}$ and density $f$ is defined as $h(X)=-\int_{\mathcal{X}} f(x)\ln f(x)\,\mathrm{d}x$ [14]. In the special case of a single Gaussian component ($q=1$), the integral can be computed analytically, yielding $\frac{n}{2}\ln 2\pi e+\frac{1}{2}\ln\det K$. In [15], the authors study the case of $q=2$, $n=1$, and $w_1=-w_2$, and a numerical approximation for this case is obtained. It is a known fact that, given the second moment of a random variable, the distribution with maximum entropy is the Gaussian distribution [14]. That is, we can easily obtain a loose upper bound for the differential entropy of a Gaussian mixture, given by [16]:
$$h(X)\leq \frac{n}{2}\ln 2\pi e+\frac{1}{2}\ln\det(\Sigma),\qquad(2)$$
where $\Sigma$ is given by
$$\Sigma=\sum_{i=1}^{q}p_i\left(\mathbf{w}_i\mathbf{w}_i^{\mathrm T}+K_i\right)-\left(\sum_{i=1}^{q}p_i\mathbf{w}_i\right)\left(\sum_{j=1}^{q}p_j\mathbf{w}_j\right)^{\mathrm T}.\qquad(3)$$
A sequence of maximum entropy upper bounds for the differential entropy was obtained in [17], most notably the Laplacian upper bound, which is tighter than the Gaussian upper bound in some cases. However, it has no closed-form expression. Another way of approximating the differential entropy is to replace the density inside the logarithm by a single Gaussian $\bar f$ with covariance and mean identical to those of the mixture. This leads to an exact expression in terms of the relative entropy,
$$h(X)=\frac{n}{2}\ln 2\pi e+\frac{1}{2}\ln\det(\Sigma)-D(f\,\|\,\bar f).\qquad(4)$$
One can then find approximations to the relative entropy, as in [18,19]. Although this method is efficient, it also does not have a closed-form expression and only provides an upper bound to $h(X)$. Monte Carlo sampling methods, on the other hand, give guaranteed convergence; however, they become computationally demanding as the number of samples grows. An approximation for $h(X)$ was obtained in [20] by performing a Taylor series expansion of the logarithm term. In order to avoid the need to include higher-order terms, a splitting of Gaussian components belonging to the density outside the logarithm is applied. The idea is to split components with high variance and replace them with Gaussian mixtures of components with lower variance, because the higher the variance of the components, the higher the number of terms needed in the Taylor expansion to achieve a certain level of accuracy. Of course, this process is not exact and produces an error depending on how many Gaussian components are included in the split. The splitting method is not directly comparable with our work, as we do not make a distinction between variances of the different components in the mixture; instead, our expansion parameter is how spread out the components of the mixture are relative to their shared variance value. However, one can obtain a basic upper bound on $h(X)$, as shown in [20]:
$$h(X)\leq \sum_{i=1}^{q}p_i\left[-\ln p_i+\frac{1}{2}\ln\left((2\pi e)^n\det(K_i)\right)\right].\qquad(5)$$
As mentioned in [20], the upper bound (5) is significantly tighter than the Gaussian bound (2); it is exact in the case of a single Gaussian; and it is arbitrarily close to the real value of h ( X ) when the support shared between the components of the mixture is negligible. A refinement of this bound was also introduced by means of merging different clusters in the mixture. However, we are concerned with the average over all possible mixtures, and therefore, this family of bounds is less comparable with our result. In [21], the differential entropy is estimated for mixture distributions using pair-wise distances between the components of the mixture. It is shown that the estimator is a lower bound when the Chernoff α -divergence is used, and an upper bound when the Kullback–Leibler divergence is used. From the results of [21], it is difficult to obtain tight bounds on the average entropy, which is the focus of our paper.
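For orientation, the following numerical sketch (our own example, not from the cited works; the mixture parameters and the sample size are arbitrary assumptions) compares a plain Monte Carlo estimate of $h(X)$ with the Gaussian bound (2) and the component-wise bound (5) for an equal-weight mixture with spherical covariances $\sigma^2\mathbf{1}$, the setting used in the remainder of the paper.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
n, q, sigma2 = 3, 4, 1.0
w = rng.normal(scale=2.0, size=(q, n))        # component means, equal weights 1/q

def log_f(x):
    # log of the mixture density at the rows of x
    d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    return logsumexp(-0.5 * d2 / sigma2, axis=1) - np.log(q) - 0.5 * n * np.log(2 * np.pi * sigma2)

# Monte Carlo estimate of h(X) = -E[ln f(X)]
N = 200_000
x = w[rng.integers(q, size=N)] + rng.normal(scale=np.sqrt(sigma2), size=(N, n))
h_mc = -log_f(x).mean()

# Gaussian maximum-entropy bound (2): Sigma = sigma^2 I + population covariance of the means
Sigma = sigma2 * np.eye(n) + np.cov(w.T, bias=True)
h_gauss = 0.5 * n * np.log(2 * np.pi * np.e) + 0.5 * np.linalg.slogdet(Sigma)[1]

# bound (5) for equal weights and equal spherical covariance: ln q + (n/2) ln(2 pi e sigma^2)
h_comp = np.log(q) + 0.5 * n * np.log(2 * np.pi * np.e * sigma2)

print(h_mc, h_gauss, h_comp)   # h_mc lies below both bounds (up to Monte Carlo noise)
```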

1.3. Contributions

We develop an analytical estimation method for the average Gaussian mixture entropy problem in the special case of equal weights $\frac1q$ and equal covariance matrix $\sigma^2\mathbf{1}$. Our method postulates that the displacements $\mathbf{w}_i$ themselves are stochastic, i.i.d. with Gaussian distribution $g_{\mathbf{0},s^2\mathbf{1}}$. We compute $h(X)$ averaged over the displacements $\mathbf{w}_i$, resulting in the conditional entropy $h(X|W)$. We mention that the quantity $h(X|W)$, for exactly this setting with equal weights, spherical covariance, and Gaussian displacements, plays a role in the detectability of digital watermarks (see Section 2.2).
We work in the regime $s<\sigma$. Our method uses the ratio $\mu=s^2/\sigma^2$ as the small parameter for a power expansion. We show results up to and including order $O(\mu^2)$. Our result is novel since most available estimators of $h(X)$ in the literature have no closed-form expressions, and usually rely on upper bounds rather than an explicit calculation of $h(X)$. In [20], although a Taylor series approximation is employed, its accuracy is dependent on the splitting mentioned in Section 1.2. Furthermore, it is difficult to estimate the order of magnitude of the error in a general expansion around the means. In contrast, our set-up does not need the splitting method, and for our result, it is possible to quantify the order of magnitude of the error since our power series is in $\mu$.

2. Deriving the Series Expansion

2.1. Notation

Boldface lowercase letters denote $n$-dimensional vectors (e.g., $\mathbf{a}$); the inner product between two $n$-dimensional vectors $\mathbf{a}$ and $\mathbf{b}$ is denoted by $\mathbf{a}\cdot\mathbf{b}$. A hat denotes an array in $q$ dimensions, i.e., indexed with the number of the Gaussian component; for lowercase letters, $(\hat\cdot)$ denotes a $q$-dimensional vector (e.g., $\hat a$); for uppercase letters, a $q\times q$ matrix (e.g., $\hat A$). $\hat a^{\mathrm T}\hat b$ denotes the $q$-dimensional inner product. $\otimes$ denotes the Kronecker product.

2.2. Starting Point

Consider a random variable $X\in\mathbb{R}^n$ whose probability density function (pdf) is a Gaussian mixture with equal weights. The Gaussian pdfs all have covariance matrix $\sigma^2\mathbf{1}$, and they are centered on points $\mathbf{w}_1,\ldots,\mathbf{w}_q\in\mathbb{R}^n$. We write $\hat w=(\mathbf{w}_1,\ldots,\mathbf{w}_q)$.
$$f_{X|\hat W}(\mathbf{x}|\hat w)=\frac1q\sum_{j=1}^{q}g_{\mathbf{w}_j,\sigma^2}(\mathbf{x})=(2\pi\sigma^2)^{-\frac n2}\,\frac1q\sum_{j=1}^{q}\exp\left[-\frac{(\mathbf{x}-\mathbf{w}_j)^2}{2\sigma^2}\right].\qquad(6)$$
We note that a simplification is possible when $n>q$, i.e., the dimension of the space is higher than the number of components in the mixture. For all configurations $\hat w$, it is possible to find a rotation such that after rotation $\mathbf{w}_j=\sum_{\alpha=1}^{q}w_{j\alpha}\mathbf{e}_\alpha$, where the $\mathbf{e}_\alpha$ are the basis vectors in $\mathbb{R}^n$. In other words, for all $\mathbf{w}_j$, the vector components in the dimensions beyond $q$ vanish. The density (6) simplifies to
$$f_{X|\hat W}(\mathbf{x}|\hat w)=\exp\left[-\frac{1}{2\sigma^2}\sum_{\alpha=q+1}^{n}x_\alpha^2\right](2\pi\sigma^2)^{-\frac n2}\,\frac1q\sum_{j=1}^{q}\exp\left[-\sum_{\alpha=1}^{q}\frac{(x_\alpha-w_{j\alpha})^2}{2\sigma^2}\right].\qquad(7)$$
The first exponent is trivially dealt with when $\ln f_{X|\hat W}(\mathbf{x}|\hat w)$ is integrated, yielding a constant contribution $\frac{n-q}{2}\ln(2\pi e\sigma^2)$ to the entropy. Hence, we only focus on the case $n\leq q$ in our calculations. In general, analytically approximating the differential entropy $h(X|\hat W=\hat w)$ for a given $\hat w$ is a difficult problem. In this paper, we study a problem that is slightly easier: finding analytic approximations for $h(X|\hat W)=\mathbb{E}_{\hat w}\,h(X|\hat W=\hat w)$ in the case where the offsets $\mathbf{w}_j$ are i.i.d. Gaussian-distributed with zero mean and covariance matrix $s^2\mathbf{1}$.
$$f_{\hat W}(\hat w)=\prod_{j=1}^{q}g_{\mathbf{0},s^2\mathbf{1}}(\mathbf{w}_j)=(2\pi s^2)^{-\frac{nq}{2}}\exp\left[-\sum_{j=1}^{q}\frac{\mathbf{w}_j^2}{2s^2}\right].\qquad(8)$$
We note that the quantity $h(X|\hat W)$, under precisely these conditions, occurs in the field of spread-spectrum watermarking [22,23]. In this setting, $\hat W$ is a table of $q$ random watermarking sequences. A row index $K\in\{1,\ldots,q\}$ is chosen at random, and the $K$-th row of $\hat W$ is additively inserted into some data $\mathbf{D}$, which are modeled as Gaussian, and then attackers apply noise in order to wipe out the watermark. The $X$ plays the role of the watermarked data after this attack. A watermark detector tries to determine the index $K$ based on $X$ and $\hat W$ (and optionally the unwatermarked original data $\mathbf{D}$). In the analysis of the detection efficiency, an important figure of merit is the mutual information $I(K;X\hat W)$ or $I(K;\mathbf{D}X\hat W)$, both of which after some rewriting involve precisely the conditional entropy $h(X|\hat W)$.
We take $s^2<\sigma^2$ and develop a power series in the parameter $\mu=s^2/\sigma^2$. In the following sections, we obtain our results by following these steps:
  • We first obtain a more compact form of h ( X | W ^ ) in Corollary 1 via a change in variables in order to write h ( X | W ^ ) in terms of the more familiar expectation with regard to a multivariate Gaussian density.
  • We next perform another change in variables to obtain a diagonal form in the expectation with regard to the multivariate Gaussian. This is important since our new variables will be independent, and many expressions are simplified when the variables are not correlated.
  • We evaluate the leading terms to h ( X | W ^ ) , as shown in Theorem 2, where we obtain an expression that separates the leading contributions to h ( X | W ^ ) and the quantity S to be defined. The reason this is important is that the leading contributions contain terms that will make the Taylor series diverge, and therefore, we evaluate them analytically before performing the Taylor series expansion. We also include other terms that do not cause any problems in the limit of small μ for convenience. The remaining expression S is then safe to expand for small μ .
  • We perform a third change in variables to simplify S, and we obtain a Taylor series for h ( X | W ^ ) up to order O ( μ 2 ) in Theorem 3 via a brute-force approach. The change in variables is necessary for making the analytical expression tractable.
  • Finally, we provide a determinant-based approach to evaluate the power series for S, and we obtain a result for h ( X | W ^ ) up to order O ( μ ) .

2.3. First Change in Variables

We observe that inside the logarithm, the variables $\mathbf{w}_j$ occur only in the combination $\mathbf{w}_j-\mathbf{x}$. This allows us to introduce shifted (and rescaled) variables $\mathbf{z}_j\propto\mathbf{w}_j-\mathbf{x}$ and then analytically carry out the integration over $\mathbf{x}$.
Lemma 1. 
The differential entropy h ( X | W ^ ) is given by
$$h(X|\hat W)=\frac n2\ln 2\pi\sigma^2-\left(\frac{\mu}{q}\right)^{\frac n2}(2\pi)^{-\frac{qn}{2}}\,\frac1q\sum_{j=1}^{q}\int\!\mathrm{d}\mathbf{z}_1\cdots\mathrm{d}\mathbf{z}_q\; e^{-\frac12\sum_a\mathbf{z}_a^2+\frac{1}{2q}\left(\sum_a\mathbf{z}_a\right)^2-\frac\mu2\mathbf{z}_j^2}\,\ln\frac1q\sum_{k=1}^{q}e^{-\frac\mu2\mathbf{z}_k^2}.\qquad(9)$$
Proof. 
See Appendix A. □
Next, we can get rid of the j summation, using permutation symmetry.
Lemma 2. 
Let F ( z 1 , , z q ) be any permutation invariant function; then, we have
$$\int\exp\left[-\frac12\sum_j\mathbf{z}_j^2+\frac{1}{2q}\sum_{i,j}\mathbf{z}_i\cdot\mathbf{z}_j\right]\frac1q\sum_k\exp\left[-\frac\mu2\mathbf{z}_k^2\right]F(\mathbf{z}_1,\ldots,\mathbf{z}_q)\,\mathrm{d}\mathbf{z}_1\cdots\mathrm{d}\mathbf{z}_q=\int\exp\left[-\frac12\hat z^{\mathrm T}\hat M^{(q)}\hat z\right]F(\mathbf{z}_1,\ldots,\mathbf{z}_q)\,\mathrm{d}\mathbf{z}_1\cdots\mathrm{d}\mathbf{z}_q,\qquad(10)$$
where the matrix elements of $\{\hat M^{(k)}\}_{k=1}^{q}$ are given by
$$M^{(k)}_{ij}=\delta_{ij}-\frac1q+\mu\,\delta_{ik}\delta_{jk}.\qquad(11)$$
Proof. 
See Appendix B. □
We now rewrite the differential entropy more compactly in the following corollary.
Corollary 1. 
Let the function F be given by
$$F(\mathbf{z}_1,\ldots,\mathbf{z}_q)=\ln\left[q^{-1}\sum_{l=1}^{q}\exp\left(-\frac\mu2\,\mathbf{z}_l\cdot\mathbf{z}_l\right)\right];\qquad(12)$$
then, the differential entropy $h(X|\hat W)$ can be written as
$$h(X|\hat W)=\frac n2\ln 2\pi\sigma^2-\left(\frac{\mu}{q}\right)^{\frac n2}(2\pi)^{-\frac{qn}{2}}\int\exp\left[-\frac12\hat z^{\mathrm T}\hat M^{(q)}\hat z\right]F(\mathbf{z}_1,\ldots,\mathbf{z}_q)\,\mathrm{d}\mathbf{z}_1\cdots\mathrm{d}\mathbf{z}_q.\qquad(13)$$
Proof. 
It follows directly from Lemmas 1 and 2, where we note that Equation (12) is permutation invariant. □

2.4. Diagonalization of M ^ ( q ) and Second Change in Variables

We would like to obtain an expression for the expectation term in Equation (13); however, we first need to switch to a diagonal form that removes correlation between the integration variables, i.e., we would like to find the eigenvectors and eigenvalues of the matrix M ^ ( q ) . We first notice that adding the identity matrix, while it shifts the eigenvalues by 1, does not change the eigenvectors of a matrix. Hence, we diagonalize the matrix C ^ ( q ) , as shown in Theorem 1, and our result follows directly in Corollary 2.
Theorem 1. 
Let J ^ be the all-one matrix. Let e ^ ( k ) denote the k-th standard basis vector. Then, the matrix C ^ ( q ) given by
$$\hat C^{(q)}=-q^{-1}\hat J+\mu\,\mathrm{diag}(\hat e^{(q)})=-\frac1q\begin{pmatrix}1&1&\cdots&1&1\\ 1&1&\cdots&1&1\\ \vdots&&\ddots&&\vdots\\ 1&1&\cdots&1&1\\ 1&1&\cdots&1&1-q\mu\end{pmatrix},\qquad(14)$$
has an orthogonal matrix Λ ^ ( q ) of eigenvectors given by
$$\hat\Lambda^{(q)}=(\hat\alpha_1,\hat\alpha_2,\ldots,\hat\alpha_{q-2},\hat\beta_1,\hat\beta_2),\qquad(15)$$
where $\{\hat\alpha_i\}_{i=1}^{q-2}$ are eigenvectors with eigenvalue 0, whereas $\hat\beta_1$ and $\hat\beta_2$ have eigenvalues $\lambda_1$ and $\lambda_2$, respectively, which read
$$\lambda_j=\frac12\left[(\mu-1)+(-1)^{j+1}\sqrt{\mu^2-\left(\frac4q-2\right)\mu+1}\right],\qquad j\in\{1,2\}.\qquad(16)$$
Furthermore, the elements of the eigenvectors are given by
$$\alpha_{i,k}=\frac{1}{\sqrt{i(i+1)}}\begin{cases}0,&k>i+1\\ 1,&k<i+1\\ -i,&k=i+1,\end{cases}\qquad(17)$$
$$\beta_{i,k}=\frac{1}{\sqrt{q-1+[1-q(\lambda_i+1)]^2}}\begin{cases}1,&k<q\\ 1-q(\lambda_i+1),&k=q.\end{cases}\qquad(18)$$
Proof. 
See Appendix C. □
Note that in matrix notation, we can write
$$\hat M^{(q)}=\hat I-q^{-1}\hat J+\mu\,\mathrm{diag}(\hat e^{(q)})=\hat I+\hat C^{(q)},\qquad(19)$$
where I ^ denotes the identity matrix, and we have the following corollary:
Corollary 2. 
The matrix M ^ ( q ) given by Equation (19) is diagonalizable by the matrix of eigenvectors Λ ^ ( q ) given in Theorem 1, and we have
$$\hat\Lambda^{(q)\mathrm T}\hat M^{(q)}\hat\Lambda^{(q)}=\begin{pmatrix}1&&&&\\ &\ddots&&&\\ &&1&&\\ &&&m_1&\\ &&&&m_2\end{pmatrix}\equiv\hat D^{(q)},\qquad(20)$$
where m 1 and m 2 are defined as
$$m_j\equiv\lambda_j+1,\qquad j\in\{1,2\}.\qquad(21)$$
Proof. 
Follows from Theorem 1. □
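As a quick numerical sanity check of Theorem 1 and Corollary 2 (our own verification sketch; the values of $q$ and $\mu$ are arbitrary), one can build $\hat C^{(q)}$ explicitly and compare its spectrum with the closed-form eigenvalues:

```python
import numpy as np

q, mu = 6, 0.07                         # arbitrary test values
C = -np.ones((q, q)) / q                # -J/q
C[-1, -1] += mu                         # + mu * diag(e^(q))

disc = np.sqrt(mu**2 - (4/q - 2)*mu + 1)
lam1, lam2 = 0.5*((mu - 1) + disc), 0.5*((mu - 1) - disc)

# spectrum of C^(q): q-2 zero eigenvalues plus lambda_1 and lambda_2
predicted = np.sort(np.array([lam2] + [0.0]*(q - 2) + [lam1]))
print(np.allclose(np.sort(np.linalg.eigvalsh(C)), predicted))   # True
# M^(q) = I + C^(q) then has eigenvalues 1 (q-2 times), m_1 = lambda_1+1, m_2 = lambda_2+1
```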
Lemma 3. 
Let $\hat z=\hat\Lambda^{(q)}\hat u$; then the function F given in Corollary 1 can be written as
$$F(\hat\Lambda^{(q)}\hat u)=-\frac\mu2\left[\frac{1}{\|\hat x_1\|^2}\mathbf{u}_{q-1}\cdot\mathbf{u}_{q-1}+\frac{2}{\|\hat x_1\|\,\|\hat x_2\|}\mathbf{u}_{q-1}\cdot\mathbf{u}_{q}+\frac{1}{\|\hat x_2\|^2}\mathbf{u}_{q}\cdot\mathbf{u}_{q}\right]+\ln\left[q^{-1}\sum_l\exp\left(-\frac\mu2\left[G_l+2\delta_{q,l}T\right]\right)\right],\qquad(22)$$
where we have
$$\hat x_j=(1,\ldots,1,x_j)^{\mathrm T},\qquad j\in\{1,2\},\qquad(23)$$
$$x_j=1-qm_j,\qquad j\in\{1,2\},\qquad(24)$$
$$G_l=\sum_{m=1}^{q-2}\sum_{m'=1}^{q-2}\alpha_{m,l}\alpha_{m',l}\,\mathbf{u}_m\cdot\mathbf{u}_{m'}+\frac{2}{\sqrt{q-1+x_1^2}}\sum_{m=1}^{q-2}\alpha_{m,l}\,\mathbf{u}_m\cdot\mathbf{u}_{q-1}+\frac{2}{\sqrt{q-1+x_2^2}}\sum_{m=1}^{q-2}\alpha_{m,l}\,\mathbf{u}_m\cdot\mathbf{u}_{q},\qquad(25)$$
$$T=\frac{x_2^2-1}{2(q-1+x_2^2)}\,\mathbf{u}_q\cdot\mathbf{u}_q+\frac{x_1^2-1}{2(q-1+x_1^2)}\,\mathbf{u}_{q-1}\cdot\mathbf{u}_{q-1}+\frac{x_1x_2-1}{\sqrt{(q-1+x_1^2)(q-1+x_2^2)}}\,\mathbf{u}_q\cdot\mathbf{u}_{q-1}.\qquad(26)$$
Proof. 
See Appendix D. □

2.5. Pulling Leading-Order Terms Out of the Integral

We now use Corollary 1 and Lemma 3 to further obtain an expression for the differential entropy that separates the leading contributions and the quantity S, which we define as follows:
$$S\equiv-\mathbb{E}_u\ln\left[q^{-1}\sum_l\exp\left(-\frac\mu2\left[G_l+2\delta_{q,l}T\right]\right)\right],\qquad(27)$$
where $\mathbb{E}_u$ is the expectation taken with respect to the density $\prod_{i=1}^{n}\prod_{k=1}^{q}g_{0,d_k^{-1}}(u_{k,i})$, and $\{d_k\}_{k=1}^{q}$ are given by:
$$d_k=\begin{cases}1,&k\leq q-2\\ m_1,&k=q-1\\ m_2,&k=q.\end{cases}\qquad(28)$$
Theorem 2. 
Let $h_\sigma=\ln\sigma\sqrt{2\pi e}$ be the differential entropy of a Gaussian-distributed random variable with standard deviation $\sigma$. The differential entropy $h(X|\hat W)$ is given by
$$h(X|\hat W)=nh_\sigma+\frac n2\,\frac{q}{q-1}\,\mu+S,\qquad(29)$$
where S is given by Equation (27).
Proof. 
See Appendix E. □

2.6. Third Change in Variables: Simplification

We would like to approximate $S$ given in Equation (27). However, our current integration variables $\{\mathbf{u}_k\}_{k=1}^{q}$ do not give simple formulas for $G_l$, $T$, or $S$. By linearly mixing $\mathbf{u}_{q-1}$ and $\mathbf{u}_q$, we obtain a set of $q$ independent standard normal-distributed variables, with the benefit that $G_l$ contains only one of the two newly introduced variables. We define the transformation matrix $W$ as
$$W\equiv\frac{1}{\sqrt{\frac1\mu+\frac{q}{q-1}}}\begin{pmatrix}\dfrac{1/\sqrt{m_2}}{\sqrt{q-1+x_2^2}}&-\dfrac{1/\sqrt{m_1}}{\sqrt{q-1+x_1^2}}\\[10pt] \dfrac{1/\sqrt{m_1}}{\sqrt{q-1+x_1^2}}&\dfrac{1/\sqrt{m_2}}{\sqrt{q-1+x_2^2}}\end{pmatrix},\qquad(30)$$
where we note that Equation (A21) can be used to show that $\det(W)=1$.
Lemma 4. 
Let $\mathbf{r}=\sqrt{m_2}\,\mathbf{u}_q$, $\mathbf{s}=\sqrt{m_1}\,\mathbf{u}_{q-1}$, and let $W$ be as given in Equation (30); then the variables $\tilde{\mathbf{r}}$ and $\tilde{\mathbf{s}}$ given by
$$\begin{pmatrix}\mathbf{r}\\ \mathbf{s}\end{pmatrix}=W\begin{pmatrix}\tilde{\mathbf{r}}\\ \tilde{\mathbf{s}}\end{pmatrix}\qquad(31)$$
have independent standard normal distributions, i.e., we can write
$$\mathbb{E}_u[\,\cdot\,]=\mathbb{E}_{u,\tilde s,\tilde r}[\,\tilde\cdot\,],\qquad(32)$$
where $[\,\tilde\cdot\,]$ denotes the transformed expression, and $\mathbb{E}_{u,\tilde s,\tilde r}[\,\tilde\cdot\,]$ is taken with regard to the density $\prod_{a=1}^{n}g_{0,1}(\tilde r_a)\,g_{0,1}(\tilde s_a)\prod_{b=1}^{q-2}g_{0,1}(u_{b,a})$.
Proof. 
See Appendix F. □
As a corollary, T and G l assume a simpler form in terms of our new variables.
Corollary 3. 
Let w l be given by
$$\mathbf{w}_l=\sum_{m=1}^{q-2}\alpha_{m,l}\,\mathbf{u}_m;\qquad(33)$$
then after applying the change in variables in Lemma 4, T and G l can be written as
$$G_l=\mathbf{w}_l\cdot\mathbf{w}_l+2\sqrt{\frac1\mu+\frac{q}{q-1}}\;\mathbf{w}_l\cdot\tilde{\mathbf{r}},\qquad(34)$$
$$T=-\frac{1}{2\left(1+\mu\frac{q}{q-1}\right)}\left[\frac{q}{q-1}\left(2+\mu\frac{q}{q-1}\right)\tilde{\mathbf{r}}\cdot\tilde{\mathbf{r}}-\frac{q}{q-1}\,\tilde{\mathbf{s}}\cdot\tilde{\mathbf{s}}+\frac{2}{\sqrt\mu}\sqrt{\frac{q}{q-1}}\;\tilde{\mathbf{r}}\cdot\tilde{\mathbf{s}}\right].\qquad(35)$$
Proof. 
It follows by direct substitution. □
In this new form (34) and (35), the powers of $\mu$ are explicitly visible. The variables $\mathbf{w}_l,\tilde{\mathbf{r}},\tilde{\mathbf{s}}$ do not generate any $\mu$-dependence when integrated. It is clear that $\mu G_l$ and $\mu T$ are of order $O(\sqrt\mu)$; hence, we can formally proceed with a Taylor expansion of the exp function in (27).

3. Brute Force Expansion

We develop a series expansion in $\mu$ as follows. First, we expand the exp function in (27), with $G_l$ and $T$ as defined in (34) and (35). This yields a complicated series, whose leading-order behaviour is $1+O(\sqrt\mu)$. Then, we substitute that series into the Taylor expansion of $\ln(1+\varepsilon)$ in order to evaluate $S$ as a series expansion. Finally, we apply the expectation $\mathbb{E}_{u,\tilde s,\tilde r}$, which is possible because we only obtain expectations of powers of $\mathbf{u},\tilde{\mathbf{r}},\tilde{\mathbf{s}}$, which are independent normal-distributed variables. Although all steps by themselves are straightforward enough, the result is a huge amount of bookkeeping. A number of things are worth noting:
  • What helps in this exercise is that odd powers of $\mathbf{w}_l$, $\tilde{\mathbf{r}}$, and $\tilde{\mathbf{s}}$ lead to vanishing integrals.
  • Furthermore, it is clear from the start that the half-integer powers of $\mu$ will disappear in the end. This can be seen from the fact that a factor $1/\sqrt\mu$ in $T$ and $G_l$ always occurs with an isolated $\tilde{\mathbf{r}}$; hence, any occurrence of an odd power of $\sqrt\mu$ comes with an odd power of $\tilde{\mathbf{r}}$.
  • Due to the $n$-dimensional inner products that occur in $G_l$ and $T$, which consist of $n$ independent terms, each power of $\mu$ is associated with a factor $n$. Consequently, the power series becomes a series not in $\mu$ but actually in $n\mu$. For convergence, the product $n\mu$ needs to be sufficiently small. Fortunately, we are allowed to work under the condition $n\leq q$, as explained in Section 2.2.
We performed this brute-force exercise up to order O ( μ 2 ) .
Theorem 3. 
The differential entropy h ( X | W ^ ) up to order O ( μ 2 ) is given by
$$h(X|\hat W)=nh_\sigma+\frac n2\left(1-\frac1q\right)\mu-\frac n2\left(1-\frac1q\right)\left(\frac{n}{q-1}+\frac{1}{2q}\right)\mu^2+O\!\left(\mu^3\right).\qquad(36)$$
Proof. 
See Appendix I. □
Theorem 3 is not directly comparable with results in the literature. However, we can make use of existing upper bounds in order to assess our expression. The Gaussian upper bound in Equation (2) is known to be loose, while the bound in (5) is closer to the value of the differential entropy. For our case, the bound in Equation (5) reads
$$h(X|\hat W)\leq nh_\sigma+\ln q.\qquad(37)$$
It is easy to see that for $\mu q\ll1$ and $n=q$, our second-order Taylor expansion in Equation (36) satisfies this bound up to the error produced by the truncation of the series, which is of order $o(\mu^2)$.
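Theorem 3 can also be checked numerically. The sketch below (our own cross-check; the parameter values, number of mean configurations, and sample sizes are arbitrary choices) estimates $h(X|\hat W)$ by Monte Carlo, averaging the entropy of mixtures whose means are drawn according to (8), and compares the result with the second-order expansion (36).

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
n, q, sigma2, mu = 3, 6, 1.0, 0.02      # arbitrary values with n <= q and n*mu small
s2 = mu * sigma2

def h_given_w(w, n_samples=50_000):
    # Monte Carlo estimate of h(X | W-hat = w-hat) for the equal-weight mixture (6)
    x = w[rng.integers(q, size=n_samples)] + rng.normal(scale=np.sqrt(sigma2), size=(n_samples, n))
    d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    log_f = logsumexp(-0.5 * d2 / sigma2, axis=1) - np.log(q) - 0.5 * n * np.log(2 * np.pi * sigma2)
    return -log_f.mean()

# average over Gaussian mean configurations drawn according to (8)
h_avg = np.mean([h_given_w(rng.normal(scale=np.sqrt(s2), size=(q, n))) for _ in range(200)])

h_sigma = np.log(np.sqrt(2 * np.pi * np.e * sigma2))
series = (n * h_sigma
          + 0.5 * n * (1 - 1/q) * mu
          - 0.5 * n * (1 - 1/q) * (n/(q - 1) + 1/(2*q)) * mu**2)
print(h_avg, series)    # should agree up to O(mu^3) plus Monte Carlo noise
```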

4. Determinant-Based Approach

We introduce a less brute-force series expansion that does not expand the exp function but focuses on expanding the logarithm. This gives a power series in exponential expressions; each of these expressions can be integrated analytically, yielding a matrix determinant. For simplicity of notation we write the variables u k , r ˜ , s ˜ together as a q-component vector a ^ ,
$$\hat a=(\mathbf{u}_1,\ldots,\mathbf{u}_{q-2},\tilde{\mathbf{r}},\tilde{\mathbf{s}}).\qquad(38)$$
The $S$ is now written as follows:
$$S=-\mathbb{E}_a\ln\left[q^{-1}\sum_{l=1}^{q}\exp\left(-\frac\mu2\left[G_l+2\delta_{q,l}T\right]\right)\right],\qquad(39)$$
where we have
$$\mathbb{E}_a[\,\cdot\,]=(2\pi)^{-\frac{nq}{2}}\int\exp\left[-\frac12\hat a^{\mathrm T}\hat I\hat a\right](\,\cdot\,)\,\mathrm{d}\mathbf{a}_1\cdots\mathrm{d}\mathbf{a}_q.\qquad(40)$$
Next, we can write $G_l$ and $T$ in vector–matrix–vector product form:
$$\mu G_l=\hat a^{\mathrm T}\hat Q_l\hat a,\qquad 2\mu T=\hat a^{\mathrm T}\hat P\hat a,\qquad(41)$$
where the matrix elements of $\hat Q_l$ and $\hat P$ are given by
$$Q_{l,mm'}=\mu\,\omega_{m,l}\omega_{m',l}+\sqrt{\mu\left(1+\mu\tfrac{q}{q-1}\right)}\left[\omega_{m,l}\delta_{m',q-1}+\omega_{m',l}\delta_{m,q-1}\right],\qquad(42)$$
where we define
$$\omega_{m,l}\equiv\begin{cases}\alpha_{m,l},&m\leq q-2\\ 0,&\text{otherwise},\end{cases}\qquad(43)$$
$$P_{m,m'}=\begin{cases}\dfrac{\mu\frac{q}{q-1}}{1+\mu\frac{q}{q-1}},&m=m'=q\\[8pt] -\dfrac{\mu\frac{q}{q-1}\left(2+\mu\frac{q}{q-1}\right)}{1+\mu\frac{q}{q-1}},&m=m'=q-1\\[8pt] -\dfrac{\sqrt{\mu\frac{q}{q-1}}}{1+\mu\frac{q}{q-1}},&m=q,\,m'=q-1\ \text{or}\ m=q-1,\,m'=q\\[8pt] 0,&\text{otherwise}.\end{cases}\qquad(44)$$
The P ^ is essentially a 2 × 2 matrix. The Q ^ consists of the projector α ^ α ^ T plus extra entries in row and column q 1 . Let Z be defined as
$$Z=1-q^{-1}\sum_{l=1}^{q}\exp\left(-\frac\mu2\left[G_l+2\delta_{q,l}T\right]\right)=1-q^{-1}\sum_{l=1}^{q-1}\exp\left[-\frac12\hat a^{\mathrm T}\hat Q_l\hat a\right]-q^{-1}\exp\left[-\frac12\hat a^{\mathrm T}\hat P\hat a\right],\qquad(45)$$
where the second equality follows from the definitions (41). We note that $Z=O(\sqrt\mu)$. This allows us to perform an expansion in $Z$, knowing that $Z^k$ does not produce powers of $\mu$ lower than $\mu^{k/2}$. Furthermore, the power $\sqrt\mu$ occurs only in the off-diagonal components of $\hat P$ and $\hat Q$, whereas the diagonal components are $O(\mu)$. As in the brute-force approach, the integration over $\hat a$ eliminates half-integer powers of $\mu$. Hence, $\mathbb{E}_a Z^k=O(\mu^{\lceil k/2\rceil})$. In order to obtain the entropy estimation up to and including power $\mu^t$, we need all contributions up to and including $\mathbb{E}_a Z^{2t}$. Using $\ln(1-Z)=-\sum_{k=1}^{\infty}k^{-1}Z^k$, one can easily show that $S$ (39) can be written as
$$S=\sum_{k=1}^{\infty}Z_k,\qquad Z_k=\frac1k\,\mathbb{E}_a Z^k.\qquad(46)$$
This provides a recipe for going to arbitrary order in $\mu$. We show the calculation of the $\mu^1$ contribution to $S$. For this, we need only $Z_1$ and $Z_2$.
Lemma 5. 
For a positive definite matrix V ^ , we have
$$\int\exp\left[-\frac12\hat u^{\mathrm T}\hat V\hat u\right]\mathrm{d}\mathbf{u}_1\cdots\mathrm{d}\mathbf{u}_q=(2\pi)^{nq/2}\det(\hat V)^{-n/2}.$$
Proof. 
See Appendix G. □
Lemma 6. 
It holds that
$$Z_1=1-\sum_{\ell=1}^{q-1}q^{-1}\det(\hat I+\hat Q_\ell)^{-n/2}-q^{-1}\det(\hat I+\hat P)^{-n/2},$$
$$Z_2=\frac12-\sum_{\ell=1}^{q-1}q^{-1}\det(\hat I+\hat Q_\ell)^{-n/2}-q^{-1}\det(\hat I+\hat P)^{-n/2}+\sum_{\ell=1}^{q-1}q^{-2}\det(\hat I+\hat Q_\ell+\hat P)^{-n/2}$$
$$+\sum_{\ell=1}^{q-1}\sum_{\ell'=1}^{q-1}\frac12\,q^{-2}\det(\hat I+\hat Q_\ell+\hat Q_{\ell'})^{-n/2}+\frac12\,q^{-2}\det(\hat I+2\hat P)^{-n/2}.$$
Proof. 
See Appendix H. □
Proposition 1. 
For t N ,
$$\det(\hat I+t\hat P)=1-\mu\,t(t+1)\frac{q}{q-1},$$
$$\det(\hat I+t\hat Q_\ell)=1-\mu\,t(t-1)\frac{q-2}{q-1}-t^2\mu^2\frac{q(q-2)}{(q-1)^2},$$
$$\det(\hat I+\hat P+\hat Q_\ell)=1-\mu\frac{2q}{q-1}-\mu^2\frac{4q(q-2)}{(q-1)^2},$$
$$\det(\hat I+\hat Q_\ell+\hat Q_{\ell'})=1+\frac{2\mu}{q-1}-\mu^2\frac{(3q-1)(q-3)}{(q-1)^2}-\mu^3\frac{2q(q-3)}{(q-1)^2}\qquad(\ell\neq\ell').$$
Corollary 4. 
It holds that
$$Z_1=-\frac{n}{q-1}\,\mu+O(\mu^2),\qquad Z_2=\frac{n/2}{q(q-1)}\,\mu+O(\mu^2).$$
Using (29), it follows that $h(X|\hat W)=nh_\sigma+\frac n2\frac{q}{q-1}\mu+S$, with $S=Z_1+Z_2+O(\mu^2)$, yielding $h(X|\hat W)=nh_\sigma+\frac n2\left(1-\frac1q\right)\mu+O(\mu^2)$. This is consistent with the result of the brute-force approach, as in Theorem 3.
In higher orders of $Z$, determinants occur of the form $\det(\hat I+r\hat P+t\hat Q_\ell+t'\hat Q_{\ell'}+t''\hat Q_{\ell''}+\cdots)$ for integer $r,t,t',t'',\ldots$. These are computable but involve a lot of bookkeeping. Because of the form $[\det(\cdots)]^{-n/2}$ and the fact that each determinant starts with 1, followed by integer powers of $\mu$, the leading-order coefficient of $\mu^t$ is proportional to $\binom{-n/2}{t}\sim n^t$. Hence, as in the brute-force approach, the expansion in powers of $\mu$ effectively becomes an expansion in powers of $n\mu$.
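The determinant formulas of Proposition 1 lend themselves to a direct numerical check. The sketch below (our own verification; $q$, $\mu$, and $t$ are arbitrary test values) builds $\hat P$ and $\hat Q_\ell$ explicitly, with $\tilde{\mathbf{r}}$ and $\tilde{\mathbf{s}}$ occupying slots $q-1$ and $q$ of $\hat a$, and compares the determinants with the closed forms:

```python
import numpy as np

q, mu, t = 7, 0.05, 2                      # arbitrary test values
beta = mu * q / (q - 1)

# alpha vectors of Theorem 1: alpha[i-1] is the i-th eigenvector with eigenvalue 0
alpha = np.zeros((q - 2, q))
for i in range(1, q - 1):
    y = np.zeros(q)
    y[:i] = 1.0
    y[i] = -float(i)
    alpha[i - 1] = y / np.sqrt(i * (i + 1))

def Q(l):
    # Q_l as in (42)-(43); the r-tilde slot is index q-1 (0-based: q-2)
    w = np.zeros(q)
    w[:q - 2] = alpha[:, l]
    e = np.zeros(q); e[q - 2] = 1.0
    return mu * np.outer(w, w) + np.sqrt(mu * (1 + beta)) * (np.outer(w, e) + np.outer(e, w))

P = np.zeros((q, q))
P[q - 2, q - 2] = -beta * (2 + beta) / (1 + beta)
P[q - 1, q - 1] = beta / (1 + beta)
P[q - 2, q - 1] = P[q - 1, q - 2] = -np.sqrt(beta) / (1 + beta)

I = np.eye(q)
print(np.isclose(np.linalg.det(I + t*P), 1 - mu*t*(t + 1)*q/(q - 1)))
print(np.isclose(np.linalg.det(I + t*Q(0)),
                 1 - mu*t*(t - 1)*(q - 2)/(q - 1) - t**2*mu**2*q*(q - 2)/(q - 1)**2))
print(np.isclose(np.linalg.det(I + P + Q(0)),
                 1 - 2*mu*q/(q - 1) - 4*mu**2*q*(q - 2)/(q - 1)**2))
print(np.isclose(np.linalg.det(I + Q(0) + Q(1)),
                 1 + 2*mu/(q - 1) - mu**2*(3*q - 1)*(q - 3)/(q - 1)**2
                   - mu**3*2*q*(q - 3)/(q - 1)**2))
# all four comparisons print True
```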

5. Discussion

Gaussian mixtures occur in the literature either as an approximation to a probability density function or as the actual density of a two-stage stochastic process. Shannon entropy plays an important role in the signal processing, optimization, and analysis of systems. It is important to have (semi-)analytical methods for evaluating the Shannon entropy (differential entropy) of Gaussian mixtures, without lengthy numerical computations in high-dimensional spaces. We have developed an expansion method that has a number of advantages: (i) it yields an entropy estimate rather than merely bounds; (ii) the order of magnitude of the error is known; (iii) it avoids the process of [20] that splits wide Gaussians into narrow pieces, which leads to a large number of expansion terms and introduces inaccuracies. The inaccuracies in [20] grow with the number of wide Gaussians in the mixture. The approach in [20] is unable to exploit the existence of a small parameter $\mu$ when the mixture structure allows it. In our case, we make sure to leverage the trade-off between the support shared between the components and the variance of the components, which makes our method more efficient when the mixture is not composed of separated clusters. In the case of negligible shared support between the components, an approximation is not needed, as the bound in Equation (37) becomes exact. From the second-order result (36), it seems that convergence is faster than what we expected. In leading-order terms, every power of $\mu$ is accompanied by a power of $n$, which may endanger the radius of convergence. However, in the $\mu^2$ part in (36), we see that the additional power of $n$ comes as $n/q$, which is not dangerous since we are allowed to work with $n\leq q$. Setting $n=q$ and $q\gg1$, (36) reduces to $h(X|\hat W)\approx\frac n2\left[\ln2\pi e\sigma^2+\mu-\mu^2+O(\mu^3)\right]$, which has fast convergence. In future work, we will try to apply our method for fixed displacements $\hat w$ and study the convergence at higher orders of $\mu$.

Author Contributions

Conceptualization, B.Š.; formal analysis, B.J. and B.Š.; writing—original draft preparation, B.J.; writing—review and editing, B.J. and B.Š. All authors have read and agreed to the published version of the manuscript.

Funding

Joudeh was supported by NWO grant CS.001 (Forwardt).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Lemma 1

h ( X | W ^ ) = d w 1 d w q f W ^ ( w ^ ) d x f X | W ^ ( x | w ^ ) ln f X | W ^ ( x | w ^ ) = 1 q j = 1 q d x d w 1 d w q a = 1 q g 0 , s 2 ( w a ) g 0 , σ 2 ( w j x ) ln 1 q k = 1 q g 0 , σ 2 ( w k x ) .
All the integrals are from to . Inside the x-integration, we define new integration variables z j by writing w j = x s z j . Furthermore, we write x = s c . Now, we have
h ( X | W ^ ) = s q n + n 1 q j = 1 q d c d z 1 d z q a = 1 q g 0 , s 2 ( s c s z a ) g 0 , σ 2 ( s z j ) ln 1 q k = 1 q g 0 , σ 2 ( s z k ) = s q n + n 1 q j = 1 q d z 1 d z q d c ( 2 π s 2 ) q n 2 e 1 2 a ( c z a ) 2 g 0 , σ 2 ( s z j ) ln 1 q k = 1 q g 0 , σ 2 ( s z k ) = s n 1 q j = 1 q d z 1 d z q ( 2 π ) q n 2 e 1 2 a z a 2 d c e q 2 c 2 + c · a z a g 0 , σ 2 ( s z j ) ln 1 q k = 1 q g 0 , σ 2 ( s z k ) = s n 2 π q n 2 ( 2 π ) q n 2 1 q j = 1 q d z 1 d z q e 1 2 a z a 2 + 1 2 q ( a z a ) 2 g 0 , σ 2 ( s z j ) ln 1 q k = 1 q g 0 , σ 2 ( s z k )
= μ q n 2 ( 2 π ) q n 2 1 q j = 1 q d z 1 d z q e 1 2 a z a 2 + 1 2 q ( a z a ) 2 μ 2 z j 2 ln 1 q k = 1 q exp μ 2 z k 2 ( 2 π σ 2 ) n / 2 = n 2 ln 2 π σ 2 μ q n 2 ( 2 π ) q n 2 1 q j = 1 q d z 1 d z q e 1 2 a z a 2 + 1 2 q ( a z a ) 2 μ 2 z j 2 ln 1 q k = 1 q e μ 2 z k 2 .

Appendix B. Proof of Lemma 2

exp 1 2 j z j 2 + 1 2 q i j z i · z j 1 q k exp μ 2 z k 2 F ( z 1 , , z q ) d z 1 d z q = 1 q k exp 1 2 i j δ i j z i · z j + 1 2 q i j z i · z j i j μ 2 δ i k δ j k z i · z j F ( z 1 , , z q ) d z 1 d z q = 1 q k exp 1 2 i j M i j ( k ) z i · z j F ( z 1 , , z q ) d z 1 d z q = 1 q k exp 1 2 z ^ T M ^ ( k ) z ^ F ( z 1 , , z q ) d z 1 d z q .
Note that we have the following:
z ^ T M ^ ( k ) z ^ = j z j 2 1 q i j z i · z j + μ z k 2 = z ^ T M ^ ( k ) z ^ ,
where z ^ are the following permutation of z ^ :
z i = z i , i k , k z k , i = k z k , i = k ,
therefore, for any F that is permutation invariant, we can write
1 q k exp 1 2 z ^ T M ^ ( k ) z ^ F ( z 1 , , z q ) d z 1 d z q = exp 1 2 z ^ T M ^ ( q ) z ^ F ( z 1 , , z q ) d z 1 d z q .

Appendix C. Proof of Theorem 1

Suppose some eigenvectors of C ^ ( q ) are of the form ( 1 , , 1 , x ) T , where x is unknown. We now wish to solve the eigenvalue equation:
1 q 1 1 1 1 1 1 1 1 1 q μ 1 1 x = λ 1 1 x ,
for some eigenvalue λ . Carrying out the matrix multiplication, we obtain
q 1 + x q 1 + x q 1 + ( 1 q μ ) x = λ q λ q λ q x ,
which give the following two equations:
q 1 + x = λ q x = q ( λ + 1 ) + 1 ,
q 1 + ( 1 q μ ) x = λ q x x = q 1 λ q q μ + 1 ,
and by equating, we obtain
q ( λ + 1 ) 1 = q 1 λ q q μ + 1 q 1 = λ q q μ + 1 q ( λ + 1 ) 1 = q 2 λ ( λ + 1 ) q λ q 2 μ ( λ + 1 ) + q μ + q ( λ + 1 ) 1 = q 2 λ 2 + q 2 λ q λ q 2 μ λ q 2 μ + q μ + q λ + q 1 .
We have the quadratic equation:
q 2 λ 2 + ( 1 μ ) λ + μ ( q 1 1 ) = 0 .
Since q > 0 , we have the solutions:
λ 1 , 2 = 1 2 ( μ 1 ) ± D ,
where D is given by
D ( μ 1 ) 2 4 μ ( q 1 1 ) = μ 2 2 μ + 1 4 μ q 1 + 4 μ = μ 2 ( 4 q 1 2 ) μ + 1 .
From Equation (A9), obtain the two following values for x:
x 1 , 2 = q m 1 , 2 + 1 ,
where we write m 1 = λ 1 + 1 and m 2 = λ 2 + 1 , and we define x ^ 1 and x ^ 2 by
x ^ 1 ( 1 , , x 1 ) T , x ^ 2 ( 1 , , x 2 ) T .
Since x 1 and x 2 are solutions to a quadratic equation, we have the following simplifications:
$$m_1m_2=\frac\mu q,\qquad(\mathrm{A}17)$$
$$m_1+m_2=\mu+1,\qquad(\mathrm{A}18)$$
$$x_1+x_2=-q(\mu+1)+2,\qquad(\mathrm{A}19)$$
$$x_1x_2=1-q,\qquad(\mathrm{A}20)$$
$$\frac{\mu/m_1}{q-1+x_1^2}+\frac{\mu/m_2}{q-1+x_2^2}=1+\mu\frac{q}{q-1},\qquad(\mathrm{A}21)$$
$$\frac{(x_1^2-1)/m_1}{q-1+x_1^2}+\frac{(x_2^2-1)/m_2}{q-1+x_2^2}=\frac{q}{1-q},\qquad(\mathrm{A}22)$$
( x 1 2 1 ) ( x 2 2 1 ) q 2 ( q 1 + x 1 2 ) ( q 1 + x 2 2 ) = 1 / m 2 q 1 + x 2 2 1 / m 1 q 1 + x 1 2 = 1 1 q ,
$$\frac{x_1/m_1}{q-1+x_1^2}+\frac{x_2/m_2}{q-1+x_2^2}=\frac1\mu.\qquad(\mathrm{A}24)$$
Note that x ^ 1 and x ^ 2 are orthogonal, that is, using Equation (A20), we have
x ^ 1 T x ^ 2 = q 1 + x 1 x 2 = 0 .
We define the normalized eigenvectors β ^ 1 and β ^ 2 by
β ^ 1 x ^ 1 | | x ^ 1 | | , β ^ 2 x ^ 2 | | x ^ 2 | | .
We now define the vectors { y ^ i } i = 1 q 2 by the following:
y i , k = 0 , k > i + 1 1 , k < i + 1 i , k = i + 1 ,
which in matrix notation are given by
y ^ 1 = 1 1 0 0 , y ^ 2 = 1 1 2 0 0 , y ^ 3 = 1 1 1 3 0 0 , , y ^ q 2 = 1 1 1 ( q 2 ) 0 .
The length of y ^ i is given by
| | y ^ i | | = i ( i + 1 ) .
One can check that y ^ i is an eigenvector of C ^ ( q ) with eigenvalue 0. The normalized eigenvectors α ^ i are defined by
α ^ i y ^ i | | y ^ i | | .
It can be checked that the set { α ^ i } i = 1 q 2 { β ^ 1 , β ^ 2 } are orthonormal. Therefore, we can form the orthogonal matrix Λ ^ ( q ) with its columns being eigenvectors of C ^ ( q ) , that is,
Λ ^ ( q ) = ( α ^ 1 , α ^ 2 , , α ^ q 2 , β ^ 1 , β ^ 2 ) ,
and we can write
Λ ^ ( q ) T C ^ ( q ) Λ ^ ( q ) = 0 0 0 0 0 0 0 0 0 0 λ 1 0 0 0 0 λ 2 .

Appendix D. Proof of Lemma 3

We perform the following change in variables:
z ^ = Λ ^ ( q ) u ^ = m = 1 q 2 α ^ m u m + β ^ 1 u q 1 + β ^ 2 u q ,
and we can write
z l = m = 1 q 2 α m , l u m + β 1 , l u q 1 + β 2 , l u q .
If we take the dot product, we obtain
z l · z l = m = 1 q 2 m = 1 q 2 α m , l α m , l u m · u m + 2 β 1 , l m = 1 q 2 α m , l u m · u q 1 + 2 β 2 , l m = 1 q 2 α m , l u m · u q + β 1 , l 2 u q 1 · u q 1 + 2 β 1 , l β 2 , l u q 1 · u q + β 2 , l 2 u q · u q .
Note that we can write
β 1 , l = 1 | | x ^ 1 | | 1 + δ q , l x 1 1 , β 2 , l = 1 | | x ^ 2 | | 1 + δ q , l x 2 1 ,
which gives
z l · z l = m = 1 q 2 m = 1 q 2 α m , l α m , l u m · u m + 2 | | x ^ 1 | | m = 1 q 2 α m , l u m · u q 1 + 2 | | x ^ 2 | | m = 1 q 2 α m , l u m · u q + 1 | | x ^ 1 | | 2 u q 1 · u q 1 + 2 | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q + 1 | | x ^ 2 | | 2 u q · u q + 2 δ q , l ( x 2 1 ) | | x ^ 2 | | m = 1 q 2 α m , l u m · u q + u q 1 · u q | | x ^ 1 | | + x 2 + 1 2 | | x ^ 2 | | u q · u q + 2 δ q , l ( x 1 1 ) | | x ^ 1 | | m = 1 q 2 α m , l u m · u q 1 + u q 1 · u q | | x ^ 2 | | + x 1 + 1 2 | | x ^ 1 | | u q 1 · u q 1 + 2 δ q , l ( x 1 1 ) ( x 2 1 ) | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q ,
and note that α m , q = 0 , so we have
z l · z l = m = 1 q 2 m = 1 q 2 α m , l α m , l u m · u m + 2 | | x ^ 1 | | m = 1 q 2 α m , l u m · u q 1 + 2 | | x ^ 2 | | m = 1 q 2 α m , l u m · u q + 1 | | x ^ 1 | | 2 u q 1 · u q 1 + 2 | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q + 1 | | x ^ 2 | | 2 u q · u q + 2 δ q , l ( x 2 1 ) | | x ^ 2 | | u q 1 · u q | | x ^ 1 | | + x 2 + 1 2 | | x ^ 2 | | u q · u q + 2 δ q , l ( x 1 1 ) | | x ^ 1 | | u q 1 · u q | | x ^ 2 | | + x 1 + 1 2 | | x ^ 1 | | u q 1 · u q 1 + 2 δ q , l ( x 1 1 ) ( x 2 1 ) | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q = G l + 2 δ q , l T + 1 | | x ^ 1 | | 2 u q 1 · u q 1 + 2 | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q + 1 | | x ^ 2 | | 2 u q · u q ,
where we define
G l m = 1 q 2 m = 1 q 2 α m , l α m , l u m · u m + 2 | | x ^ 1 | | m = 1 q 2 α m , l u m · u q 1 + 2 | | x ^ 2 | | m = 1 q 2 α m , l u m · u q , T ( x 2 1 ) | | x ^ 2 | | u q 1 · u q | | x ^ 1 | | + x 2 + 1 2 | | x ^ 2 | | u q · u q + ( x 1 1 ) | | x ^ 1 | | u q 1 · u q | | x ^ 2 | | + x 1 + 1 2 | | x ^ 1 | | u q 1 · u q 1
+ ( x 1 1 ) ( x 2 1 ) | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q ,
where note that G l = 0 if l = q , and so we have
G l + 2 δ l , q T = G l , l q 2 T , l = q .
Rearranging Equation (A40) further yields the desired expression for T. Writing F in terms of the new variables, we obtain
F ( z 1 , , z q ) = μ 2 1 | | x ^ 1 | | 2 u q 1 · u q 1 + 2 | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q + 1 | | x ^ 2 | | 2 u q · u q + ln 1 q l exp μ 2 G l + 2 δ q , l T .

Appendix E. Proof of Theorem 2

We have d z 1 d z q = d u 1 d u q since Λ ^ ( q ) is orthogonal, and we also have
exp 1 2 z ^ T M ^ ( q ) z ^ = exp 1 2 u ^ T ( Λ ^ ( q ) ) T M ^ ( q ) Λ ^ ( q ) u ^ = exp 1 2 u ^ T D ^ ( q ) u ^ .
It then follows directly from Lemma 3 that
exp 1 2 z ^ T M ^ ( q ) z ^ F ( z 1 , , z q ) d z 1 d z q = μ 2 exp 1 2 u ^ T D ^ ( q ) u ^ · 1 | | x ^ 1 | | 2 u q 1 · u q 1 + 1 | | x ^ 2 | | 2 u q · u q + 2 | | x ^ 1 | | | | x ^ 2 | | u q 1 · u q d u 1 d u q + exp 1 2 u ^ T D ^ ( q ) u ^ ln q 1 l exp μ 2 G l + 2 δ q , l T d u 1 d u q .
Note that we have
u ^ T D ^ ( q ) u ^ = k = 1 q 2 u k · u k + m 1 u q 1 · u q 1 + m 2 u q · u q = k = 1 q 2 i = 1 n u k , i 2 + m 1 i = 1 n u q 1 , i 2 + m 2 i = 1 n u q , i 2 ,
and so we have
exp 1 2 u ^ T D ^ ( q ) u ^ = i = 1 n k = 1 q exp 1 2 d k u k , i 2 ,
where we define
d k 1 , k q 2 m 1 , k = q 1 m 2 , k = q .
Rearranging Equation (A46), we have
exp 1 2 u ^ T D ^ ( q ) u ^ = i = 1 n k = 1 q 2 π d k d k 2 π exp 1 2 u k ( 1 / d k ) 2 = i = 1 n k = 1 q 2 π d k g 0 , d k 1 ( u k , i ) ,
where g μ ˜ , σ 2 ( x ) is the Gaussian function with mean μ ˜ and variance σ 2 . The first term in Equation (A44) yields
μ 2 | | x ^ 1 | | 2 exp 1 2 u ^ T D ^ ( q ) u ^ u q 1 · u q 1 d u 1 d u q = μ 2 | | x ^ 1 | | 2 j = 1 n exp 1 2 u ^ T D ^ ( q ) u ^ u q 1 , j 2 d u 1 d u q = μ 2 | | x ^ 1 | | 2 j = 1 n u q 1 , j 2 i = 1 n k = 1 q 2 π d k g 0 , d k 1 ( u k , i ) d u k , i = μ ( 2 π ) n q 2 | | x ^ 1 | | 2 × j = 1 n ( k , i ) ( q 1 , j ) 1 d k g 0 , d k 1 ( u k , i ) d u k , i 1 d q 1 g 0 , d q 1 1 ( u q 1 , j ) u q 1 , j 2 d u q 1 , j = μ ( 2 π ) n q 2 | | x ^ 1 | | 2 j = 1 n ( k , i ) ( q 1 , j ) 1 d k d q 1 1 d q 1 = μ ( 2 π ) n q 2 | | x ^ 1 | | 2 j = 1 n i = 1 n k = 1 q 1 d k d q 1 1 = μ ( 2 π ) n q 2 | | x ^ 1 | | 2 j = 1 n 1 m 1 m 2 n 1 m 1 = n μ ( 2 π ) n q 2 | | x ^ 1 | | 2 1 ( m 1 m 2 ) n 1 m 1 .
In an analogous calculation, the second term in Equation (A44) yields
μ 2 | | x ^ 2 | | 2 exp 1 2 u ^ T D ^ ( q ) u ^ u q · u q d u 1 d u q = n μ ( 2 π ) n q 2 | | x ^ 2 | | 2 1 m 1 m 2 n 1 m 2 .
The third term in Equation (A44) is identically zero since the matrix D ^ ( q ) is diagonal (there is no covariance). From Corollary 1, Equations (A17), (A21), (A49) and (A50), we have
h ( X | W ^ ) = n ln ( σ 2 π ) + n 2 1 + μ q q 1 μ q n / 2 ( 2 π ) q n exp 1 2 u ^ T D ^ ( q ) u ^ ln q 1 l exp μ 2 G l + 2 δ q , l T d u 1 d u q , = n h σ + n 2 q q 1 μ E u ln q 1 l exp μ 2 G l + 2 δ q , l T ,
where E u is the expectation taken with respect to the density i = 1 n k = 1 q g 0 , d k 1 ( u k , i ) .

Appendix F. Proof of Lemma 4

We have u q = r / m 2 , u q 1 = s / m 1 . This normalizes the variance of r , s to 1. Hence, each component r 1 , , r n , s 1 , , s n is an independent normal-distributed variable. We show that it is consistent to write r s = W r ˜ s ˜ , with W as given in (30), and all the components r ˜ 1 , , r ˜ n , s ˜ 1 , , s ˜ n are independent and normally distributed. The change in variables can be written component-wise as r α = W 11 r ˜ α + W 12 s ˜ α , s α = W 21 r ˜ α + W 22 s ˜ α , for α { 1 , , n } . A linear combination of normal-distributed variables has a Gaussian distribution. Furthermore, we have first-order statistics E [ r α ] = 0 , E [ s α ] = 0 . The second-order statistics are given by
E [ r α 2 ] = W 11 2 E [ r ˜ α 2 ] + W 12 2 E [ s ˜ α 2 ] = μ 1 + μ q q 1 ( 1 / m 2 q 1 + x 2 2 + 1 / m 1 q 1 + x 1 2 ) = ( A 21 ) 1 ,
E [ s α 2 ] = W 21 2 E [ r ˜ α 2 ] + W 22 2 E [ s ˜ α 2 ] = μ 1 + μ q q 1 ( 1 / m 1 q 1 + x 1 2 + 1 / m 2 q 1 + x 2 2 ) = ( A 21 ) 1 ,
E [ r α s β ] = W 11 W 21 E [ r ˜ α r ˜ β ] + W 12 W 22 E [ s ˜ α s ˜ β ] = δ α β ( W 11 W 21 + W 12 W 22 ) = 0 ,
and E [ r α r β ] = δ α β , E [ s α s β ] = δ α β . The integration measure changes as
d r α d s α = | det W | d r ˜ α d s ˜ α ,
with
det W = μ 1 + μ q q 1 ( 1 / m 1 q 1 + x 1 2 1 / m 2 q 1 + x 2 2 ) = ( A 21 ) 1 .
We replace the expectation E u by E u , s ˜ , r ˜ .

Appendix G. Proof of Lemma 5

exp 1 2 u ^ T V ^ u ^ d u 1 d u q = k = 1 n exp 1 2 u ^ k T V ^ u ^ k d u ^ k = exp 1 2 u ^ T V ^ u ^ d u ^ n = ( 2 π ) q det ( V ^ 1 ) n ( 2 π ) q det ( V ^ 1 ) 1 exp 1 2 u ^ T V ^ u ^ d u ^ n = ( 2 π ) n q / 2 det ( V ^ ) n / 2 .

Appendix H. Proof of Lemma 6

2 π n q exp 1 2 a ^ T I ^ a ^ Z d a 1 d a q = 1 det ( I ^ ) n / 2 l = 1 q 1 q 1 det ( I ^ + Q ^ l ) n / 2 q 1 det ( I ^ + P ^ ) n / 2 .
For the Z 2 contribution, we have
Z 2 = Z q 1 l = 1 q 1 exp 1 2 a ^ T Q ^ l a ^ q 1 exp 1 2 a ^ T P ^ a ^ + 2 q 2 l = 1 q 1 exp 1 2 a ^ T ( Q ^ l + P ^ ) a ^ + q 2 l = 1 q 1 l = 1 q 1 exp 1 2 a ^ T ( Q ^ l + Q ^ l ) a ^ + q 2 exp 1 2 a ^ T 2 P ^ a ^ ,
and so we have
1 2 2 π n q exp 1 2 a ^ T I ^ a ^ Z 2 d a 1 d a q = ( 1 / 2 ) det ( I ^ ) n / 2 l = 1 q 1 q 1 det ( I ^ + Q ^ l ) n / 2 q 1 det ( I ^ + P ^ ) n / 2 + l = 1 q 1 q 2 det ( I ^ + Q ^ l + P ^ ) n / 2 + l = 1 q 1 l = 1 q 1 ( 1 / 2 ) q 2 det ( I ^ + Q ^ l + Q ^ l ) n / 2 + ( 1 / 2 ) q 2 det ( I ^ + 2 P ^ ) n / 2 .

Appendix I. Proof of Theorem 3

For the purpose of this calculation, we rename our expressions for simplicity as the following:
a a r ˜ · r ˜ , b b s ˜ · s ˜ , a b r ˜ · s ˜ , A l = 1 q 1 w l · w l , B l = 1 q 1 ( w l · r ˜ ) 2 , D l = 1 q 1 ( w l · w l ) ( w l · r ˜ ) , F l = 1 q 1 ( w l · w l ) 2 , H l = 1 q 1 ( w l · w l ) ( w l · r ˜ ) 2 , M l = 1 q 1 ( w l · r ˜ ) 3 , J l = 1 q 1 ( w l · r ˜ ) 4 .
Using Equation (27) and Lemma 4, we write the Taylor series for S as the following:
ln q 1 l exp μ 2 G l + 2 δ q , l T = c 1 / 2 μ + c 1 μ + c 3 / 2 μ 3 / 2 + c 2 μ 2 + O μ 5 / 2 ,
S = E u , s ˜ , r ˜ [ c 1 / 2 ] μ E u , s ˜ , r ˜ [ c 1 ] μ E u , s ˜ , r ˜ [ c 3 / 2 ] μ 3 / 2 E u , s ˜ , r ˜ [ c 2 ] μ 2 + O μ 3 ,
where we use Mathematica to obtain the following expressions for c 1 and c 2 :
2 q ( q 1 ) c 1 = ( q 1 ) A + ( q 1 ) B + 2 q a a q b b + ( q 1 ) a b 2 , 24 q 2 ( q 1 ) 2 c 2 = 3 ( q 1 ) 2 A 2 3 B 2 + 12 q a a B + 6 q B 2 6 q b b B + 3 q F 6 q H + q J 12 q 2 a a 2 12 q 2 B 12 q 2 a a B 3 q 2 B 2 + 12 q 2 a a b b + 6 q 2 b b B 3 q 2 b b 2 6 q 2 F + 12 q 2 H 2 q 2 J 12 q 3 a a + 12 q 3 a a 2 + 12 q 3 B + 12 q 3 b b 12 q 3 a a b b + 3 q 3 b b 2 + 3 q 3 F 6 q 3 H + q 3 J + 6 ( q 1 ) [ ( q 2 ) a b 2 A + ( q 1 ) A B + 2 q a a A q b b A ] + ( q 3 7 q 2 + 12 q 6 ) a b 4
6 ( q 1 ) [ ( q 2 ) a b 2 B 2 q ( q 2 ) a b 2 a a + q ( q 2 ) a b 2 b b + 4 q 2 a b 2 ] .
Furthermore, the expression for c 1 / 2 and c 3 / 2 do not contribute to S since they vanish under the expectation. We now analytically compute all the relevant terms needed.
E { u , s ˜ , r ˜ } a a = E { r ˜ } r ˜ · r ˜ = n , E { u , s ˜ , r ˜ } b b = E { s ˜ } s ˜ · s ˜ = n ,
E { u , s ˜ , r ˜ } a a 2 = E { r ˜ } ( r ˜ · r ˜ ) 2 = n ( n + 2 ) .
E { u , s ˜ , r ˜ } b b 2 = E { s ˜ } ( s ˜ · s ˜ ) 2 = n ( n + 2 ) .
E { u , s ˜ , r ˜ } a a b b = E { s ˜ , r ˜ } ( r ˜ · r ˜ ) ( s ˜ · s ˜ ) = n 2 .
E { u , s ˜ , r ˜ } a b 2 = E { s ˜ , r ˜ } ( r ˜ · s ˜ ) 2 = n .
E { u , s ˜ , r ˜ } a b 2 a a = E { s ˜ , r ˜ } ( r ˜ · s ˜ ) 2 ( r ˜ · r ˜ ) = i = 1 n j = 1 n k = 1 n E { s ˜ , r ˜ } r ˜ i r ˜ j r ˜ k r ˜ k s ˜ i s ˜ j = i = 1 n j = 1 n k = 1 n E { r ˜ } r ˜ i r ˜ j r ˜ k r ˜ k E { s ˜ } s ˜ i s ˜ j = i = 1 n j = 1 n k = 1 n E { r ˜ } W r ˜ i r ˜ j r ˜ k r ˜ k δ i , j = E { r ˜ } ( r ˜ · r ˜ ) 2 = n ( n + 2 ) .
E { u , s ˜ , r ˜ } a b 2 b b = E { s ˜ , r ˜ } ( r ˜ · s ˜ ) 2 ( s ˜ · s ˜ ) = i = 1 n j = 1 n k = 1 n E { s ˜ , r ˜ } r ˜ i r ˜ j s ˜ k s ˜ k s ˜ i s ˜ j = i = 1 n j = 1 n k = 1 n E { r ˜ } r ˜ i r ˜ j E { s ˜ } s ˜ i s ˜ j s ˜ k s ˜ k = i = 1 n j = 1 n k = 1 n E { s ˜ } s ˜ i s ˜ j s ˜ k s ˜ k δ i , j = E { s ˜ } ( s ˜ · s ˜ ) 2 = n ( n + 2 ) .
E { u , s ˜ , r ˜ } a b 4 = E { s ˜ , r ˜ } ( r ˜ · s ˜ ) 4 = i = 1 n j = 1 n k = 1 n z = 1 n E { s ˜ , r ˜ } r ˜ i r ˜ j r ˜ k r ˜ z s ˜ i s ˜ j s ˜ k s ˜ z = i = 1 n j = 1 n k = 1 n z = 1 n E { r ˜ } r ˜ i r ˜ j r ˜ k r ˜ z E { s ˜ } s ˜ i s ˜ j s ˜ k s ˜ z = i = 1 n j = 1 n k = 1 n z = 1 n E { r ˜ } r ˜ i r ˜ j r ˜ k r ˜ z δ i , j δ k , z + δ i , k δ z , j + δ i , z δ k , j = i = 1 n k = 1 n E { r ˜ } r ˜ i r ˜ i r ˜ k r ˜ k + i = 1 n j = 1 n E { r ˜ } r ˜ i r ˜ j r ˜ i r ˜ j + i = 1 n j = 1 n E { r ˜ } r ˜ i r ˜ j r ˜ j r ˜ i = 3 E { r ˜ } ( r ˜ · r ˜ ) 2 = 3 n ( n + 2 ) .
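The Gaussian moment identities used above can also be verified numerically; the following sketch (our own sanity check, with arbitrary $n$ and sample size, not part of the proof) confirms three of them by sampling:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 4, 1_000_000
r = rng.standard_normal((N, n))
s = rng.standard_normal((N, n))
rr, rs = (r * r).sum(axis=1), (r * s).sum(axis=1)

print((rr**2).mean(), n*(n + 2))        # E[(r.r)^2]        = n(n+2)
print((rs**2 * rr).mean(), n*(n + 2))   # E[(r.s)^2 (r.r)]  = n(n+2)
print((rs**4).mean(), 3*n*(n + 2))      # E[(r.s)^4]        = 3n(n+2)
```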
E { u , s ˜ , r ˜ } A = l = 1 q 1 E { u , s ˜ , r ˜ } w l · w l = l = 1 q 1 m = 1 q 2 m = 1 q 2 α m , l α m , l E { u , s ˜ , r ˜ } u m · u m = l = 1 q 1 m = 1 q 2 m = 1 q 2 α m , l α m , l n δ m , m = n l = 1 q 1 m = 1 q 2 α m , l 2 = n m = 1 q 2 l = 1 q α m , l 2 = n ( q 2 ) .
E { u , s ˜ , r ˜ } B = l = 1 q 1 E { u , s ˜ , r ˜ } ( w l · r ˜ ) 2 = l = 1 q 1 m = 1 q 2 m = 1 q 2 α m , l α m , l E { u , s ˜ , r ˜ } ( u m · r ˜ ) ( u m · r ˜ ) = l = 1 q 1 m = 1 q 2 m = 1 q 2 α m , l α m , l n δ m , m = n l = 1 q 1 m = 1 q 2 α m , l 2 = n m = 1 q 2 l = 1 q α m , l 2 = n ( q 2 ) .
E { u , s ˜ , r ˜ } A 2 = E { u , s ˜ , r ˜ } l = 1 q 1 w l · w l 2 = l = 1 q 1 l = 1 q 1 r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 α m , l α m , l α r , l α r , l E { u , s ˜ , r ˜ } ( u m · u m ) ( u r · u r ) = r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 l = 1 q 1 α m , l α m , l l = 1 q 1 α r , l α r , l E { u , s ˜ , r ˜ } ( u m · u m ) ( u r · u r ) = r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 δ m , m δ r , r E { u , s ˜ , r ˜ } ( u m · u m ) ( u r · u r ) = r = 1 q 2 m = 1 q 2 E { u , s ˜ , r ˜ } ( u m · u m ) ( u r · u r ) = r = 1 q 2 m = 1 q 2 n 2 + 2 n δ m , r = n 2 ( q 2 ) 2 + 2 n ( q 2 ) .
E { u , s ˜ , r ˜ } B 2 = E { u , s ˜ , r ˜ } l = 1 q 1 ( w l · r ˜ ) 2 2 = l = 1 q 1 l = 1 q 1 E { u , s ˜ , r ˜ } ( w l · r ˜ ) 2 ( w l · r ˜ ) 2 = l = 1 q 1 l = 1 q 1 r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 α m , l α m , l α r , l α r , l E { u , s ˜ , r ˜ } ( u m · r ˜ ) ( u m · r ˜ ) ( u r · r ˜ ) ( u r · r ˜ ) = r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 l = 1 q 1 α m , l α m , l l = 1 q 1 α r , l α r , l E { u , s ˜ , r ˜ } ( u m · r ˜ ) ( u m · r ˜ ) ( u r · r ˜ ) ( u r · r ˜ )
= r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 δ m , m δ r , r E { u , s ˜ , r ˜ } ( u m · r ˜ ) ( u m · r ˜ ) ( u r · r ˜ ) ( u r · r ˜ ) = r = 1 q 2 m = 1 q 2 E { u , s ˜ , r ˜ } ( u m · r ˜ ) ( u m · r ˜ ) ( u r · r ˜ ) ( u r · r ˜ ) = r = 1 q 2 m = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n E u u m , i u m , j u r , k u r , z E r ˜ r ˜ i r ˜ j r ˜ k r ˜ z = r = 1 q 2 m = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n E u u m , i u m , j u r , k u r , z δ i , j δ k , z + δ i , k δ z , j + δ i , z δ k , j = r = 1 q 2 m = 1 q 2 ( i = 1 n k = 1 n E u u m , i u m , i u r , k u r , k + i = 1 n j = 1 n E u u m , i u m , j u r , i u r , j + i = 1 n j = 1 n E u u m , i u m , j u r , j u r , i ) = r = 1 q 2 m = 1 q 2 ( i = 1 n j = 1 n E u u m , i u m , i u r , j u r , j + i = 1 n j = 1 n E u u m , i u m , j u r , i u r , j + i = 1 n j = 1 n E u u m , i u m , j u r , j u r , i ) = r = 1 q 2 m = 1 q 2 E u ( u m · u m ) ( u r · u r ) + 2 E u ( u m · u r ) ( u m · u r ) = r = 1 q 2 m = 1 q 2 n 2 + 2 n δ m , r + 2 ( n 2 + n ) δ m , r + n = ( q 2 ) 2 n 2 + 2 ( q 2 ) n + 2 ( q 2 ) n ( n + 1 ) + 2 ( q 2 ) 2 n = q ( q 2 ) n ( n + 2 ) .
E { u , s ˜ , r ˜ } A B = E { u , s ˜ , r ˜ } l = 1 q 1 w l · w l l = 1 q 1 ( w l · r ˜ ) 2 = l = 1 q 1 l = 1 q 1 r = 1 q 2 r = 1 q 2 m = 1 q 2 m = 1 q 2 α m , l α m , l α r , l α r , l E { u , s ˜ , r ˜ } ( u m · u m ) ( u r · r ˜ ) ( u r · r ˜ ) = r = 1 q 2 m = 1 q 2 E { u , s ˜ , r ˜ } ( u m · u m ) ( u r · r ˜ ) 2 = r = 1 q 2 m = 1 q 2 E u ( u m · u m ) E r ˜ ( u r · r ˜ ) 2 = r = 1 q 2 m = 1 q 2 E u ( u m · u m ) i = 1 n j = 1 n u r , i u r , j E r ˜ r ˜ i r ˜ j = r = 1 q 2 m = 1 q 2 E u ( u m · u m ) i = 1 n j = 1 n u r , i u r , j δ i , j = r = 1 q 2 m = 1 q 2 E u ( u m · u m ) i = 1 n u r , i 2 = r = 1 q 2 m = 1 q 2 E u ( u m · u m ) ( u r · u r ) = n 2 ( q 2 ) 2 + 2 n ( q 2 ) .
E { u , s ˜ , r ˜ } a a A = E { u , s ˜ , r ˜ } r ˜ · r ˜ l = 1 q 1 w l · w l = E r ˜ r ˜ · r ˜ E u l = 1 q 1 w l · w l = n 2 ( q 2 ) .
E { u , s ˜ , r ˜ } b b A = E { u , s ˜ , r ˜ } s ˜ · s ˜ l = 1 q 1 w l · w l = E s ˜ s ˜ · s ˜ E u l = 1 q 1 w l · w l = n 2 ( q 2 ) .
E { u , s ˜ , r ˜ } a a B = E { u , s ˜ , r ˜ } r ˜ · r ˜ l = 1 q 1 ( w l · r ˜ ) 2 = l = 1 q 1 m = 1 q 2 m = 1 q 2 α m , l α m , l E { u , s ˜ , r ˜ } r ˜ · r ˜ u m · r ˜ u m · r ˜ = m = 1 q 2 m = 1 q 2 δ m , m E { u , s ˜ , r ˜ } r ˜ · r ˜ u m · r ˜ u m · r ˜ = m = 1 q 2 E { u , s ˜ , r ˜ } r ˜ · r ˜ u m · r ˜ u m · r ˜ = m = 1 q 2 i = 1 n j = 1 n k = 1 n E { u , s ˜ , r ˜ } u m , i u m , j r ˜ k 2 r ˜ i r ˜ j = m = 1 q 2 i = 1 n j = 1 n k = 1 n E u u m , i u m , j E r ˜ r ˜ k 2 r ˜ i r ˜ j = m = 1 q 2 i = 1 n j = 1 n k = 1 n E u u m , i u m , j ( δ i , j + 2 δ i , k δ j , k ) = m = 1 q 2 i = 1 n k = 1 n E u u m , i 2 + 2 m = 1 q 2 k = 1 n E u u m , k 2 = n 2 ( q 2 ) + 2 n ( q 2 ) = ( q 2 ) n ( n + 2 ) .
E { u , s ˜ , r ˜ } b b B = E { u , s ˜ , r ˜ } s ˜ · s ˜ l = 1 q 1 ( w l · r ˜ ) 2 = l = 1 q 1 m = 1 q 2 m = 1 q 2 α m , l α m , l E { u , s ˜ , r ˜ } s ˜ · s ˜ u m · r ˜ u m · r ˜ = m = 1 q 2 m = 1 q 2 δ m , m E { u , s ˜ , r ˜ } s ˜ · s ˜ u m · r ˜ u m · r ˜ = m = 1 q 2 E s ˜ s ˜ · s ˜ E u , r ˜ u m · r ˜ u m · r ˜ = n m = 1 q 2 i = 1 n j = 1 n E u , r ˜ u m , i u m , j r ˜ i r ˜ j = n m = 1 q 2 i = 1 n j = 1 n E u u m , i u m , j E r ˜ r ˜ i r ˜ j = n m = 1 q 2 i = 1 n j = 1 n E u u m , i u m , j δ i , j = n m = 1 q 2 i = 1 n E u u m , i 2 = ( q 2 ) n 2 .
E { u , s ˜ , r ˜ } a b 2 A = E { u , s ˜ , r ˜ } r ˜ · s ˜ 2 l = 1 q 1 w l · w l = E { s ˜ , r ˜ } r ˜ · s ˜ 2 E u l = 1 q 1 w l · w l = ( q 2 ) n 2 .
E { u , s ˜ , r ˜ } a b 2 B = E { u , s ˜ , r ˜ } r ˜ · s ˜ 2 l = 1 q 1 ( w l · r ˜ ) 2 = l = 1 q 1 m = 1 q 2 m = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n α m , l α m , l E { u , s ˜ , r ˜ } r ˜ i s ˜ i r ˜ j s ˜ j r ˜ k r ˜ z u m , k u m , z = m = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n E { u , s ˜ , r ˜ } r ˜ i s ˜ i r ˜ j s ˜ j r ˜ k r ˜ z u m , k u m , z = m = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n E r ˜ r ˜ i r ˜ j r ˜ k r ˜ z E s ˜ s ˜ i s ˜ j E u u m , k u m , z = m = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n E r ˜ r ˜ i r ˜ j r ˜ k r ˜ z δ i , j E u u m , k u m , z = m = 1 q 2 i = 1 n k = 1 n z = 1 n E r ˜ r ˜ i 2 r ˜ k r ˜ z E u u m , k u m , z = m = 1 q 2 i = 1 n k = 1 n z = 1 n ( δ k , z + 2 δ i , k δ i , z ) E u u m , k u m , z = m = 1 q 2 i = 1 n k = 1 n E u u m , k 2 + 2 m = 1 q 2 i = 1 n E u u m , i 2 = ( q 2 ) n ( n + 2 ) .
E { u , s ˜ , r ˜ } F = E u l = 1 q 1 ( w l · w l ) 2 = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l E u ( u m · u m ) ( u r · u r ) = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l n 2 δ m , m δ r , r + n δ m , r δ m , r + n δ m , r δ m , r = ( n 2 + 2 n ) l = 1 q 1 m = 1 q 2 r = 1 q 2 α m , l 2 α r , l 2 .
E { u , s ˜ , r ˜ } H = E { u , r ˜ } l = 1 q 1 ( w l · w l ) ( w l · r ˜ ) 2 = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l E { u , r ˜ } ( u m · u m ) ( u r · r ˜ ) ( u r · r ˜ ) = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n k = 1 n α m , l α m , l α r , l α r , l E { u , r ˜ } u m , i u m , i u r , j u r , k r ˜ j r ˜ k = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n k = 1 n α m , l α m , l α r , l α r , l E u u m , i u m , i u r , j u r , k E r ˜ r ˜ j r ˜ k = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n k = 1 n α m , l α m , l α r , l α r , l E u u m , i u m , i u r , j u r , k δ j , k = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n α m , l α m , l α r , l α r , l E u u m , i u m , i u r , j u r , j = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l E u ( u m · u m ) ( u r · u r ) = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l n 2 δ m , m δ r , r + n δ m , r δ m , r + n δ m , r δ m , r = n ( n + 2 ) l = 1 q 1 m = 1 q 2 r = 1 q 2 α m , l 2 α r , l 2 .
E { u , s ˜ , r ˜ } J = E { u , r ˜ } l = 1 q 1 ( w l · r ˜ ) 4 = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l E { u , r ˜ } ( u m · r ˜ ) ( u m · r ˜ ) ( u r · r ˜ ) ( u r · r ˜ ) = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n α m , l α m , l α r , l α r , l E { u , r ˜ } u m , i u m , j u r , k u r , z r ˜ i r ˜ j r ˜ k r ˜ z = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n α m , l α m , l α r , l α r , l E u u m , i u m , j u r , k u r , z E r ˜ r ˜ i r ˜ j r ˜ k r ˜ z = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 i = 1 n j = 1 n k = 1 n z = 1 n α m , l α m , l α r , l α r , l E u u m , i u m , j u r , k u r , z δ i , j δ k , z + δ i , k δ j , z + δ i , z δ j , k = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l ( i = 1 n k = 1 n E u u m , i u m , i u r , k u r , k + i = 1 n j = 1 n E u u m , i u m , j u r , i u r , j
+ i = 1 n j = 1 n E u u m , i u m , j u r , j u r , i ) = l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l ( E u ( u m · u m ) ( u r · u r ) + E u ( u m · u r ) ( u m · u r ) + E u ( u m · u r ) ( u m · u r ) ) = 3 l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l E u ( u m · u m ) ( u r · u r ) = 3 l = 1 q 1 m = 1 q 2 m = 1 q 2 r = 1 q 2 r = 1 q 2 α m , l α m , l α r , l α r , l n 2 δ m , m δ r , r + n δ m , r δ m , r + n δ m , r δ m , r = 3 n ( n + 2 ) l = 1 q 1 m = 1 q 2 r = 1 q 2 α m , l 2 α r , l 2 .
Direct substitution yields
E u , s ˜ , r ˜ [ c 1 ] = n ( 2 q 1 ) 2 q ( q 1 ) , E u , s ˜ , r ˜ [ c 2 ] = n ( n + q ) ( q 1 ) 4 q 2 ,
and so we have
S = n ( 2 q 1 ) 2 q ( q 1 ) μ n ( n + q ) ( q 1 ) 4 q 2 μ 2 + O μ 3 .
From Equation (A89) and Theorem 2, we have
h ( X | W ^ ) = n h σ + n 2 1 1 q μ n ( n + q ) ( q 1 ) 4 q 2 μ 2 + O μ 3 , = n h σ + n 2 1 1 q μ n 2 1 1 q n q 1 + 1 2 q μ 2 + O μ 3 .

References

  1. Zhu, H.; Guo, R.; Shen, J.; Liu, J.; Liu, C.; Xue, X.X.; Zhang, L.; Mao, S. The Local Dark Matter Kinematic Substructure Based on LAMOST K Giants. arXiv 2024, arXiv:2404.19655. [Google Scholar]
  2. Turner, W.; Martini, P.; Karaçaylı, N.G.; Aguilar, J.; Ahlen, S.; Brooks, D.; Claybaugh, T.; de la Macorra, A.; Dey, A.; Doel, P.; et al. New measurements of the Lyman-α forest continuum and effective optical depth with LyCAN and DESI Y1 data. arXiv 2024, arXiv:2405.06743. [Google Scholar]
  3. Wu, Y.; Chen, M.; Li, Z.; Wang, M.; Wei, Y. Theoretical insights for diffusion guidance: A case study for gaussian mixture models. arXiv 2024, arXiv:2403.01639. [Google Scholar]
  4. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  5. Sulam, J.; Romano, Y.; Elad, M. Gaussian mixture diffusion. In Proceedings of the 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE), Eilat, Israel, 16–18 November 2016; pp. 1–5. [Google Scholar]
  6. Guo, H.; Lu, C.; Bao, F.; Pang, T.; Yan, S.; Du, C.; Li, C. Gaussian Mixture Solvers for Diffusion Models. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
  7. Turan, N.; Böck, B.; Chan, K.J.; Fesl, B.; Burmeister, F.; Joham, M.; Fettweis, G.; Utschick, W. Wireless Channel Prediction via Gaussian Mixture Models. arXiv 2024, arXiv:2402.08351. [Google Scholar]
  8. Parmar, A.; Shah, K.; Captain, K.; López-Benítez, M.; Patel, J. Gaussian Mixture Model Based Anomaly Detection for Defense Against Byzantine Attack in Cooperative Spectrum Sensing. IEEE Trans. Cogn. Commun. Netw. 2023, 10, 499–509. [Google Scholar] [CrossRef]
  9. Qiu, X.; Jiang, T.; Wu, S.; Hayes, M. Physical layer authentication enhancement using a Gaussian mixture model. IEEE Access 2018, 6, 53583–53592. [Google Scholar] [CrossRef]
  10. McNicholas, P.D.; Murphy, T.B. Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 2010, 26, 2705–2712. [Google Scholar] [CrossRef] [PubMed]
  11. Toh, H.; Horimoto, K. Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 2002, 18, 287–297. [Google Scholar] [CrossRef] [PubMed]
  12. Raymond, N.; Iouchtchenko, D.; Roy, P.N.; Nooijen, M. A path integral methodology for obtaining thermodynamic properties of nonadiabatic systems using Gaussian mixture distributions. J. Chem. Phys. 2018, 148, 194110. [Google Scholar] [CrossRef] [PubMed]
  13. Sohl-Dickstein, J.; Weiss, E.; Maheswaranathan, N.; Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 7–9 July 2015; pp. 2256–2265. [Google Scholar]
  14. Cover, T.; Thomas, J. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1999. [Google Scholar]
  15. Michalowicz, J.V.; Nichols, J.M.; Bucholtz, F. Calculation of differential entropy for a mixed Gaussian distribution. Entropy 2008, 10, 200. [Google Scholar] [CrossRef]
  16. Nielsen, F.; Sun, K. Guaranteed bounds on the Kullback–Leibler divergence of univariate mixtures. IEEE Signal Process. Lett. 2016, 23, 1543–1546. [Google Scholar] [CrossRef]
  17. Nielsen, F.; Nock, R. A series of maximum entropy upper bounds of the differential entropy. arXiv 2016, arXiv:1612.02954. [Google Scholar]
  18. Hershey, J.R.; Olsen, P.A. Approximating the Kullback Leibler divergence between Gaussian mixture models. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, IEEE, Honolulu, HI, USA, 15–20 April 2007; Volume 4, pp. IV-317–IV-320. [Google Scholar]
  19. Goldberger; Gordon; Greenspan. An efficient image similarity measure based on approximations of KL-divergence between two Gaussian mixtures. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 487–493. [Google Scholar]
  20. Huber, M.F.; Bailey, T.; Durrant-Whyte, H.; Hanebeck, U.D. On entropy approximation for Gaussian mixture random vectors. In Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, Seoul, Republic of Korea, 20–22 August 2008; pp. 181–188. [Google Scholar]
  21. Kolchinsky, A.; Tracey, B.D. Estimating mixture entropy with pairwise distances. Entropy 2017, 19, 361. [Google Scholar] [CrossRef]
  22. Cox, J.; Kilian, J.; Leighton, F.; Shamoon, T. Secure spread spectrum watermarking for multimedia. IEEE Trans. Image Process. 1997, 6, 1673–1687. [Google Scholar] [CrossRef] [PubMed]
  23. Wu, S.; Huang, Y.; Guan, H.; Zhang, S.; Liu, J. ECSS: High-Embedding-Capacity Audio Watermarking with Diversity Reception. Entropy 2022, 22, 1843. [Google Scholar] [CrossRef] [PubMed]