Article

A New Class of Bayes Minimax Estimators of the Mean Matrix of a Matrix Variate Normal Distribution

by Shokofeh Zinodiny 1 and Saralees Nadarajah 2,*
1 Department of Mathematics, Amirkabir University of Technology, Tehran 15916-34311, Iran
2 Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(7), 1098; https://doi.org/10.3390/math12071098
Submission received: 4 January 2024 / Revised: 24 February 2024 / Accepted: 4 April 2024 / Published: 5 April 2024

Abstract: Bayes minimax estimation is important because it provides a robust approach to statistical estimation that considers the worst-case scenario while incorporating prior knowledge. In this paper, Bayes minimax estimation of the mean matrix of a matrix variate normal distribution is considered under the quadratic loss function. A large class of (proper and generalized) Bayes minimax estimators of the mean matrix is presented. Two examples are given to illustrate the class of estimators, showing, among other things, that the class includes classes of estimators presented by Tsukuma.

1. Introduction

Let $X = (x_{i,j})$ be a $p \times m$ matrix random variable with a matrix variate normal distribution with mean matrix $\Theta = (\theta_{i,j})$ and covariance matrix $I_p \otimes I_m$, where $I_k$ is the $k \times k$ identity matrix and $\otimes$ denotes the Kronecker product.
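To make the sampling model concrete, the following Python sketch (our own illustration, not part of the paper) draws one observation from $N_{p \times m}(\Theta, I_p \otimes I_m)$; because the Kronecker covariance is the identity, this amounts to adding independent standard normal noise to every entry of $\Theta$.

```python
import numpy as np

def sample_matrix_normal_identity(theta, rng=None):
    """Draw X ~ N_{p x m}(Theta, I_p (x) I_m).

    With an identity Kronecker covariance every entry of X is an independent
    N(theta_ij, 1) variable, so perturbing Theta with standard normal noise
    is all that is required.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    return theta + rng.standard_normal(theta.shape)

# Hypothetical example with p = 3 rows and m = 6 columns.
Theta = np.arange(18, dtype=float).reshape(3, 6) / 10.0
X = sample_matrix_normal_identity(Theta, rng=0)
print(X.shape)  # (3, 6)
```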
The matrix variate normal distribution finds applications across various fields, including multivariate statistical analysis, machine learning, and signal processing. In multivariate statistical analysis, it serves as a fundamental tool for modeling covariance structures in datasets where the observations are matrices, such as in longitudinal studies or multivariate time series analysis. In machine learning, it is utilized for modeling complex dependencies among high-dimensional data, particularly in tasks involving matrix-valued inputs or outputs, such as in recommender systems or tensor factorization. Moreover, in signal processing, the matrix variate normal distribution is employed for modeling the joint distribution of multiple correlated signals or images, enabling efficient estimation and inference in applications such as array processing or medical imaging.
Some recent applications of the matrix variate normal distribution include analysis of multiple vector autoregressions [1]; brain connectivity alternation detection [2]; capacity for severely fading MIMO channels [3]; integrated principal components analysis [4]; determination of the relationship between incidence and mortality of asthma with PM2.5, ozone, and household air pollution [5]; autism spectrum disorder identification [6]; and identification of depression disorder using multi-view high-order brain function networks [7], to mention just a few.
Bayesian minimax estimation is a statistical approach that combines Bayesian inference with minimax decision theory. In traditional Bayesian inference, we use prior knowledge and observed data to update our beliefs about the parameters of interest. Minimax decision theory, on the other hand, focuses on minimizing the maximum possible loss (risk) that can occur under different parameter values.
In Bayesian minimax estimation, we seek an estimator that minimizes the maximum possible posterior expected loss, where the expectation is taken with respect to the posterior distribution of the parameter given the observed data. This approach is particularly useful when there is uncertainty about the true parameter value and when it is important to protect against worst-case scenarios.
There has not been much work on Bayesian estimation of the parameters of the matrix variate normal distribution. Ref. [8] extended the so-called Stein effect and proposed an empirical Bayes estimator, outperforming the maximum likelihood estimator, $X$, for the case $m > p + 1$. Since then, many classes of minimax estimators better than the maximum likelihood estimator have been found. Ref. [9] derived a large class of unbiased risk estimators, including a class of minimax estimators obtained by [8]. Using the result of Stein, Ref. [10] extended the results of [11] to the multivariate case. For the case of $\Sigma \otimes I_m$, where $\Sigma$ is an unknown positive definite matrix and $p > m + 1$, Ref. [12] introduced a class of minimax estimators containing those of [8]. Ref. [13] derived a large class of minimax estimators using the Stein identity and the Haff identity [14] for the case $m > p + 1$. For the case of $\Sigma = I_m$, Ref. [15] found orthogonally invariant hierarchical priors, resulting in Bayes estimators that are admissible and minimax. For the case of an unknown covariance matrix, Ref. [16] obtained a generalized Bayes class of minimax estimators of the mean matrix for $m > p + 1$, $p > m + 1$. Ref. [17] obtained Bayes minimax estimators of the mean for the case of common unknown variance. Ref. [18] obtained Bayes minimax estimators of the normal mean matrix for the case of common unknown variances.
For the problem of estimating the mean matrix of an elliptically contoured distribution, Ref. [19] derived generalized Bayes minimax estimators for the mean matrix; ref. [20] also obtained a class of minimax estimators for the mean matrix, which was used to find a class of proper Bayes minimax estimators of Θ .
In this paper, we derive a large class of (proper and generalized) Bayes minimax estimators of $\Theta$ containing the estimators of [15] as a special case. In fact, we extend the results of [21] to the multivariate case. The main result, giving a large class of (proper and generalized) Bayes minimax estimators, is developed in Section 2. Section 3 considers two examples of classes of (proper and generalized) Bayes estimators. In particular, Example 1 recovers a result from [15]. Some concluding remarks are given in Section 4.
Throughout this paper, let $|A|$, $\mathrm{tr}(A)$ and $A'$ denote, respectively, the determinant, trace and transpose of a matrix $A$. Also, for $A$ and $B$, let $B < A$ mean that $A - B$ is positive definite.

2. A Class of Bayes Minimax Estimators of the Mean Matrix

Let $N_{p \times m}(\Theta, I_p \otimes I_m)$ denote the matrix variate normal distribution with mean matrix $\Theta$ and covariance matrix $I_p \otimes I_m$. Assume that $X \sim N_{p \times m}(\Theta, I_p \otimes I_m)$. Assume also that
$\Theta \mid \Lambda \sim N_{p \times m}\left(0_{p \times m}, \left(\Lambda^{-1}(I_p - \Lambda)\right) \otimes I_m\right), \qquad \Lambda \sim |\Lambda|^{a/2 - 1}\, g\!\left(\mathrm{tr}(\Lambda)/p\right), \qquad 0_{p \times p} < \Lambda < I_p, \qquad a > -m, \qquad (1)$
where $\Lambda = (\lambda_{i,j})$ is a $p \times p$ random matrix whose (possibly improper) density is proportional to the second expression in (1), and $g$ is a differentiable positive function on $(0,1)$. Integrating $\Lambda$ out, the corresponding prior density of $\Theta$ is
$\pi(\Theta) = (2\pi)^{-pm/2} \int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda, \qquad (2)$
where $d\Lambda = \prod_{i \leq j} d\lambda_{i,j}$. Note that (2) will be proper if $g$ is integrable on its domain.
The purpose of this section is to construct generalized (and proper) Bayes minimax estimators of $\Theta$ under the loss function
$L(\delta; \Theta) = \mathrm{tr}\!\left[(\delta - \Theta)(\delta - \Theta)'\right]. \qquad (3)$
The following lemmas give sufficient conditions on g and a such that the generalized (or proper) Bayes estimators with respect to (2) are minimax.
Let $O_p$ be the set of orthogonal matrices of order $p$. Let $V_{m,p} = \{V \in \mathbb{R}^{m \times p} : V'V = I_p\}$, where $m \geq p$. Write $X$ as $ULV'$, where $U \in O_p$, $V \in V_{m,p}$ and $L = \mathrm{diag}(l_1, l_2, \ldots, l_p)$ with $l_1 > l_2 > \cdots > l_p > 0$.
Lemma 1.
For $i = 1, \ldots, p$, write $\phi_i = \phi_i(F)$ and $F = \mathrm{diag}(f_1, \ldots, f_p) = L^2$. The risk of a shrinkage equivariant estimator $\delta = UL(I_p - \Phi(F))V'$ is
$R(\delta; \Theta) = mp + E\left[\sum_{i=1}^p \left\{ f_i \phi_i^2 - 2(m - p + 1)\phi_i - 4 f_i \frac{\partial \phi_i}{\partial f_i} - 4 \sum_{j > i} \frac{f_i \phi_i - f_j \phi_j}{f_i - f_j} \right\}\right], \qquad (4)$
provided each expectation exists.
Proof. 
See [9].
If $\Phi(F) = F^{-1}\Psi(F)$, where $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, then by replacing $\phi_i$ by $\psi_i/f_i$, (4) can be written as
$R(\delta; \Theta) = mp + E\left[\sum_{i=1}^p \left\{ \frac{\psi_i^2}{f_i} - 2(m - p - 1)\frac{\psi_i}{f_i} - 4 \frac{\partial \psi_i}{\partial f_i} - 4 \sum_{j > i} \frac{\psi_i - \psi_j}{f_i - f_j} \right\}\right]. \qquad (5)$
Using (5), we obtain Corollary 1. □
Corollary 1.
Suppose
$\delta = \left(I_p - U F^{-1} \Psi(F) U'\right) X, \qquad (6)$
where $F^{-1} = \mathrm{diag}(f_1^{-1}, f_2^{-1}, \ldots, f_p^{-1})$. Then, $\delta$ is minimax under (3) if
I. For any $i$, $\psi_i$ is non-decreasing with respect to $f_i$;
II. $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1 \leq 2(m - p - 1)$.
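To illustrate how an estimator of the form (6) is evaluated in practice, here is a short Python sketch (our own; the constant choice $\psi_i \equiv m - p - 1$ is a hypothetical example that satisfies conditions I and II when $m \geq p + 1$, not a choice advocated in the paper).

```python
import numpy as np

def shrinkage_estimator(X, psi):
    """Evaluate delta = (I_p - U F^{-1} Psi(F) U') X, the form (6) above.

    X   : p x m data matrix with p <= m.
    psi : callable mapping f = (f_1 >= ... >= f_p), the squared singular
          values of X, to the vector (psi_1(F), ..., psi_p(F)).
    """
    p, m = X.shape
    if p > m:
        raise ValueError("this sketch assumes p <= m")
    U, sing, _ = np.linalg.svd(X, full_matrices=False)  # X = U L V'
    f = sing ** 2                                       # F = L^2
    shrink = U @ np.diag(psi(f) / f) @ U.T              # U F^{-1} Psi(F) U'
    return (np.eye(p) - shrink) @ X

# Hypothetical usage: constant psi_i = m - p - 1, which is trivially
# non-decreasing in f_i and lies in [0, 2(m - p - 1)] when m >= p + 1.
p, m = 3, 6
rng = np.random.default_rng(1)
X = rng.standard_normal((p, m))
delta = shrinkage_estimator(X, lambda f: np.full_like(f, m - p - 1.0))
print(delta.shape)  # (3, 6)
```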
We give conditions on $g$ and $a$ for obtaining generalized (proper) Bayes estimators of the form (6) such that the resulting estimators satisfy the conditions of Corollary 1, and hence are minimax. Note that the conditional distribution of $\Theta$ given $X, \Lambda$ is $N_{p \times m}\left((I_p - \Lambda)X, (I_p - \Lambda) \otimes I_m\right)$. Therefore, the generalized Bayes estimator of $\Theta$ with respect to (2) under (3) is (see [15])
$\delta^{\pi}(X) = E[\Theta \mid X] = E\left[E(\Theta \mid X, \Lambda) \mid X\right] = \left(I_p - E[\Lambda \mid X]\right) X. \qquad (7)$
Here, $E[\Lambda \mid X]$ denotes expectation with respect to the posterior distribution of $\Lambda$, that is,
$p(\Lambda \mid X) \propto g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda X X')/2}\, I\!\left(0_{p \times p} < \Lambda < I_p\right), \qquad (8)$
so the resulting estimator $\delta^{\pi}(X)$ can be written as $\delta^{\pi}(X) = (I_p - E[\Lambda \mid X])X$, where
$E[\Lambda \mid X] = \frac{\int_{0_{p \times p} < \Lambda < I_p} \Lambda\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda X X')/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda X X')/2}\, d\Lambda}. \qquad (9)$
Now, using $X = ULV'$ and letting $\Lambda \to U \Lambda U'$,
$E[\Lambda \mid X] = U\, \frac{\int_{0_{p \times p} < \Lambda < I_p} \Lambda\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}\, U'. \qquad (10)$
So, we have $\delta^{\pi} = (I_p - U \Phi(F) U')X$, where
$\Phi(F) = \frac{\int_{0_{p \times p} < \Lambda < I_p} \Lambda\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}. \qquad (11)$
The estimation problem discussed in this paper is invariant with respect to $X \to PXQ'$ and $\Theta \to P\Theta Q'$ for any $P \in O_p$ and $Q \in O_m$. Also, (2) is orthogonally invariant, namely
$\pi(\Theta) = \pi(P \Theta Q') \qquad (12)$
for every $P \in O_p$ and $Q \in O_m$. According to Lemma 1 in [15], $\Phi(F)$ is then a diagonal matrix, say $\Phi(F) = \mathrm{diag}(\phi_1(F), \ldots, \phi_p(F))$. Also, $\delta^{\pi} = (I_p - U \Phi(F) U')X$ with $\Phi = F^{-1}\Psi(F)$, so $\delta^{\pi}$ is of the form (6) with $\Psi(F) = \mathrm{diag}(\psi_1, \psi_2, \ldots, \psi_p)$ and
$\psi_i(F) = f_i\, \frac{\int_{0_{p \times p} < \Lambda < I_p} \lambda_{i,i}\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}. \qquad (13)$
Now, let $\lambda_k = \lambda_{k,k}$ for $k = 1, \ldots, p$ and $\lambda_{k,l} = \gamma_{k,l}\sqrt{\lambda_{k,k}\lambda_{l,l}}$ for $k < l$. The Jacobian of this transformation is
$J\!\left(\lambda_{1,1}, \ldots, \lambda_{p,p}, \lambda_{1,2}, \ldots, \lambda_{p-1,p} \to \lambda_1, \ldots, \lambda_p, \gamma_{1,2}, \ldots, \gamma_{p-1,p}\right) = \prod_{k=1}^p \lambda_k^{(p-1)/2}. \qquad (14)$
It holds that $|\Lambda| = |\Gamma| \prod_{k=1}^p \lambda_k$, where $\Gamma = (\gamma_{k,l})$ is a $p \times p$ positive definite matrix with $\gamma_{k,k} = 1$. Denoting $d\Gamma = \prod_{k < l} d\gamma_{k,l}$ and $d\lambda = \prod_{k=1}^p d\lambda_k$, we can write $\psi_i$ as
$\psi_i(F) = f_i\, \frac{\int_{0_{p \times p} < \Gamma < I_p} \int_0^1 \cdots \int_0^1 \lambda_i\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) |\Gamma|^{(a+m)/2 - 1} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda\, d\Gamma}{\int_{0_{p \times p} < \Gamma < I_p} \int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) |\Gamma|^{(a+m)/2 - 1} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda\, d\Gamma}. \qquad (15)$
Note that $\int_{0_{p \times p} < \Gamma < I_p} |\Gamma|^{(a+m)/2 - 1}\, d\Gamma$ is finite for $a > -m$ (see, for example, Theorem 1.4.5 on page 22 of [22]). Then, we can write
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}. \qquad (16)$
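For readers who want to evaluate (16) numerically, the following Monte Carlo sketch (our own illustration, not part of the paper) approximates every $\psi_i(F)$ at once by drawing $\lambda$ uniformly on $(0,1)^p$ and self-normalizing with the common weight appearing in both the numerator and denominator of (16); the function $g$ and the constants $a$, $m$ are assumed to satisfy the conditions used in the paper.

```python
import numpy as np

def psi_mc(f, a, m, g, n=200_000, rng=None):
    """Monte Carlo approximation of (psi_1(F), ..., psi_p(F)) in (16).

    The p-fold integrals over (0,1)^p are approximated with uniform draws
    lambda ~ U(0,1)^p and the weight
        w(lambda) = g(mean(lambda)) * prod_k lambda_k^{(a+p+m-3)/2}
                    * exp(-sum_k lambda_k f_k / 2),
    giving psi_i ~ f_i * E[lambda_i w] / E[w].
    """
    rng = np.random.default_rng(rng)
    f = np.asarray(f, dtype=float)
    p = f.size
    lam = rng.uniform(size=(n, p))
    log_w = ((a + p + m - 3) / 2.0) * np.log(lam).sum(axis=1) \
            - 0.5 * lam @ f + np.log(g(lam.mean(axis=1)))
    w = np.exp(log_w - log_w.max())            # stabilise before normalising
    return f * (lam * w[:, None]).sum(axis=0) / w.sum()

# Hypothetical usage with the decreasing function g(t) = exp(-t), p = 2, m = 6, a = -2:
print(psi_mc(f=[9.0, 4.0], a=-2, m=6, g=lambda t: np.exp(-t), rng=0))
```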
Lemma 2 shows that $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$.
Lemma 2.
If $g(t)$ is a decreasing function of $t$, then $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$.
Proof. 
We show that $\psi_i - \psi_j \geq 0$ for $j > i$. The proof is similar to the proof of part (iv) of Lemma 3.1 in [20]. By using the transformation $y_k = \lambda_k f_k$, $k = 1, \ldots, p$, with Jacobian $J(\lambda_1, \ldots, \lambda_p \to y_1, \ldots, y_p) = \prod_{k=1}^p f_k^{-1}$, (16) can be written as
$\psi_i = \frac{\int_0^{f_1} \cdots \int_0^{f_p} y_i\, g\!\left(p^{-1}\sum_{k=1}^p y_k/f_k\right) \prod_{k=1}^p y_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p y_k/2}\, dy_p \cdots dy_1}{\int_0^{f_1} \cdots \int_0^{f_p} g\!\left(p^{-1}\sum_{k=1}^p y_k/f_k\right) \prod_{k=1}^p y_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p y_k/2}\, dy_p \cdots dy_1} \qquad (17)$
for $i = 1, \ldots, p$. For $j > i$, we can write
$\psi_i - \psi_j = \frac{\int_0^{f_1} \cdots \int_0^{f_p} (y_i - y_j)\, g\!\left(p^{-1}\mathrm{tr}(Y F^{-1})\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY}{\int_0^{f_1} \cdots \int_0^{f_p} g\!\left(p^{-1}\mathrm{tr}(Y F^{-1})\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY}, \qquad (18)$
where $Y = \mathrm{diag}(y_1, y_2, \ldots, y_p)$.
In order to prove $\psi_i - \psi_j \geq 0$ for every $j > i$, without any loss of generality, it is enough to show that the following function is non-negative:
$L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (y_1 - y_2)\, g\!\left(p^{-1}\mathrm{tr}(Y F^{-1})\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY. \qquad (19)$
Now, let $O_{1,2}$ denote the $p \times p$ permutation matrix which interchanges the first and second coordinates, and let $W = O_{1,2} Y O_{1,2}'$. The Jacobian of this transformation is $J(Y \to W) = 1$ because $O_{1,2} = O_{1,2}' = O_{1,2}^{-1}$, and $W = \mathrm{diag}(w_1, w_2, \ldots, w_p)$ with $w_1 = y_2$, $w_2 = y_1$ and $w_k = y_k$ for $k \neq 1, 2$. We can rewrite (19) as
$L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (w_2 - w_1)\, g\!\left(p^{-1}\Big(\frac{w_1}{f_2} + \frac{w_2}{f_1} + \sum_{k \neq 1,2} \frac{w_k}{f_k}\Big)\right) |W|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(W)/2}\, dW. \qquad (20)$
Note that we can replace the $w_k$'s with $y_k$'s in (20) without changing its value, meaning
$L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (y_2 - y_1)\, g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY. \qquad (21)$
Combining (19) and (21) yields
$2 L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (y_1 - y_2) \left[ g\!\left(p^{-1}\sum_{k=1}^p \frac{y_k}{f_k}\right) - g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right) \right] |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY. \qquad (22)$
Note that we have two cases. One case is $y_1 \geq y_2$ and the other case is $y_1 < y_2$. If $y_1 \geq y_2$, since $f_1 > f_2$, then
$y_1\left(\frac{1}{f_1} - \frac{1}{f_2}\right) \leq y_2\left(\frac{1}{f_1} - \frac{1}{f_2}\right), \qquad (23)$
which implies
$p^{-1}\left(\frac{y_1}{f_1} + \frac{y_2}{f_2} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\right) = p^{-1}\sum_{k=1}^p \frac{y_k}{f_k} \leq p^{-1}\left(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\right). \qquad (24)$
Since $g(\cdot)$ is a decreasing function,
$g\!\left(p^{-1}\sum_{k=1}^p \frac{y_k}{f_k}\right) \geq g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right), \qquad (25)$
so we have
$(y_1 - y_2)\left[ g\!\left(p^{-1}\sum_{k=1}^p \frac{y_k}{f_k}\right) - g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right) \right] \geq 0. \qquad (26)$
Hence, the integrand in (22) is non-negative for the case $y_1 \geq y_2$. For the case $y_1 < y_2$, we can similarly show that the integrand is non-negative, so $L(F) \geq 0$. It can be proven similarly that $\psi_i - \psi_j \geq 0$ for every $j > i$, and hence $\psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$. Clearly, $\psi_p \geq 0$, and hence $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$. □
We need Lemma 3 to continue.
Lemma 3.
Let $\zeta$ denote a probability density function with respect to a $\sigma$-finite measure $\upsilon$ on $\mathbb{R}^p$. For any two points $\lambda = (\lambda_1, \ldots, \lambda_p)$ and $\mu = (\mu_1, \ldots, \mu_p)$, define $\lambda \wedge \mu = (\min(\lambda_1, \mu_1), \ldots, \min(\lambda_p, \mu_p))$ and $\lambda \vee \mu = (\max(\lambda_1, \mu_1), \ldots, \max(\lambda_p, \mu_p))$. Suppose $\zeta$ satisfies
$\zeta(\lambda)\,\zeta(\mu) \leq \zeta(\lambda \wedge \mu)\,\zeta(\lambda \vee \mu). \qquad (27)$
If functions $f$ and $g$ are non-decreasing in each argument and if $f$, $g$ and $fg$ are integrable with respect to $\zeta$, then
$\int f(\lambda)\, g(\lambda)\, \zeta(\lambda)\, d\upsilon(\lambda) \geq \int f(\lambda)\, \zeta(\lambda)\, d\upsilon(\lambda) \int g(\lambda)\, \zeta(\lambda)\, d\upsilon(\lambda). \qquad (28)$
Proof. 
See [23]. □
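A quick numerical illustration of Lemma 3 (our own, with a hypothetical density): for $p = 2$, the density $\zeta(\lambda_1, \lambda_2) \propto e^{\theta \lambda_1 \lambda_2}$ on $(0,1)^2$ satisfies (27) whenever $\theta \geq 0$, and the coordinate functions $\lambda_1$ and $\lambda_2$ are non-decreasing in each argument, so (28) predicts $E[\lambda_1 \lambda_2] \geq E[\lambda_1] E[\lambda_2]$.

```python
import numpy as np

# Grid check of the FKG inequality (28) for zeta proportional to exp(theta*l1*l2).
theta = 3.0
grid = np.linspace(0.0, 1.0, 501)
l1, l2 = np.meshgrid(grid, grid, indexing="ij")
zeta = np.exp(theta * l1 * l2)
zeta /= zeta.sum()                       # normalise on the grid

lhs = (l1 * l2 * zeta).sum()             # E[l1 * l2]
rhs = (l1 * zeta).sum() * (l2 * zeta).sum()
print(lhs >= rhs)                        # True, as (28) predicts
```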
Lemma 4 gives conditions for $\psi_i(f_1, f_2, \ldots, f_{i-1}, f_i, f_{i+1}, \ldots, f_p)$ to be non-decreasing in $f_i$, $i = 1, \ldots, p$, for fixed $f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_p$.
Lemma 4.
Suppose $g$ satisfies
I. $\lim_{\lambda_i \to 0} \lambda_i^{(a+m+p-1)/2}\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = 0$ for $i = 1, \ldots, p$;
II. For $i = 1, \ldots, p$, $\lambda_i\, g'\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) / g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)$ is non-increasing in $\lambda_j$, $j = 1, \ldots, p$;
III. For $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$, where $0 < \lambda_i, \lambda_i' < 1$, $i = 1, \ldots, p$, $g(\cdot)$ satisfies
$g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k'\right) \leq g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \wedge \lambda_k'\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \vee \lambda_k'\right). \qquad (29)$
Then, for any $i$, $\psi_i$ is non-decreasing with respect to $f_i$.
Proof. 
We can write (16) as
$\psi_i = f_i\, \frac{\Phi_1(F)}{\Phi_0(F)}, \qquad (30)$
where
$\Phi_k(F) = \int_0^1 \cdots \int_0^1 \lambda_i^k\, g\!\left(p^{-1}\sum_{j=1}^p \lambda_j\right) \prod_{j=1}^p \lambda_j^{(a+p+m-3)/2}\, e^{-\sum_{j=1}^p \lambda_j f_j/2}\, d\lambda \qquad (31)$
for k = 0 , 1 . We have
$\frac{\partial \psi_i}{\partial f_i} = \frac{\Phi_1(F)}{\Phi_0(F)} + \frac{f_i}{\Phi_0^2(F)}\left[ \frac{\partial \Phi_1(F)}{\partial f_i}\, \Phi_0(F) - \frac{\partial \Phi_0(F)}{\partial f_i}\, \Phi_1(F) \right] \qquad (32)$
and
$\frac{\partial \Phi_k(F)}{\partial f_i} = \frac{1}{f_i}\left[ C(F) - \left(k + \frac{a+m+p-1}{2}\right) \Phi_k(F) - B_k(F) \right], \qquad (33)$
where
$C(F) = e^{-f_i/2} \int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\Big(1 + \sum_{k \neq i} \lambda_k\Big)\right) \prod_{k \neq i} \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k \neq i} \lambda_k f_k/2}\, d\lambda_{-i} \qquad (34)$
and
$B_k(F) = \int_0^1 \cdots \int_0^1 \lambda_i^{k+1}\, \frac{\partial g\!\left(p^{-1}\sum_{j=1}^p \lambda_j\right)}{\partial \lambda_i} \prod_{j=1}^p \lambda_j^{(a+p+m-3)/2}\, e^{-\sum_{j=1}^p \lambda_j f_j/2}\, d\lambda, \qquad (35)$
where $d\lambda_{-i} = \prod_{k \neq i} d\lambda_k$.
Substituting (33) into (32), we have
$\frac{\partial \psi_i}{\partial f_i} = \frac{1}{\Phi_0^2(F)}\left[ \left(\Phi_0(F) - \Phi_1(F)\right) C(F) + B_0(F)\,\Phi_1(F) - B_1(F)\,\Phi_0(F) \right]. \qquad (36)$
Since $C(F) \geq 0$ and also $\Phi_0(F) - \Phi_1(F) \geq 0$, we have $(\Phi_0(F) - \Phi_1(F))\, C(F) \geq 0$. In order to prove $\partial \psi_i / \partial f_i \geq 0$, it suffices to show that $B_0(F)\,\Phi_1(F) \geq B_1(F)\,\Phi_0(F)$. That is,
$\int_0^1 \cdots \int_0^1 \lambda_i^2\, \frac{\partial g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)/\partial \lambda_i}{g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)}\, \xi(\lambda)\, d\lambda \leq \int_0^1 \cdots \int_0^1 \lambda_i\, \frac{\partial g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)/\partial \lambda_i}{g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)}\, \xi(\lambda)\, d\lambda \int_0^1 \cdots \int_0^1 \lambda_i\, \xi(\lambda)\, d\lambda, \qquad (37)$
where
$\xi(\lambda) = \frac{g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}. \qquad (38)$
Using condition (III), it is easy to show that
$\xi(\lambda)\,\xi(\lambda') \leq \xi(\lambda \wedge \lambda')\,\xi(\lambda \vee \lambda') \qquad (39)$
for $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$. Because of condition (II), the functions $\lambda \mapsto \lambda_i$ and $\lambda \mapsto -\lambda_i\, [\partial g(p^{-1}\sum_{k=1}^p \lambda_k)/\partial \lambda_i] / g(p^{-1}\sum_{k=1}^p \lambda_k)$ are non-decreasing in each argument, so Lemma 3 can be applied to prove (37). Hence, $\partial \psi_i / \partial f_i \geq 0$ for all $i = 1, \ldots, p$. □
Lemma 5 gives conditions for determining an upper bound on $\psi_1$.
Lemma 5.
Assume that $g(1) = \lim_{t \to 1} g(t) < \infty$ and that, for some $\alpha \geq 0$ and some $c > 0$,
$\lim_{t \to 0} g(t)\, e^{\alpha t} = c. \qquad (40)$
Then
$\lim_{f_p \to \infty} \lim_{f_{p-1} \to \infty} \cdots \lim_{f_1 \to \infty} \psi_1(F) = a + m + p - 1. \qquad (41)$
Proof. 
Note that $g(t)\, e^{\alpha t}$ is continuous on $(0, 1)$ and has finite limits at the points $0$ and $1$. So, this function is bounded on its domain, meaning there exists a $k > 0$ such that
$g(t) \leq k\, e^{-\alpha t}. \qquad (42)$
From (16),
$\psi_1(F) = f_1\, \frac{\int_0^1 \cdots \int_0^1 \lambda_1\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}. \qquad (43)$
Making the change of variables $\lambda_k \to \lambda_k / f_k$, $k = 1, \ldots, p$, we obtain
$\psi_1(F) = \frac{\int_0^\infty \cdots \int_0^\infty M_1(F, \lambda)\, d\lambda}{\int_0^\infty \cdots \int_0^\infty M_0(F, \lambda)\, d\lambda}, \qquad (44)$
where, for $i = 0, 1$,
$M_i(F, \lambda) = \lambda_1^i\, g\!\left(p^{-1}\sum_{k=1}^p \frac{\lambda_k}{f_k}\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, I\!\left(0 < \lambda_k < f_k,\ k = 1, \ldots, p\right) e^{-\sum_{k=1}^p \lambda_k/2}. \qquad (45)$
We now bound the integrand $M_i$ in order to apply the Lebesgue dominated convergence theorem. First, using (42), we have
$\int_0^\infty \cdots \int_0^\infty M_i(F, \lambda)\, d\lambda = \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i\, g\!\left(p^{-1}\sum_{k=1}^p \frac{\lambda_k}{f_k}\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda \leq k \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i\, e^{-(\alpha/p)\sum_{k=1}^p \lambda_k/f_k} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda. \qquad (46)$
Since $\alpha \geq 0$, we have
$k \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i\, e^{-(\alpha/p)\sum_{k=1}^p \lambda_k/f_k} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda \leq k \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda. \qquad (47)$
Since $a > -m$, we have
$\int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda \leq \int_0^\infty \cdots \int_0^\infty \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda = 2^{p(a+m+p-1)/2 + i}\, \Gamma^{p-1}\!\left(\frac{a+p+m-1}{2}\right) \Gamma\!\left(\frac{a+p+m+2i-1}{2}\right). \qquad (48)$
Thus, the bound in (46)–(48) is finite and does not depend on $F$, so the Lebesgue dominated convergence theorem can be used. Hence,
$\lim_{f_p \to \infty} \cdots \lim_{f_1 \to \infty} \int_0^\infty \cdots \int_0^\infty M_i(F, \lambda)\, d\lambda = \int_0^\infty \cdots \int_0^\infty \lim_{f_p \to \infty} \cdots \lim_{f_1 \to \infty} M_i(F, \lambda)\, d\lambda = c \int_0^\infty \cdots \int_0^\infty \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda = c\, 2^{p(a+m+p-1)/2 + i}\, \Gamma^{p-1}\!\left(\frac{a+p+m-1}{2}\right) \Gamma\!\left(\frac{a+p+m+2i-1}{2}\right). \qquad (49)$
Finally, using (44) and the above limits, we have
$\lim_{f_p \to \infty} \cdots \lim_{f_1 \to \infty} \psi_1(F) = a + m + p - 1, \qquad (50)$
the desired result. □
The results of Lemmas 2, 4 and 5 combine to give our main result.
Theorem 1.
(a) If the conditions of Lemmas 2, 4 and 5 hold and if $a < m - 3p - 1$, then the generalized Bayes estimator $\delta^{\pi}(X)$ with respect to (2) is minimax under the loss function (3).
(b) Further, if $g$ is integrable, then the estimator $\delta^{\pi}$ is proper Bayes and minimax, and hence admissible under (3).

3. Examples

We give two examples in this section to which our results can be applied. We also make connections to [15].
Example 1.
Assume that $g(t) = 1$ for $0 < t < 1$. For this choice of $g$, the class of prior distributions $\pi(\Theta)$ takes the form
$\pi(\Theta) = (2\pi)^{-mp/2} \int_{0_{p \times p} < \Lambda < I_p} |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda.$
This is the same class of prior distributions studied by [15]. Proceeding as in Section 2 of [15], we obtain the class of Bayes estimators of the form $\delta^{\pi} = (I_p - U F^{-1} \Psi(F) U')X$ with $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, where
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}$
for $i = 1, \ldots, p$. We now show that this class of Bayes estimators is minimax under the loss function (3). It is sufficient to show that the conditions of Theorem 1 are satisfied; that is, we should show that $g(t) = 1$ for $0 < t < 1$ satisfies the conditions stated in Lemmas 2, 4 and 5. Lemma 2 requires $g(t)$ to be decreasing in $t$. Because $g(t) = 1$ is constant, it is non-increasing in $t$, and the argument of Lemma 2 goes through unchanged; so the conclusion of Lemma 2 holds.
For $a > -(m + p - 1)$,
$\lim_{\lambda_i \to 0} \lambda_i^{(a+p+m-1)/2} = 0$
for $i = 1, \ldots, p$, so condition I of Lemma 4 holds. Also, $\lambda_i\, g'\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) / g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = 0$, which is trivially non-increasing in $\lambda_j$, $j = 1, \ldots, p$, so condition II of Lemma 4 holds. For $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$, where $0 < \lambda_i, \lambda_i' < 1$, $i = 1, \ldots, p$, the inequality
$g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k'\right) \leq g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \wedge \lambda_k'\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \vee \lambda_k'\right)$
holds (with equality) because $g \equiv 1$, so condition III of Lemma 4 holds. Therefore, $g(\cdot)$ satisfies the conditions of Lemma 4. Further, $g(1) = \lim_{t \to 1} g(t) = 1 < \infty$ and, if we select $\alpha = 0$, then
$\lim_{t \to 0} g(t)\, e^{\alpha t} = 1.$
Thus, the conditions of Lemma 5 hold. Now, if $a < m - 3p - 1$, then, based on Theorem 1, the proper Bayes estimators $\delta^{\pi}$ are minimax under the loss function (3). Thus, our class of minimax estimators includes the results of [15].
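For $g \equiv 1$ the $p$-fold integrals defining $\psi_i$ factorize across the coordinates, so the factors with $k \neq i$ cancel and each $\psi_i$ reduces to a ratio of two one-dimensional truncated gamma integrals. The closed form below (expressed through the regularized lower incomplete gamma function) is our own simplification for illustration, not a formula stated in the paper; it also lets one check numerically that $\psi_i$ is non-decreasing in $f_i$ and approaches $a + m + p - 1$, as Lemma 5 asserts.

```python
import numpy as np
from scipy.special import gammainc   # regularised lower incomplete gamma P(s, x)

def psi_example1(f, a, m, p):
    """psi_i(F) for Example 1 (g = 1), evaluated coordinate-wise.

    With g = 1 the integrand in (16) factorises over the coordinates, so
        psi_i = f_i * int_0^1 t^{nu+1} e^{-t f_i/2} dt
                    / int_0^1 t^{nu}   e^{-t f_i/2} dt,   nu = (a+p+m-3)/2,
    and writing the truncated integrals via P(s, x) gives the line below.
    """
    f = np.asarray(f, dtype=float)
    nu = (a + p + m - 3) / 2.0
    x = f / 2.0
    return (a + p + m - 1) * gammainc(nu + 2, x) / gammainc(nu + 1, x)

# Hypothetical setting p = 2, m = 9, a = -1, so that -m < a < m - 3p - 1 = 2.
# Probing the scalar map f_i -> psi_i at a few values shows the monotone
# increase towards a + m + p - 1 = 9, below the Corollary 1 bound 2(m - p - 1) = 12.
print(psi_example1(np.array([1.0, 50.0, 1e6]), a=-1.0, m=9, p=2))
```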
Example 2.
Another class of prior distributions π ( Θ ) can be constructed by taking g ( · ) to be
$g(t) = c\, e^{-\beta t}, \qquad 0 < t < 1,$
where $c > 0$ and $\beta > 0$. Then, $\pi(\Theta)$ will be
$\pi(\Theta) = c\, (2\pi)^{-mp/2} \int_{0_{p \times p} < \Lambda < I_p} e^{-\beta\, \mathrm{tr}(\Lambda)/p}\, |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda.$
If we follow the discussion of Section 2, the Bayes estimators will be of the form $\delta^{\pi} = (I_p - U F^{-1} \Psi(F) U')X$ with $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, where
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k (\beta/p + f_k/2)}\, d\lambda}{\int_0^1 \cdots \int_0^1 \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k (\beta/p + f_k/2)}\, d\lambda}$
for $i = 1, \ldots, p$. We show that this class satisfies the conditions of Theorem 1. Based on Theorem 1, it is sufficient to show that the conditions of Lemmas 2, 4 and 5 are satisfied. Lemma 2 requires $g(t)$ to be decreasing in $t$. Since $c > 0$ and $\beta > 0$, $g'(t) = -c\beta\, e^{-\beta t} < 0$ for each $0 < t < 1$. Therefore, the condition of Lemma 2 is satisfied.
For $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$, where $0 < \lambda_i, \lambda_i' < 1$, $i = 1, \ldots, p$,
$e^{-\beta \sum_{k=1}^p \lambda_k / p}\, e^{-\beta \sum_{k=1}^p \lambda_k' / p} = e^{-\beta \sum_{k=1}^p (\lambda_k \wedge \lambda_k') / p}\, e^{-\beta \sum_{k=1}^p (\lambda_k \vee \lambda_k') / p}$
since $\sum_{k=1}^p \lambda_k + \sum_{k=1}^p \lambda_k' = \sum_{k=1}^p (\lambda_k \wedge \lambda_k') + \sum_{k=1}^p (\lambda_k \vee \lambda_k')$,
and so condition III of Lemma 4 holds. Also, if $a > -(m + p - 1)$, then
$\lim_{\lambda_i \to 0} \lambda_i^{(a+m+p-1)/2}\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = c \lim_{\lambda_i \to 0} \lambda_i^{(a+m+p-1)/2}\, e^{-(\beta/p)\sum_{k=1}^p \lambda_k} = 0$
for $i = 1, \ldots, p$, so condition I of Lemma 4 holds. Moreover, $\lambda_i\, g'\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) / g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = -\beta \lambda_i$, which is non-increasing in $\lambda_j$ for each $j = 1, \ldots, p$, so condition II of Lemma 4 holds. We have
$g(1) = \lim_{t \to 1} g(t) = \lim_{t \to 1} c\, e^{-\beta t} = c\, e^{-\beta} < \infty$
and, if we choose $\alpha = \beta$, we obtain
$\lim_{t \to 0} g(t)\, e^{\alpha t} = c > 0.$
Hence, the conditions of Lemma 5 hold. Also, if $a < m - 3p - 1$, then all the conditions of Theorem 1 hold, and hence the Bayes estimator $\delta^{\pi}$ obtained with respect to the prior $\pi(\Theta)$ is minimax under the loss function (3).
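Example 2 admits the same kind of coordinate-wise reduction as Example 1: the constant $c$ cancels and the only change is that the exponential rate $f_k/2$ becomes $\beta/p + f_k/2$. The sketch below is again our own illustrative reduction, not a formula from the paper.

```python
import numpy as np
from scipy.special import gammainc   # regularised lower incomplete gamma P(s, x)

def psi_example2(f, a, m, p, beta):
    """psi_i(F) for Example 2 (g(t) = c * exp(-beta * t)), coordinate-wise.

    As in Example 1 the integrals factorise; with rate = beta/p + f_i/2,
        psi_i = f_i * (nu + 1) * P(nu + 2, rate) / (rate * P(nu + 1, rate)),
    where nu = (a + p + m - 3)/2.
    """
    f = np.asarray(f, dtype=float)
    nu = (a + p + m - 3) / 2.0
    rate = beta / p + f / 2.0
    return f * (nu + 1.0) * gammainc(nu + 2, rate) / (rate * gammainc(nu + 1, rate))

# Hypothetical setting p = 2, m = 9, a = -1, beta = 4; as f_i grows the value
# still approaches a + m + p - 1 = 9 (Lemma 5), since beta/p becomes negligible
# relative to f_i/2.
print(psi_example2(np.array([1.0, 50.0, 1e7]), a=-1.0, m=9, p=2, beta=4.0))
```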

4. Concluding Remarks

The problem of estimating the mean matrix $\Theta$ of a matrix variate normal distribution with covariance matrix $I_p \otimes I_m$ under the loss function $\mathrm{tr}[(\delta - \Theta)(\delta - \Theta)']$ has been investigated.
This is an invariant problem with respect to the group of orthogonal transformations. We considered the following prior distribution which is invariant under the group of orthogonal transformations.
$\pi(\Theta) = (2\pi)^{-mp/2} \int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda.$
Using the invariance arguments, our Bayes estimators are of the form $\delta^{\pi} = (I_p - U F^{-1} \Psi(F) U')X$, where $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, with
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}$
for $i = 1, \ldots, p$.
In this paper, we have obtained conditions on the continuous function $g(\cdot)$ such that the resulting Bayes estimators are minimax under the given loss function. Comparing the results of [15] with those of this paper, the difference lies only in $g(\cdot)$. Ref. [15] showed that if $g(t) = 1$, $0 < t < 1$, then the resulting Bayes estimators are minimax under the given loss function. We obtained conditions on $g(t)$ such that the resulting class of Bayes estimators is minimax, and we showed that the function $g$ used by [15] satisfies the conditions obtained in this paper, so the results of our paper include those of [15]. We also presented another example showing that if $g(t) = c\, e^{-\beta t}$, $0 < t < 1$, where $c, \beta > 0$, then the conditions obtained in this paper hold and the resulting Bayes estimators are minimax under (3). Hence, we have obtained a larger class of Bayes estimators which includes the class of Bayes estimators obtained by [15].
Because the estimators proposed in this paper are minimax and, in the proper Bayes case, admissible, they could lead to improved inference in various application areas of the matrix variate normal distribution, including analyses of multiple vector autoregressions; brain connectivity alternation detection; capacity for severely fading MIMO channels; integrated principal components analyses; determination of relationships between incidence and mortality of asthma and PM2.5, ozone, and household air pollution; autism spectrum disorder identification; and identification of depression disorder using multi-view high-order brain function networks. We provide two examples:
  • Example 1—Suppose that there are three mines in one area and the owner of all three mines is the same. Suppose the owner wants to know how much gold, copper, zinc, aluminum, bronze, and iron can be extracted per kilogram of ore in each mine. They want the authorities to randomly extract one kilogram of ore from each mine $n$ times and determine the amounts of the metals in a laboratory:

            gold     copper   zinc     aluminum  bronze   iron
  mine 1    X_{1,1}  X_{1,2}  X_{1,3}  X_{1,4}   X_{1,5}  X_{1,6}
  mine 2    X_{2,1}  X_{2,2}  X_{2,3}  X_{2,4}   X_{2,5}  X_{2,6}
  mine 3    X_{3,1}  X_{3,2}  X_{3,3}  X_{3,4}   X_{3,5}  X_{3,6}
They are faced with the following matrix of variables:
$X = \begin{pmatrix} X_{1,1} & X_{2,1} & X_{3,1} \\ X_{1,2} & X_{2,2} & X_{3,2} \\ X_{1,3} & X_{2,3} & X_{3,3} \\ X_{1,4} & X_{2,4} & X_{3,4} \\ X_{1,5} & X_{2,5} & X_{3,5} \\ X_{1,6} & X_{2,6} & X_{3,6} \end{pmatrix}.$
Based on previous experience, they know that the amount of each metal extracted from each mine is independent of the other mines and of the amounts of the other metals. They also know that the amount of metal extracted from each kilogram of ore has a small dispersion and that the amount of each metal from each mine has a normal distribution. Our results in this paper can be used to estimate the means of the metals extracted (see the simulation sketch after these examples).
  • Example 2—Suppose that a researcher wants to investigate the effect of the number of study hours (3 or 4 h per week) on the progress of four students in four subjects: mathematics, history, art, and geography. They choose four classmates at random and ask them to spend 3 h per week studying each subject for half of a semester and 4 h per week for the other half of the same semester. They observe the results per student as a random matrix as follows:

  Information of student k   Mathematics score   History score   Art score   Geography score
  3 h                        X_{1,1,k}           X_{1,2,k}       X_{1,3,k}   X_{1,4,k}
  4 h                        X_{2,1,k}           X_{2,2,k}       X_{2,3,k}   X_{2,4,k}
Suppose the numerical results are
$x_1 = \begin{pmatrix} 16.5 & 13.75 & 18.75 & 17.75 \\ 17.5 & 12.25 & 18.5 & 19.5 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 17.2 & 15.75 & 19.25 & 18.25 \\ 17 & 14.85 & 19.5 & 18.5 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 15.25 & 15.75 & 17.25 & 18.25 \\ 16.5 & 15.85 & 17.5 & 18.75 \end{pmatrix}, \quad x_4 = \begin{pmatrix} 15.25 & 14.75 & 17.25 & 18.75 \\ 15.5 & 13.85 & 16.5 & 19.25 \end{pmatrix}.$
The researcher has previously performed similar tests in other schools; it was found that the number of study hours (3 or 4) has no effect, the rates of progress in the courses are independent of each other, and each variable of this random matrix has a normal distribution. If they want to estimate the mean of the matrix variate normal distribution, our results can be used.
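The paper reports no data for the mining illustration, so the following simulation sketch (our own, with hypothetical numbers) mimics it: we take the mines as rows and the metals as columns, so that $p = 3 \leq m = 6$ as the theory requires, generate $X$ from the model, and compare the average squared-error loss of the maximum likelihood estimator $X$ with that of the shrinkage estimator of Corollary 1 using the hypothetical constant choice $\psi_i \equiv m - p - 1$ (which satisfies conditions I and II).

```python
import numpy as np

rng = np.random.default_rng(42)
p, m, reps = 3, 6, 2000                       # 3 mines, 6 metals
Theta = rng.uniform(0.0, 2.0, size=(p, m))    # hypothetical true mean contents

def shrink(X, psi_const):
    """delta = (I_p - U F^{-1} Psi U') X with constant Psi = psi_const * I_p."""
    U, sing, _ = np.linalg.svd(X, full_matrices=False)
    return (np.eye(X.shape[0]) - U @ np.diag(psi_const / sing**2) @ U.T) @ X

loss_mle = loss_shrink = 0.0
for _ in range(reps):
    X = Theta + rng.standard_normal((p, m))   # X ~ N_{p x m}(Theta, I_p (x) I_m)
    loss_mle += np.sum((X - Theta) ** 2)
    loss_shrink += np.sum((shrink(X, m - p - 1.0) - Theta) ** 2)

# The MLE averages about p*m = 18; the minimax shrinkage estimator averages
# at or below that, in line with Corollary 1.
print(loss_mle / reps, loss_shrink / reps)
```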
A future study could derive explicit expressions for the moments of $\Theta \mid X$. These may be obtained using the results of [24].

Author Contributions

Conceptualization, S.Z. and S.N.; methodology, S.Z. and S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the editor and the three referees for careful reading and comments which greatly improved the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wichitaksorn, N. Analyzing multiple vector autoregressions through matrix-variate normal distribution with two covariance matrices. Commun. Stat.—Theory Methods 2019, 49, 1801–1817. [Google Scholar] [CrossRef]
  2. Xia, Y.; Li, L.X. Matrix graph hypothesis testing and application in brain connectivity alternation detection. Stat. Sin. 2019, 29, 303–328. [Google Scholar] [CrossRef]
  3. Ferreira, J.T. Upper bounds for the capacity for severely fading MIMO channels under a scale mixture assumption. Entropy 2021, 23, 845. [Google Scholar] [CrossRef]
  4. Tang, T.M.; Allen, G.I. Integrated principal components analysis. J. Mach. Learn. Res. 2021, 22, 1–71. [Google Scholar]
  5. Ahmadi, F.; Fallah, Z.; Shadmani, F.K.; Allahmoradi, M.; Salahshoor, P.; Ahmadi, S.; Mansori, K. Relationship between incidence and mortality of asthma with PM2.5, ozone, and household air pollution from 1990 to 2106 in the world: An ecological study. Egypt. J. Chest Dis. Tuberc. 2022, 71, 457–463. [Google Scholar]
  6. Jiang, X.; Zhou, Y.Y.; Zhang, Y.N.; Zhang, L.M.; Qiao, L.S.; De Leone, R. Estimating high-order brain functional networks in Bayesian view for autism spectrum disorder identification. Front. Neurosci. 2022, 16, 872848. [Google Scholar] [CrossRef]
  7. Zhao, F.; Gao, T.Y.; Cao, Z.; Chen, X.B.; Mao, Y.Y.; Mao, N.; Ren, Y.D. Identifying depression disorder using multi-view high-order brain function network derived from electroencephalography signal. Front. Comput. Neurosci. 2022, 16, 1046310. [Google Scholar] [CrossRef]
  8. Efron, B.; Morris, C. Empirical Bayes on vector observations: An extension of Stein’s method. Biometrika 1972, 59, 335–347. [Google Scholar] [CrossRef]
  9. Stein, C. Estimation of the mean of a multivariate normal distribution. In Proceedings of the Prague Symposium on Asymptotic Statistics, Prague, Czech Republic, 3–6 September 1973; pp. 345–381. [Google Scholar]
  10. Zhang, Z. On estimation of matrix of normal mean. J. Multivar. Anal. 1986, 18, 70–82. [Google Scholar] [CrossRef]
  11. Baranchik, A.J. A family of minimax estimators of the mean of a multivariate normal distribution. Ann. Math. Stat. 1970, 41, 642–645. [Google Scholar] [CrossRef]
  12. Bilodeau, M.; Kariya, T. Minimax estimators in the normal MANOVA model. J. Multivar. Anal. 1989, 28, 260–270. [Google Scholar] [CrossRef]
  13. Konno, Y. On estimation of a matrix of normal means with unknown covariance matrix. J. Multivar. Anal. 1991, 36, 44–55. [Google Scholar] [CrossRef]
  14. Haff, R.L. An identity for the Wishart distribution with applications. J. Multivar. Anal. 1979, 9, 531–544. [Google Scholar] [CrossRef]
  15. Tsukuma, H. Admissibility and minimaxity of Bayes estimators for a normal mean matrix. J. Multivar. Anal. 2008, 99, 2251–2264. [Google Scholar] [CrossRef]
  16. Tsukuma, H. Generalized Bayes minimax estimation of the normal mean matrix with unknown covariance matrix. J. Multivar. Anal. 2009, 100, 2296–2304. [Google Scholar] [CrossRef]
  17. Zinodiny, S.; Strawderman, W.E.; Parsian, A. Bayes minimax estimation of the multivariate normal mean vector for the case of common unknown variance. J. Multivar. Anal. 2011, 102, 1256–1262. [Google Scholar] [CrossRef]
  18. Zinodiny, S.; Rezaei, S.; Arjmand, O.N.; Nadarajah, S. A new class of Bayes minimax estimators of the normal mean matrix for the case of common unknown variances. Statistics 2017, 51, 1082–1094. [Google Scholar] [CrossRef]
  19. Tsukuma, H. Shrinkage priors for Bayesian estimation of the mean matrix in an elliptically contoured distribution. J. Multivar. Anal. 2010, 101, 1483–1492. [Google Scholar] [CrossRef]
  20. Tsukuma, H. Proper Bayes minimax estimators of the normal mean matrix with common unknown variances. J. Stat. Plan. Inference 2010, 140, 2596–2606. [Google Scholar] [CrossRef]
  21. Faith, E.R. Minimax Bayes estimators of a multivariate normal mean. J. Multivar. Anal. 1978, 8, 372–379. [Google Scholar] [CrossRef]
  22. Gupta, A.K.; Nagar, D. Matrix Variate Distributions; Chapman and Hall/CRC: London, UK, 1999. [Google Scholar]
  23. Fortuin, C.M.; Kasteleyn, P.W.; Ginibre, J. Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 1971, 22, 89–103. [Google Scholar] [CrossRef]
  24. Mathai, A.M.; Provost, S.B.; Haubold, H.J. Multivariate Statistical Analysis in the Real and Complex Domains; Springer: New York, NY, USA, 2022. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
