Article

A New Class of Bayes Minimax Estimators of the Mean Matrix of a Matrix Variate Normal Distribution

by Shokofeh Zinodiny 1 and Saralees Nadarajah 2,*
1 Department of Mathematics, Amirkabir University of Technology, Tehran 15916-34311, Iran
2 Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(7), 1098; https://doi.org/10.3390/math12071098
Submission received: 4 January 2024 / Revised: 24 February 2024 / Accepted: 4 April 2024 / Published: 5 April 2024

Abstract: Bayes minimax estimation is important because it provides a robust approach to statistical estimation that considers the worst-case scenario while incorporating prior knowledge. In this paper, Bayes minimax estimation of the mean matrix of a matrix variate normal distribution is considered under the quadratic loss function. A large class of (proper and generalized) Bayes minimax estimators of the mean matrix is presented. Two examples are given to illustrate the class of estimators, showing, among other things, that the class includes classes of estimators presented by Tsukuma.

1. Introduction

Let $X = (x_{i,j})$ be a $p \times m$ matrix random variable with a matrix variate normal distribution with mean matrix $\Theta = (\theta_{i,j})$ and covariance matrix $I_p \otimes I_m$, where $I_k$ is the $k \times k$ identity matrix and $\otimes$ denotes the Kronecker product.
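To make the sampling model concrete, the following Python sketch (our own illustration, not part of the paper) draws one observation from $N_{p \times m}(\Theta, I_p \otimes I_m)$; because the Kronecker covariance is the identity, this amounts to adding independent standard normal noise to every entry of $\Theta$.

```python
import numpy as np

def sample_matrix_normal_identity(theta, rng=None):
    """Draw X ~ N_{p x m}(Theta, I_p (x) I_m).

    With an identity Kronecker covariance every entry of X is an independent
    N(theta_ij, 1) variable, so perturbing Theta with standard normal noise
    is all that is required.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    return theta + rng.standard_normal(theta.shape)

# Hypothetical example with p = 3 rows and m = 6 columns.
Theta = np.arange(18, dtype=float).reshape(3, 6) / 10.0
X = sample_matrix_normal_identity(Theta, rng=0)
print(X.shape)  # (3, 6)
```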
The matrix variate normal distribution finds applications across various fields, including multivariate statistical analysis, machine learning, and signal processing. In multivariate statistical analysis, it serves as a fundamental tool for modeling covariance structures in datasets where the observations are matrices, such as in longitudinal studies or multivariate time series analysis. In machine learning, it is utilized for modeling complex dependencies among high-dimensional data, particularly in tasks involving matrix-valued inputs or outputs, such as in recommender systems or tensor factorization. Moreover, in signal processing, the matrix variate normal distribution is employed for modeling the joint distribution of multiple correlated signals or images, enabling efficient estimation and inference in applications such as array processing or medical imaging.
Some recent applications of the matrix variate normal distribution include analysis of multiple vector autoregressions [1]; brain connectivity alternation detection [2]; capacity for severely fading MIMO channels [3]; integrated principal components analysis [4]; determination of the relationship between incidence and mortality of asthma with PM2.5, ozone, and household air pollution [5]; autism spectrum disorder identification [6]; and identification of depression disorder using multi-view high-order brain function networks [7], to mention just a few.
Bayesian minimax estimation is a statistical approach that combines Bayesian inference with minimax decision theory. In traditional Bayesian inference, we use prior knowledge and observed data to update our beliefs about the parameters of interest. Minimax decision theory, on the other hand, focuses on minimizing the maximum possible loss (risk) that can occur under different parameter values.
In Bayesian minimax estimation, we seek an estimator that minimizes the maximum possible posterior expected loss, where the expectation is taken with respect to the posterior distribution of the parameter given the observed data. This approach is particularly useful when there is uncertainty about the true parameter value and when it is important to protect against worst-case scenarios.
There has not been much work on Bayesian estimation of the parameters of the matrix variate normal distribution. Ref. [8] extended the so-called Stein effect and proposed an empirical Bayes estimator, outperforming the maximum likelihood estimator, $X$, for the case $m > p + 1$. Since then, many classes of minimax estimators better than the maximum likelihood estimator have been found. Ref. [9] derived a large class of unbiased risk estimators, including a class of minimax estimators obtained by [8]. Using the result of Stein, Ref. [10] extended the results of [11] to the multivariate case. For the case of $\Sigma \otimes I_m$, where $\Sigma$ is an unknown positive definite matrix and $p > m + 1$, Ref. [12] introduced a class of minimax estimators containing those of [8]. Ref. [13] derived a large class of minimax estimators using the Stein identity and the Haff identity [14] for the case $m > p + 1$. For the case of $\Sigma = I_m$, Ref. [15] found orthogonally invariant hierarchical priors, resulting in Bayes estimators that are admissible and minimax. For the case of an unknown covariance matrix, Ref. [16] obtained a generalized Bayes class of minimax estimators of the mean matrix for $m > p + 1$, $p > m + 1$. Ref. [17] obtained Bayes minimax estimators of the mean for the case of common unknown variance. Ref. [18] obtained Bayes minimax estimators of the normal mean matrix for the case of common unknown variances.
For the problem of estimating the mean matrix of an elliptically contoured distribution, Ref. [19] derived generalized Bayes minimax estimators for the mean matrix; ref. [20] also obtained a class of minimax estimators for the mean matrix, which was used to find a class of proper Bayes minimax estimators of Θ .
In this paper, we derive a large class of (proper and generalized) Bayes minimax estimators of $\Theta$ containing the estimators of [15] as a special case. In fact, we extend the results of [21] to the multivariate case. The main result, giving a large class of (proper and generalized) Bayes minimax estimators, is developed in Section 2. Section 3 considers two examples of classes of (proper and generalized) Bayes estimators. In particular, Example 1 recovers a result from [15]. Some concluding remarks are given in Section 4.
Throughout this paper, let $|A|$, $\mathrm{tr}(A)$ and $A'$ denote, respectively, the determinant, trace and transpose of a matrix $A$. Also, for $A$ and $B$, let $B < A$ mean that $A - B$ is positive definite.

2. A Class of Bayes Minimax Estimators of the Mean Matrix

Let $N_{p \times m}(\Theta, I_p \otimes I_m)$ denote the matrix variate normal distribution with mean matrix $\Theta$ and covariance matrix $I_p \otimes I_m$. Assume that $X \sim N_{p \times m}(\Theta, I_p \otimes I_m)$. Assume also that
$\Theta \mid \Lambda \sim N_{p \times m}\left(0_{p \times m}, \left(\Lambda^{-1}(I_p - \Lambda)\right) \otimes I_m\right), \qquad \Lambda \sim |\Lambda|^{a/2 - 1}\, g\!\left(\mathrm{tr}(\Lambda)/p\right), \qquad 0_{p \times p} < \Lambda < I_p, \qquad a > -m, \qquad (1)$
where $\Lambda = (\lambda_{i,j})$ is a $p \times p$ random matrix whose (possibly improper) density is proportional to the second expression in (1), and $g$ is a differentiable positive function on $(0,1)$. Integrating $\Lambda$ out, the corresponding prior density of $\Theta$ is
$\pi(\Theta) = (2\pi)^{-pm/2} \int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda, \qquad (2)$
where $d\Lambda = \prod_{i \leq j} d\lambda_{i,j}$. Note that (2) will be proper if $g$ is integrable on its domain.
The purpose of this section is to construct generalized (and proper) Bayes minimax estimators of $\Theta$ under the loss function
$L(\delta; \Theta) = \mathrm{tr}\!\left[(\delta - \Theta)(\delta - \Theta)'\right]. \qquad (3)$
The following lemmas give sufficient conditions on g and a such that the generalized (or proper) Bayes estimators with respect to (2) are minimax.
Let $O_p$ be the set of orthogonal matrices of order $p$. Let $V_{m,p} = \{V \in \mathbb{R}^{m \times p} : V'V = I_p\}$, where $m \geq p$. Write $X$ as $ULV'$, where $U \in O_p$, $V \in V_{m,p}$ and $L = \mathrm{diag}(l_1, l_2, \ldots, l_p)$ with $l_1 > l_2 > \cdots > l_p > 0$.
Lemma 1.
For $i = 1, \ldots, p$, write $\phi_i = \phi_i(F)$ and $F = \mathrm{diag}(f_1, \ldots, f_p) = L^2$. The risk of a shrinkage equivariant estimator $\delta = UL(I_p - \Phi(F))V'$ is
$R(\delta; \Theta) = mp + E\left[\sum_{i=1}^p \left\{ f_i \phi_i^2 - 2(m - p + 1)\phi_i - 4 f_i \frac{\partial \phi_i}{\partial f_i} - 4 \sum_{j > i} \frac{f_i \phi_i - f_j \phi_j}{f_i - f_j} \right\}\right], \qquad (4)$
provided each expectation exists.
Proof. 
See [9].
If $\Phi(F) = F^{-1}\Psi(F)$, where $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, then by replacing $\phi_i$ by $\psi_i/f_i$, (4) can be written as
$R(\delta; \Theta) = mp + E\left[\sum_{i=1}^p \left\{ \frac{\psi_i^2}{f_i} - 2(m - p - 1)\frac{\psi_i}{f_i} - 4 \frac{\partial \psi_i}{\partial f_i} - 4 \sum_{j > i} \frac{\psi_i - \psi_j}{f_i - f_j} \right\}\right]. \qquad (5)$
Using (5), we obtain Corollary 1. □
Corollary 1.
Suppose
$\delta = \left(I_p - U F^{-1} \Psi(F) U'\right) X, \qquad (6)$
where $F^{-1} = \mathrm{diag}(f_1^{-1}, f_2^{-1}, \ldots, f_p^{-1})$. Then, $\delta$ is minimax under (3) if
I. For any $i$, $\psi_i$ is non-decreasing with respect to $f_i$;
II. $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1 \leq 2(m - p - 1)$.
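To illustrate how an estimator of the form (6) is evaluated in practice, here is a short Python sketch (our own; the constant choice $\psi_i \equiv m - p - 1$ is a hypothetical example that satisfies conditions I and II when $m \geq p + 1$, not a choice advocated in the paper).

```python
import numpy as np

def shrinkage_estimator(X, psi):
    """Evaluate delta = (I_p - U F^{-1} Psi(F) U') X, the form (6) above.

    X   : p x m data matrix with p <= m.
    psi : callable mapping f = (f_1 >= ... >= f_p), the squared singular
          values of X, to the vector (psi_1(F), ..., psi_p(F)).
    """
    p, m = X.shape
    if p > m:
        raise ValueError("this sketch assumes p <= m")
    U, sing, _ = np.linalg.svd(X, full_matrices=False)  # X = U L V'
    f = sing ** 2                                       # F = L^2
    shrink = U @ np.diag(psi(f) / f) @ U.T              # U F^{-1} Psi(F) U'
    return (np.eye(p) - shrink) @ X

# Hypothetical usage: constant psi_i = m - p - 1, which is trivially
# non-decreasing in f_i and lies in [0, 2(m - p - 1)] when m >= p + 1.
p, m = 3, 6
rng = np.random.default_rng(1)
X = rng.standard_normal((p, m))
delta = shrinkage_estimator(X, lambda f: np.full_like(f, m - p - 1.0))
print(delta.shape)  # (3, 6)
```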
We give conditions on $g$ and $a$ for obtaining generalized (proper) Bayes estimators of the form (6) such that the resulting estimators satisfy the conditions of Corollary 1, and hence are minimax. Note that the conditional distribution of $\Theta$ given $X, \Lambda$ is $N_{p \times m}\left((I_p - \Lambda)X, (I_p - \Lambda) \otimes I_m\right)$. Therefore, the generalized Bayes estimator of $\Theta$ with respect to (2) under (3) is (see [15])
$\delta^{\pi}(X) = E[\Theta \mid X] = E\left[E(\Theta \mid X, \Lambda) \mid X\right] = \left(I_p - E[\Lambda \mid X]\right) X. \qquad (7)$
Here, $E[\Lambda \mid X]$ denotes expectation with respect to the posterior distribution of $\Lambda$, that is,
$p(\Lambda \mid X) \propto g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda X X')/2}\, I\!\left(0_{p \times p} < \Lambda < I_p\right), \qquad (8)$
so the resulting estimator $\delta^{\pi}(X)$ can be written as $\delta^{\pi}(X) = (I_p - E[\Lambda \mid X])X$, where
$E[\Lambda \mid X] = \frac{\int_{0_{p \times p} < \Lambda < I_p} \Lambda\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda X X')/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda X X')/2}\, d\Lambda}. \qquad (9)$
Now, using $X = ULV'$ and letting $\Lambda \to U \Lambda U'$,
$E[\Lambda \mid X] = U\, \frac{\int_{0_{p \times p} < \Lambda < I_p} \Lambda\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}\, U'. \qquad (10)$
So, we have $\delta^{\pi} = (I_p - U \Phi(F) U')X$, where
$\Phi(F) = \frac{\int_{0_{p \times p} < \Lambda < I_p} \Lambda\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}. \qquad (11)$
The estimation problem discussed in this paper is invariant with respect to $X \to PXQ'$ and $\Theta \to P\Theta Q'$ for any $P \in O_p$ and $Q \in O_m$. Also, (2) is orthogonally invariant, namely
$\pi(\Theta) = \pi(P \Theta Q') \qquad (12)$
for every $P \in O_p$ and $Q \in O_m$. According to Lemma 1 in [15], $\Phi(F)$ is then a diagonal matrix, say $\Phi(F) = \mathrm{diag}(\phi_1(F), \ldots, \phi_p(F))$. Also, $\delta^{\pi} = (I_p - U \Phi(F) U')X$ with $\Phi = F^{-1}\Psi(F)$, so $\delta^{\pi}$ is of the form (6) with $\Psi(F) = \mathrm{diag}(\psi_1, \psi_2, \ldots, \psi_p)$ and
$\psi_i(F) = f_i\, \frac{\int_{0_{p \times p} < \Lambda < I_p} \lambda_{i,i}\, g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}{\int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, e^{-\mathrm{tr}(\Lambda F)/2}\, d\Lambda}. \qquad (13)$
Now, let $\lambda_k = \lambda_{k,k}$ for $k = 1, \ldots, p$ and $\lambda_{k,l} = \gamma_{k,l}\sqrt{\lambda_{k,k}\lambda_{l,l}}$ for $k < l$. The Jacobian of this transformation is
$J\!\left(\lambda_{1,1}, \ldots, \lambda_{p,p}, \lambda_{1,2}, \ldots, \lambda_{p-1,p} \to \lambda_1, \ldots, \lambda_p, \gamma_{1,2}, \ldots, \gamma_{p-1,p}\right) = \prod_{k=1}^p \lambda_k^{(p-1)/2}. \qquad (14)$
It holds that $|\Lambda| = |\Gamma| \prod_{k=1}^p \lambda_k$, where $\Gamma = (\gamma_{k,l})$ is a $p \times p$ positive definite matrix with $\gamma_{k,k} = 1$. Denoting $d\Gamma = \prod_{k < l} d\gamma_{k,l}$ and $d\lambda = \prod_{k=1}^p d\lambda_k$, we can write $\psi_i$ as
$\psi_i(F) = f_i\, \frac{\int_{0_{p \times p} < \Gamma < I_p} \int_0^1 \cdots \int_0^1 \lambda_i\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) |\Gamma|^{(a+m)/2 - 1} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda\, d\Gamma}{\int_{0_{p \times p} < \Gamma < I_p} \int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) |\Gamma|^{(a+m)/2 - 1} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda\, d\Gamma}. \qquad (15)$
Note that $\int_{0_{p \times p} < \Gamma < I_p} |\Gamma|^{(a+m)/2 - 1}\, d\Gamma$ is finite for $a > -m$ (see, for example, Theorem 1.4.5 on page 22 of [22]). Then, we can write
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}. \qquad (16)$
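For readers who want to evaluate (16) numerically, the following Monte Carlo sketch (our own illustration, not part of the paper) approximates every $\psi_i(F)$ at once by drawing $\lambda$ uniformly on $(0,1)^p$ and self-normalizing with the common weight appearing in both the numerator and denominator of (16); the function $g$ and the constants $a$, $m$ are assumed to satisfy the conditions used in the paper.

```python
import numpy as np

def psi_mc(f, a, m, g, n=200_000, rng=None):
    """Monte Carlo approximation of (psi_1(F), ..., psi_p(F)) in (16).

    The p-fold integrals over (0,1)^p are approximated with uniform draws
    lambda ~ U(0,1)^p and the weight
        w(lambda) = g(mean(lambda)) * prod_k lambda_k^{(a+p+m-3)/2}
                    * exp(-sum_k lambda_k f_k / 2),
    giving psi_i ~ f_i * E[lambda_i w] / E[w].
    """
    rng = np.random.default_rng(rng)
    f = np.asarray(f, dtype=float)
    p = f.size
    lam = rng.uniform(size=(n, p))
    log_w = ((a + p + m - 3) / 2.0) * np.log(lam).sum(axis=1) \
            - 0.5 * lam @ f + np.log(g(lam.mean(axis=1)))
    w = np.exp(log_w - log_w.max())            # stabilise before normalising
    return f * (lam * w[:, None]).sum(axis=0) / w.sum()

# Hypothetical usage with the decreasing function g(t) = exp(-t), p = 2, m = 6, a = -2:
print(psi_mc(f=[9.0, 4.0], a=-2, m=6, g=lambda t: np.exp(-t), rng=0))
```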
Lemma 2 shows that $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$.
Lemma 2.
If $g(t)$ is a decreasing function of $t$, then $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$.
Proof. 
We show that $\psi_i - \psi_j \geq 0$ for $j > i$. The proof is similar to the proof of part (iv) of Lemma 3.1 in [20]. By using the transformation $y_k = \lambda_k f_k$, $k = 1, \ldots, p$, with Jacobian $J(\lambda_1, \ldots, \lambda_p \to y_1, \ldots, y_p) = \prod_{k=1}^p f_k^{-1}$, (16) can be written as
$\psi_i = \frac{\int_0^{f_1} \cdots \int_0^{f_p} y_i\, g\!\left(p^{-1}\sum_{k=1}^p y_k/f_k\right) \prod_{k=1}^p y_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p y_k/2}\, dy_p \cdots dy_1}{\int_0^{f_1} \cdots \int_0^{f_p} g\!\left(p^{-1}\sum_{k=1}^p y_k/f_k\right) \prod_{k=1}^p y_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p y_k/2}\, dy_p \cdots dy_1} \qquad (17)$
for $i = 1, \ldots, p$. For $j > i$, we can write
$\psi_i - \psi_j = \frac{\int_0^{f_1} \cdots \int_0^{f_p} (y_i - y_j)\, g\!\left(p^{-1}\mathrm{tr}(Y F^{-1})\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY}{\int_0^{f_1} \cdots \int_0^{f_p} g\!\left(p^{-1}\mathrm{tr}(Y F^{-1})\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY}, \qquad (18)$
where $Y = \mathrm{diag}(y_1, y_2, \ldots, y_p)$.
In order to prove $\psi_i - \psi_j \geq 0$ for every $j > i$, without any loss of generality, it is enough to show that the following function is non-negative:
$L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (y_1 - y_2)\, g\!\left(p^{-1}\mathrm{tr}(Y F^{-1})\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY. \qquad (19)$
Now, let $O_{1,2}$ denote the $p \times p$ permutation matrix which interchanges the first and second coordinates, and let $W = O_{1,2} Y O_{1,2}'$. The Jacobian of this transformation is $J(Y \to W) = 1$ because $O_{1,2} = O_{1,2}' = O_{1,2}^{-1}$, and $W = \mathrm{diag}(w_1, w_2, \ldots, w_p)$ with $w_1 = y_2$, $w_2 = y_1$ and $w_k = y_k$ for $k \neq 1, 2$. We can rewrite (19) as
$L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (w_2 - w_1)\, g\!\left(p^{-1}\Big(\frac{w_1}{f_2} + \frac{w_2}{f_1} + \sum_{k \neq 1,2} \frac{w_k}{f_k}\Big)\right) |W|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(W)/2}\, dW. \qquad (20)$
Note that we can replace the $w_k$'s with $y_k$'s in (20) without changing its value, meaning
$L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (y_2 - y_1)\, g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right) |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY. \qquad (21)$
Combining (19) and (21) yields
$2 L(F) = \int_0^{f_1} \cdots \int_0^{f_p} (y_1 - y_2) \left[ g\!\left(p^{-1}\sum_{k=1}^p \frac{y_k}{f_k}\right) - g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right) \right] |Y|^{(a+p+m-3)/2}\, e^{-\mathrm{tr}(Y)/2}\, dY. \qquad (22)$
Note that we have two cases. One case is $y_1 \geq y_2$ and the other case is $y_1 < y_2$. If $y_1 \geq y_2$, since $f_1 > f_2$, then
$y_1\left(\frac{1}{f_1} - \frac{1}{f_2}\right) \leq y_2\left(\frac{1}{f_1} - \frac{1}{f_2}\right), \qquad (23)$
which implies
$p^{-1}\left(\frac{y_1}{f_1} + \frac{y_2}{f_2} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\right) = p^{-1}\sum_{k=1}^p \frac{y_k}{f_k} \leq p^{-1}\left(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\right). \qquad (24)$
Since $g(\cdot)$ is a decreasing function,
$g\!\left(p^{-1}\sum_{k=1}^p \frac{y_k}{f_k}\right) \geq g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right), \qquad (25)$
so we have
$(y_1 - y_2)\left[ g\!\left(p^{-1}\sum_{k=1}^p \frac{y_k}{f_k}\right) - g\!\left(p^{-1}\Big(\frac{y_1}{f_2} + \frac{y_2}{f_1} + \sum_{k \neq 1,2} \frac{y_k}{f_k}\Big)\right) \right] \geq 0. \qquad (26)$
Hence, the integrand in (22) is non-negative for the case $y_1 \geq y_2$. For the case $y_1 < y_2$, we can similarly show that the integrand is non-negative, so $L(F) \geq 0$. It can be proven similarly that $\psi_i - \psi_j \geq 0$ for every $j > i$, and hence $\psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$. Clearly, $\psi_p \geq 0$, and hence $0 \leq \psi_p \leq \psi_{p-1} \leq \cdots \leq \psi_1$. □
We need Lemma 3 to continue.
Lemma 3.
Let $\zeta$ denote a probability density function with respect to a $\sigma$-finite measure $\upsilon$ on $\mathbb{R}^p$. For any two points $\lambda = (\lambda_1, \ldots, \lambda_p)$ and $\mu = (\mu_1, \ldots, \mu_p)$, define $\lambda \wedge \mu = (\min(\lambda_1, \mu_1), \ldots, \min(\lambda_p, \mu_p))$ and $\lambda \vee \mu = (\max(\lambda_1, \mu_1), \ldots, \max(\lambda_p, \mu_p))$. Suppose $\zeta$ satisfies
$\zeta(\lambda)\,\zeta(\mu) \leq \zeta(\lambda \wedge \mu)\,\zeta(\lambda \vee \mu). \qquad (27)$
If functions $f$ and $g$ are non-decreasing in each argument and if $f$, $g$ and $fg$ are integrable with respect to $\zeta$, then
$\int f(\lambda)\, g(\lambda)\, \zeta(\lambda)\, d\upsilon(\lambda) \geq \int f(\lambda)\, \zeta(\lambda)\, d\upsilon(\lambda) \int g(\lambda)\, \zeta(\lambda)\, d\upsilon(\lambda). \qquad (28)$
Proof. 
See [23]. □
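A quick numerical illustration of Lemma 3 (our own, with a hypothetical density): for $p = 2$, the density $\zeta(\lambda_1, \lambda_2) \propto e^{\theta \lambda_1 \lambda_2}$ on $(0,1)^2$ satisfies (27) whenever $\theta \geq 0$, and the coordinate functions $\lambda_1$ and $\lambda_2$ are non-decreasing in each argument, so (28) predicts $E[\lambda_1 \lambda_2] \geq E[\lambda_1] E[\lambda_2]$.

```python
import numpy as np

# Grid check of the FKG inequality (28) for zeta proportional to exp(theta*l1*l2).
theta = 3.0
grid = np.linspace(0.0, 1.0, 501)
l1, l2 = np.meshgrid(grid, grid, indexing="ij")
zeta = np.exp(theta * l1 * l2)
zeta /= zeta.sum()                       # normalise on the grid

lhs = (l1 * l2 * zeta).sum()             # E[l1 * l2]
rhs = (l1 * zeta).sum() * (l2 * zeta).sum()
print(lhs >= rhs)                        # True, as (28) predicts
```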
Lemma 4 gives conditions for $\psi_i(f_1, f_2, \ldots, f_{i-1}, f_i, f_{i+1}, \ldots, f_p)$ to be non-decreasing in $f_i$, $i = 1, \ldots, p$, for fixed $f_1, f_2, \ldots, f_{i-1}, f_{i+1}, \ldots, f_p$.
Lemma 4.
Suppose $g$ satisfies
I. $\lim_{\lambda_i \to 0} \lambda_i^{(a+m+p-1)/2}\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = 0$ for $i = 1, \ldots, p$;
II. For $i = 1, \ldots, p$, $\lambda_i\, g'\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) / g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)$ is non-increasing in $\lambda_j$, $j = 1, \ldots, p$;
III. For $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$, where $0 < \lambda_i, \lambda_i' < 1$, $i = 1, \ldots, p$, $g(\cdot)$ satisfies
$g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k'\right) \leq g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \wedge \lambda_k'\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \vee \lambda_k'\right). \qquad (29)$
Then, for any $i$, $\psi_i$ is non-decreasing with respect to $f_i$.
Proof. 
We can write (16) as
$\psi_i = f_i\, \frac{\Phi_1(F)}{\Phi_0(F)}, \qquad (30)$
where
$\Phi_k(F) = \int_0^1 \cdots \int_0^1 \lambda_i^k\, g\!\left(p^{-1}\sum_{j=1}^p \lambda_j\right) \prod_{j=1}^p \lambda_j^{(a+p+m-3)/2}\, e^{-\sum_{j=1}^p \lambda_j f_j/2}\, d\lambda \qquad (31)$
for k = 0 , 1 . We have
$\frac{\partial \psi_i}{\partial f_i} = \frac{\Phi_1(F)}{\Phi_0(F)} + \frac{f_i}{\Phi_0^2(F)}\left[ \frac{\partial \Phi_1(F)}{\partial f_i}\, \Phi_0(F) - \frac{\partial \Phi_0(F)}{\partial f_i}\, \Phi_1(F) \right] \qquad (32)$
and
$\frac{\partial \Phi_k(F)}{\partial f_i} = \frac{1}{f_i}\left[ C(F) - \left(k + \frac{a+m+p-1}{2}\right) \Phi_k(F) - B_k(F) \right], \qquad (33)$
where
$C(F) = e^{-f_i/2} \int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\Big(1 + \sum_{k \neq i} \lambda_k\Big)\right) \prod_{k \neq i} \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k \neq i} \lambda_k f_k/2}\, d\lambda_{-i} \qquad (34)$
and
$B_k(F) = \int_0^1 \cdots \int_0^1 \lambda_i^{k+1}\, \frac{\partial g\!\left(p^{-1}\sum_{j=1}^p \lambda_j\right)}{\partial \lambda_i} \prod_{j=1}^p \lambda_j^{(a+p+m-3)/2}\, e^{-\sum_{j=1}^p \lambda_j f_j/2}\, d\lambda, \qquad (35)$
where $d\lambda_{-i} = \prod_{k \neq i} d\lambda_k$.
Substituting (33) into (32), we have
$\frac{\partial \psi_i}{\partial f_i} = \frac{1}{\Phi_0^2(F)}\left[ \left(\Phi_0(F) - \Phi_1(F)\right) C(F) + B_0(F)\,\Phi_1(F) - B_1(F)\,\Phi_0(F) \right]. \qquad (36)$
Since $C(F) \geq 0$ and also $\Phi_0(F) - \Phi_1(F) \geq 0$, we have $(\Phi_0(F) - \Phi_1(F))\, C(F) \geq 0$. In order to prove $\partial \psi_i / \partial f_i \geq 0$, it suffices to show that $B_0(F)\,\Phi_1(F) \geq B_1(F)\,\Phi_0(F)$. That is,
$\int_0^1 \cdots \int_0^1 \lambda_i^2\, \frac{\partial g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)/\partial \lambda_i}{g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)}\, \xi(\lambda)\, d\lambda \leq \int_0^1 \cdots \int_0^1 \lambda_i\, \frac{\partial g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)/\partial \lambda_i}{g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right)}\, \xi(\lambda)\, d\lambda \int_0^1 \cdots \int_0^1 \lambda_i\, \xi(\lambda)\, d\lambda, \qquad (37)$
where
$\xi(\lambda) = \frac{g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}. \qquad (38)$
Using condition (III), it is easy to show that
$\xi(\lambda)\,\xi(\lambda') \leq \xi(\lambda \wedge \lambda')\,\xi(\lambda \vee \lambda') \qquad (39)$
for $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$. Because of condition (II), the functions $\lambda \mapsto \lambda_i$ and $\lambda \mapsto -\lambda_i\, [\partial g(p^{-1}\sum_{k=1}^p \lambda_k)/\partial \lambda_i] / g(p^{-1}\sum_{k=1}^p \lambda_k)$ are non-decreasing in each argument, so Lemma 3 can be applied to prove (37). Hence, $\partial \psi_i / \partial f_i \geq 0$ for all $i = 1, \ldots, p$. □
Lemma 5 gives conditions for determining an upper bound on $\psi_1$.
Lemma 5.
Assume that $g(1) = \lim_{t \to 1} g(t) < \infty$ and that, for some $\alpha \geq 0$ and some $c > 0$,
$\lim_{t \to 0} g(t)\, e^{\alpha t} = c. \qquad (40)$
Then
$\lim_{f_p \to \infty} \lim_{f_{p-1} \to \infty} \cdots \lim_{f_1 \to \infty} \psi_1(F) = a + m + p - 1. \qquad (41)$
Proof. 
Note that $g(t)\, e^{\alpha t}$ is continuous on $(0, 1)$ and has finite limits at the points $0$ and $1$. So, this function is bounded on its domain, meaning there exists a $k > 0$ such that
$g(t) \leq k\, e^{-\alpha t}. \qquad (42)$
From (16),
$\psi_1(F) = f_1\, \frac{\int_0^1 \cdots \int_0^1 \lambda_1\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}. \qquad (43)$
Making the change of variables $\lambda_k \to \lambda_k / f_k$, $k = 1, \ldots, p$, we obtain
$\psi_1(F) = \frac{\int_0^\infty \cdots \int_0^\infty M_1(F, \lambda)\, d\lambda}{\int_0^\infty \cdots \int_0^\infty M_0(F, \lambda)\, d\lambda}, \qquad (44)$
where, for $i = 0, 1$,
$M_i(F, \lambda) = \lambda_1^i\, g\!\left(p^{-1}\sum_{k=1}^p \frac{\lambda_k}{f_k}\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, I\!\left(0 < \lambda_k < f_k,\ k = 1, \ldots, p\right) e^{-\sum_{k=1}^p \lambda_k/2}. \qquad (45)$
We now bound the integrand $M_i$ in order to apply the Lebesgue dominated convergence theorem. First, using (42), we have
$\int_0^\infty \cdots \int_0^\infty M_i(F, \lambda)\, d\lambda = \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i\, g\!\left(p^{-1}\sum_{k=1}^p \frac{\lambda_k}{f_k}\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda \leq k \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i\, e^{-(\alpha/p)\sum_{k=1}^p \lambda_k/f_k} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda. \qquad (46)$
Since $\alpha \geq 0$, we have
$k \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i\, e^{-(\alpha/p)\sum_{k=1}^p \lambda_k/f_k} \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda \leq k \int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda. \qquad (47)$
Since $a > -m$, we have
$\int_0^{f_1} \cdots \int_0^{f_p} \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda \leq \int_0^\infty \cdots \int_0^\infty \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda = 2^{p(a+m+p-1)/2 + i}\, \Gamma^{p-1}\!\left(\frac{a+p+m-1}{2}\right) \Gamma\!\left(\frac{a+p+m+2i-1}{2}\right). \qquad (48)$
Thus, the bound in (46)–(48) is finite and does not depend on $F$, so the Lebesgue dominated convergence theorem can be used. Hence,
$\lim_{f_p \to \infty} \cdots \lim_{f_1 \to \infty} \int_0^\infty \cdots \int_0^\infty M_i(F, \lambda)\, d\lambda = \int_0^\infty \cdots \int_0^\infty \lim_{f_p \to \infty} \cdots \lim_{f_1 \to \infty} M_i(F, \lambda)\, d\lambda = c \int_0^\infty \cdots \int_0^\infty \lambda_1^i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k/2}\, d\lambda = c\, 2^{p(a+m+p-1)/2 + i}\, \Gamma^{p-1}\!\left(\frac{a+p+m-1}{2}\right) \Gamma\!\left(\frac{a+p+m+2i-1}{2}\right). \qquad (49)$
Finally, using (44) and the above limits, we have
$\lim_{f_p \to \infty} \cdots \lim_{f_1 \to \infty} \psi_1(F) = a + m + p - 1, \qquad (50)$
the desired result. □
The results of Lemmas 2, 4 and 5 combine to give our main result.
Theorem 1.
(a) If the conditions of Lemmas 2, 4 and 5 hold and if $a < m - 3p - 1$, then the generalized Bayes estimator $\delta^{\pi}(X)$ with respect to (2) is minimax under the loss function (3).
(b) Further, if $g$ is integrable, then the estimator $\delta^{\pi}$ is proper Bayes and minimax, and hence admissible under (3).

3. Examples

We give two examples in this section to which our results can be applied. We also make connections to [15].
Example 1.
Assume that $g(t) = 1$ for $0 < t < 1$. For this choice of $g$, the class of prior distributions $\pi(\Theta)$ takes the form
$\pi(\Theta) = (2\pi)^{-mp/2} \int_{0_{p \times p} < \Lambda < I_p} |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda.$
This is the same class of prior distributions studied by [15]. Proceeding as in Section 2 of [15], we obtain the class of Bayes estimators of the form $\delta^{\pi} = (I_p - U F^{-1} \Psi(F) U')X$ with $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, where
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}$
for $i = 1, \ldots, p$. We now show that this class of Bayes estimators is minimax under the loss function (3). It is sufficient to show that the conditions of Theorem 1 are satisfied; that is, we should show that $g(t) = 1$ for $0 < t < 1$ satisfies the conditions stated in Lemmas 2, 4 and 5. Lemma 2 requires $g(t)$ to be decreasing in $t$. Because $g(t) = 1$ is constant, it is non-increasing in $t$, and the argument of Lemma 2 goes through unchanged; so the conclusion of Lemma 2 holds.
For $a > -(m + p - 1)$,
$\lim_{\lambda_i \to 0} \lambda_i^{(a+p+m-1)/2} = 0$
for $i = 1, \ldots, p$, so condition I of Lemma 4 holds. Also, $\lambda_i\, g'\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) / g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = 0$, which is trivially non-increasing in $\lambda_j$, $j = 1, \ldots, p$, so condition II of Lemma 4 holds. For $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$, where $0 < \lambda_i, \lambda_i' < 1$, $i = 1, \ldots, p$, the inequality
$g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k'\right) \leq g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \wedge \lambda_k'\right) g\!\left(p^{-1}\sum_{k=1}^p \lambda_k \vee \lambda_k'\right)$
holds (with equality) because $g \equiv 1$, so condition III of Lemma 4 holds. Therefore, $g(\cdot)$ satisfies the conditions of Lemma 4. Further, $g(1) = \lim_{t \to 1} g(t) = 1 < \infty$ and, if we select $\alpha = 0$, then
$\lim_{t \to 0} g(t)\, e^{\alpha t} = 1.$
Thus, the conditions of Lemma 5 hold. Now, if $a < m - 3p - 1$, then, based on Theorem 1, the proper Bayes estimators $\delta^{\pi}$ are minimax under the loss function (3). Thus, our class of minimax estimators includes the results of [15].
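For $g \equiv 1$ the $p$-fold integrals defining $\psi_i$ factorize across the coordinates, so the factors with $k \neq i$ cancel and each $\psi_i$ reduces to a ratio of two one-dimensional truncated gamma integrals. The closed form below (expressed through the regularized lower incomplete gamma function) is our own simplification for illustration, not a formula stated in the paper; it also lets one check numerically that $\psi_i$ is non-decreasing in $f_i$ and approaches $a + m + p - 1$, as Lemma 5 asserts.

```python
import numpy as np
from scipy.special import gammainc   # regularised lower incomplete gamma P(s, x)

def psi_example1(f, a, m, p):
    """psi_i(F) for Example 1 (g = 1), evaluated coordinate-wise.

    With g = 1 the integrand in (16) factorises over the coordinates, so
        psi_i = f_i * int_0^1 t^{nu+1} e^{-t f_i/2} dt
                    / int_0^1 t^{nu}   e^{-t f_i/2} dt,   nu = (a+p+m-3)/2,
    and writing the truncated integrals via P(s, x) gives the line below.
    """
    f = np.asarray(f, dtype=float)
    nu = (a + p + m - 3) / 2.0
    x = f / 2.0
    return (a + p + m - 1) * gammainc(nu + 2, x) / gammainc(nu + 1, x)

# Hypothetical setting p = 2, m = 9, a = -1, so that -m < a < m - 3p - 1 = 2.
# Probing the scalar map f_i -> psi_i at a few values shows the monotone
# increase towards a + m + p - 1 = 9, below the Corollary 1 bound 2(m - p - 1) = 12.
print(psi_example1(np.array([1.0, 50.0, 1e6]), a=-1.0, m=9, p=2))
```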
Example 2.
Another class of prior distributions π ( Θ ) can be constructed by taking g ( · ) to be
$g(t) = c\, e^{-\beta t}, \qquad 0 < t < 1,$
where $c > 0$ and $\beta > 0$. Then, $\pi(\Theta)$ will be
$\pi(\Theta) = c\, (2\pi)^{-mp/2} \int_{0_{p \times p} < \Lambda < I_p} e^{-\beta\, \mathrm{tr}(\Lambda)/p}\, |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda.$
If we follow the discussion of Section 2, the Bayes estimators will be of the form $\delta^{\pi} = (I_p - U F^{-1} \Psi(F) U')X$ with $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, where
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k (\beta/p + f_k/2)}\, d\lambda}{\int_0^1 \cdots \int_0^1 \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k (\beta/p + f_k/2)}\, d\lambda}$
for $i = 1, \ldots, p$. We show that this class satisfies the conditions of Theorem 1. Based on Theorem 1, it is sufficient to show that the conditions of Lemmas 2, 4 and 5 are satisfied. Lemma 2 requires $g(t)$ to be decreasing in $t$. Since $c > 0$ and $\beta > 0$, $g'(t) = -c\beta\, e^{-\beta t} < 0$ for each $0 < t < 1$. Therefore, the condition of Lemma 2 is satisfied.
For $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $\lambda' = (\lambda_1', \lambda_2', \ldots, \lambda_p')$, where $0 < \lambda_i, \lambda_i' < 1$, $i = 1, \ldots, p$,
$e^{-\beta \sum_{k=1}^p \lambda_k / p}\, e^{-\beta \sum_{k=1}^p \lambda_k' / p} = e^{-\beta \sum_{k=1}^p (\lambda_k \wedge \lambda_k') / p}\, e^{-\beta \sum_{k=1}^p (\lambda_k \vee \lambda_k') / p}$
since $\sum_{k=1}^p \lambda_k + \sum_{k=1}^p \lambda_k' = \sum_{k=1}^p (\lambda_k \wedge \lambda_k') + \sum_{k=1}^p (\lambda_k \vee \lambda_k')$,
and so condition III of Lemma 4 holds. Also, if $a > -(m + p - 1)$, then
$\lim_{\lambda_i \to 0} \lambda_i^{(a+m+p-1)/2}\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = c \lim_{\lambda_i \to 0} \lambda_i^{(a+m+p-1)/2}\, e^{-(\beta/p)\sum_{k=1}^p \lambda_k} = 0$
for $i = 1, \ldots, p$, so condition I of Lemma 4 holds. Moreover, $\lambda_i\, g'\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) / g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) = -\beta \lambda_i$, which is non-increasing in $\lambda_j$ for each $j = 1, \ldots, p$, so condition II of Lemma 4 holds. We have
$g(1) = \lim_{t \to 1} g(t) = \lim_{t \to 1} c\, e^{-\beta t} = c\, e^{-\beta} < \infty$
and, if we choose $\alpha = \beta$, we obtain
$\lim_{t \to 0} g(t)\, e^{\alpha t} = c > 0.$
Hence, the conditions of Lemma 5 hold. Also, if $a < m - 3p - 1$, then all the conditions of Theorem 1 hold, and hence the Bayes estimator $\delta^{\pi}$ obtained with respect to the prior $\pi(\Theta)$ is minimax under the loss function (3).
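Example 2 admits the same kind of coordinate-wise reduction as Example 1: the constant $c$ cancels and the only change is that the exponential rate $f_k/2$ becomes $\beta/p + f_k/2$. The sketch below is again our own illustrative reduction, not a formula from the paper.

```python
import numpy as np
from scipy.special import gammainc   # regularised lower incomplete gamma P(s, x)

def psi_example2(f, a, m, p, beta):
    """psi_i(F) for Example 2 (g(t) = c * exp(-beta * t)), coordinate-wise.

    As in Example 1 the integrals factorise; with rate = beta/p + f_i/2,
        psi_i = f_i * (nu + 1) * P(nu + 2, rate) / (rate * P(nu + 1, rate)),
    where nu = (a + p + m - 3)/2.
    """
    f = np.asarray(f, dtype=float)
    nu = (a + p + m - 3) / 2.0
    rate = beta / p + f / 2.0
    return f * (nu + 1.0) * gammainc(nu + 2, rate) / (rate * gammainc(nu + 1, rate))

# Hypothetical setting p = 2, m = 9, a = -1, beta = 4; as f_i grows the value
# still approaches a + m + p - 1 = 9 (Lemma 5), since beta/p becomes negligible
# relative to f_i/2.
print(psi_example2(np.array([1.0, 50.0, 1e7]), a=-1.0, m=9, p=2, beta=4.0))
```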

4. Concluding Remarks

The problem of estimating the mean matrix $\Theta$ of a matrix variate normal distribution with covariance matrix $I_p \otimes I_m$ under the loss function $\mathrm{tr}[(\delta - \Theta)(\delta - \Theta)']$ has been investigated.
This is an invariant problem with respect to the group of orthogonal transformations. We considered the following prior distribution which is invariant under the group of orthogonal transformations.
$\pi(\Theta) = (2\pi)^{-mp/2} \int_{0_{p \times p} < \Lambda < I_p} g\!\left(\mathrm{tr}(\Lambda)/p\right) |\Lambda|^{(a+m)/2 - 1}\, |I_p - \Lambda|^{-m/2} \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[(I_p - \Lambda)^{-1} \Lambda \Theta \Theta'\right]\right\} d\Lambda.$
Using the invariance arguments, our Bayes estimators are of the form $\delta^{\pi} = (I_p - U F^{-1} \Psi(F) U')X$, where $\Psi(F) = \mathrm{diag}(\psi_1(F), \ldots, \psi_p(F))$, with
$\psi_i(F) = f_i\, \frac{\int_0^1 \cdots \int_0^1 \lambda_i\, g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}{\int_0^1 \cdots \int_0^1 g\!\left(p^{-1}\sum_{k=1}^p \lambda_k\right) \prod_{k=1}^p \lambda_k^{(a+p+m-3)/2}\, e^{-\sum_{k=1}^p \lambda_k f_k/2}\, d\lambda}$
for $i = 1, \ldots, p$.
In this paper, we have obtained conditions on the continuous function $g(\cdot)$ such that the resulting Bayes estimators are minimax under the given loss function. Comparing the results of [15] with those of this paper, the difference lies only in $g(\cdot)$. Ref. [15] showed that if $g(t) = 1$, $0 < t < 1$, then the resulting Bayes estimators are minimax under the given loss function. We obtained conditions on $g(t)$ such that the resulting class of Bayes estimators is minimax, and we showed that the function $g$ used by [15] satisfies the conditions obtained in this paper, so the results of our paper include those of [15]. We also presented another example showing that if $g(t) = c\, e^{-\beta t}$, $0 < t < 1$, where $c, \beta > 0$, then the conditions obtained in this paper hold and the resulting Bayes estimators are minimax under (3). Hence, we have obtained a larger class of Bayes estimators which includes the class of Bayes estimators obtained by [15].
Because the estimators proposed in this paper are minimax and, in the proper Bayes case, admissible, they could lead to improved inference in various application areas of the matrix variate normal distribution, including analyses of multiple vector autoregressions; brain connectivity alternation detection; capacity for severely fading MIMO channels; integrated principal components analyses; determination of relationships between incidence and mortality of asthma and PM2.5, ozone, and household air pollution; autism spectrum disorder identification; and identification of depression disorder using multi-view high-order brain function networks. We provide two examples:
  • Example 1—Suppose that there are three mines in one area and the owner of all three mines is the same. Suppose the owner wants to know how much gold, copper, zinc, aluminum, bronze, and iron can be extracted per kilogram of ore in each mine. They want the authorities to randomly extract one kilogram of ore from each mine $n$ times and determine the amounts of the metals in a laboratory:

            gold     copper   zinc     aluminum  bronze   iron
  mine 1    X_{1,1}  X_{1,2}  X_{1,3}  X_{1,4}   X_{1,5}  X_{1,6}
  mine 2    X_{2,1}  X_{2,2}  X_{2,3}  X_{2,4}   X_{2,5}  X_{2,6}
  mine 3    X_{3,1}  X_{3,2}  X_{3,3}  X_{3,4}   X_{3,5}  X_{3,6}
They are faced with the following matrix of variables:
$X = \begin{pmatrix} X_{1,1} & X_{2,1} & X_{3,1} \\ X_{1,2} & X_{2,2} & X_{3,2} \\ X_{1,3} & X_{2,3} & X_{3,3} \\ X_{1,4} & X_{2,4} & X_{3,4} \\ X_{1,5} & X_{2,5} & X_{3,5} \\ X_{1,6} & X_{2,6} & X_{3,6} \end{pmatrix}.$
Based on previous experience, they know that the amount of each metal extracted from each mine is independent of the other mines and of the amounts of the other metals. They also know that the amount of metal extracted from each kilogram of ore has a small dispersion and that the amount of each metal from each mine has a normal distribution. Our results in this paper can be used to estimate the means of the metals extracted (see the simulation sketch after these examples).
  • Example 2—Suppose that a researcher wants to investigate the effect of the number of study hours (3 or 4 h per week) on the progress of four students in four subjects: mathematics, history, art, and geography. They choose four classmates at random and ask them to spend 3 h per week studying each subject for half of a semester and 4 h per week for the other half of the same semester. They observe the results per student as a random matrix as follows:

  Information of student k   Mathematics score   History score   Art score   Geography score
  3 h                        X_{1,1,k}           X_{1,2,k}       X_{1,3,k}   X_{1,4,k}
  4 h                        X_{2,1,k}           X_{2,2,k}       X_{2,3,k}   X_{2,4,k}
Suppose the numerical results are
$x_1 = \begin{pmatrix} 16.5 & 13.75 & 18.75 & 17.75 \\ 17.5 & 12.25 & 18.5 & 19.5 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 17.2 & 15.75 & 19.25 & 18.25 \\ 17 & 14.85 & 19.5 & 18.5 \end{pmatrix}, \quad x_3 = \begin{pmatrix} 15.25 & 15.75 & 17.25 & 18.25 \\ 16.5 & 15.85 & 17.5 & 18.75 \end{pmatrix}, \quad x_4 = \begin{pmatrix} 15.25 & 14.75 & 17.25 & 18.75 \\ 15.5 & 13.85 & 16.5 & 19.25 \end{pmatrix}.$
The researcher has previously performed similar tests in other schools; it was found that the number of study hours (3 or 4) has no effect, the rates of progress in the courses are independent of each other, and each variable of this random matrix has a normal distribution. If they want to estimate the mean of the matrix variate normal distribution, our results can be used.
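The paper reports no data for the mining illustration, so the following simulation sketch (our own, with hypothetical numbers) mimics it: we take the mines as rows and the metals as columns, so that $p = 3 \leq m = 6$ as the theory requires, generate $X$ from the model, and compare the average squared-error loss of the maximum likelihood estimator $X$ with that of the shrinkage estimator of Corollary 1 using the hypothetical constant choice $\psi_i \equiv m - p - 1$ (which satisfies conditions I and II).

```python
import numpy as np

rng = np.random.default_rng(42)
p, m, reps = 3, 6, 2000                       # 3 mines, 6 metals
Theta = rng.uniform(0.0, 2.0, size=(p, m))    # hypothetical true mean contents

def shrink(X, psi_const):
    """delta = (I_p - U F^{-1} Psi U') X with constant Psi = psi_const * I_p."""
    U, sing, _ = np.linalg.svd(X, full_matrices=False)
    return (np.eye(X.shape[0]) - U @ np.diag(psi_const / sing**2) @ U.T) @ X

loss_mle = loss_shrink = 0.0
for _ in range(reps):
    X = Theta + rng.standard_normal((p, m))   # X ~ N_{p x m}(Theta, I_p (x) I_m)
    loss_mle += np.sum((X - Theta) ** 2)
    loss_shrink += np.sum((shrink(X, m - p - 1.0) - Theta) ** 2)

# The MLE averages about p*m = 18; the minimax shrinkage estimator averages
# at or below that, in line with Corollary 1.
print(loss_mle / reps, loss_shrink / reps)
```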
A future study could derive explicit expressions for the moments of $\Theta \mid X$. These may be obtained using the results of [24].

Author Contributions

Conceptualization, S.Z. and S.N.; methodology, S.Z. and S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the editor and the three referees for careful reading and comments which greatly improved the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wichitaksorn, N. Analyzing multiple vector autoregressions through matrix-variate normal distribution with two covariance matrices. Commun. Stat.—Theory Methods 2019, 49, 1801–1817. [Google Scholar] [CrossRef]
  2. Xia, Y.; Li, L.X. Matrix graph hypothesis testing and application in brain connectivity alternation detection. Stat. Sin. 2019, 29, 303–328. [Google Scholar] [CrossRef]
  3. Ferreira, J.T. Upper bounds for the capacity for severely fading MIMO channels under a scale mixture assumption. Entropy 2021, 23, 845. [Google Scholar] [CrossRef]
  4. Tang, T.M.; Allen, G.I. Integrated principal components analysis. J. Mach. Learn. Res. 2021, 22, 1–71. [Google Scholar]
  5. Ahmadi, F.; Fallah, Z.; Shadmani, F.K.; Allahmoradi, M.; Salahshoor, P.; Ahmadi, S.; Mansori, K. Relationship between incidence and mortality of asthma with PM2.5, ozone, and household air pollution from 1990 to 2106 in the world: An ecological study. Egypt. J. Chest Dis. Tuberc. 2022, 71, 457–463. [Google Scholar]
  6. Jiang, X.; Zhou, Y.Y.; Zhang, Y.N.; Zhang, L.M.; Qiao, L.S.; De Leone, R. Estimating high-order brain functional networks in Bayesian view for autism spectrum disorder identification. Front. Neurosci. 2022, 16, 872848. [Google Scholar] [CrossRef]
  7. Zhao, F.; Gao, T.Y.; Cao, Z.; Chen, X.B.; Mao, Y.Y.; Mao, N.; Ren, Y.D. Identifying depression disorder using multi-view high-order brain function network derived from electroencephalography signal. Front. Comput. Neurosci. 2022, 16, 1046310. [Google Scholar] [CrossRef]
  8. Efron, B.; Morris, C. Empirical Bayes on vector observations: An extension of Stein’s method. Biometrika 1972, 59, 335–347. [Google Scholar] [CrossRef]
  9. Stein, C. Estimation of the mean of a multivariate normal distribution. In Proceedings of the Prague Symposium on Asymptotic Statistics, Prague, Czech Republic, 3–6 September 1973; pp. 345–381. [Google Scholar]
  10. Zhang, Z. On estimation of matrix of normal mean. J. Multivar. Anal. 1986, 18, 70–82. [Google Scholar] [CrossRef]
  11. Baranchik, A.J. A family of minimax estimators of the mean of a multivariate normal distribution. Ann. Math. Stat. 1970, 41, 642–645. [Google Scholar] [CrossRef]
  12. Bilodeau, M.; Kariya, T. Minimax estimators in the normal MANOVA model. J. Multivar. Anal. 1989, 28, 260–270. [Google Scholar] [CrossRef]
  13. Konno, Y. On estimation of a matrix of normal means with unknown covariance matrix. J. Multivar. Anal. 1991, 36, 44–55. [Google Scholar] [CrossRef]
  14. Haff, R.L. An identity for the Wishart distribution with applications. J. Multivar. Anal. 1979, 9, 531–544. [Google Scholar] [CrossRef]
  15. Tsukuma, H. Admissibility and minimaxity of Bayes estimators for a normal mean matrix. J. Multivar. Anal. 2008, 99, 2251–2264. [Google Scholar] [CrossRef]
  16. Tsukuma, H. Generalized Bayes minimax estimation of the normal mean matrix with unknown covariance matrix. J. Multivar. Anal. 2009, 100, 2296–2304. [Google Scholar] [CrossRef]
  17. Zinodiny, S.; Strawderman, W.E.; Parsian, A. Bayes minimax estimation of the multivariate normal mean vector for the case of common unknown variance. J. Multivar. Anal. 2011, 102, 1256–1262. [Google Scholar] [CrossRef]
  18. Zinodiny, S.; Rezaei, S.; Arjmand, O.N.; Nadarajah, S. A new class of Bayes minimax estimators of the normal mean matrix for the case of common unknown variances. Statistics 2017, 51, 1082–1094. [Google Scholar] [CrossRef]
  19. Tsukuma, H. Shrinkage priors for Bayesian estimation of the mean matrix in an elliptically contoured distribution. J. Multivar. Anal. 2010, 101, 1483–1492. [Google Scholar] [CrossRef]
  20. Tsukuma, H. Proper Bayes minimax estimators of the normal mean matrix with common unknown variances. J. Stat. Plan. Inference 2010, 140, 2596–2606. [Google Scholar] [CrossRef]
  21. Faith, E.R. Minimax Bayes estimators of a multivariate normal mean. J. Multivar. Anal. 1978, 8, 372–379. [Google Scholar] [CrossRef]
  22. Gupta, A.K.; Nagar, D. Matrix Variate Distributions; Chapman and Hall/CRC: London, UK, 1999. [Google Scholar]
  23. Fortuin, C.M.; Kasteleyn, P.W.; Ginibre, J. Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 1971, 22, 89–103. [Google Scholar] [CrossRef]
  24. Mathai, A.M.; Provost, S.B.; Haubold, H.J. Multivariate Statistical Analysis in the Real and Complex Domains; Springer: New York, NY, USA, 2022. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
