Article

Generalizations of the Kantorovich and Wielandt Inequalities with Applications to Statistics

School of Mathematics and Physics, Jiangsu University of Technology, Changzhou 213001, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(18), 2860; https://doi.org/10.3390/math12182860
Submission received: 26 July 2024 / Revised: 9 September 2024 / Accepted: 10 September 2024 / Published: 14 September 2024
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)

Abstract

By utilizing the properties of positive definite matrices, mathematical expectations, and positive linear functionals on matrix spaces, Kantorovich and Wielandt inequalities for positive definite matrices and random variables are obtained. Some novel Kantorovich type inequalities pertaining to matrix ordinary products, Hadamard products, and mathematical expectations of random variables are provided. Furthermore, several interesting unified and generalized forms of the Wielandt inequality for positive definite matrices are also studied. The derived inequalities are then exploited to establish an inequality regarding various correlation coefficients and to study some applications in the relative efficiency of parameter estimation in linear statistical models.

1. Introduction

The Kantorovich inequality and Wielandt inequality have long been recognized as fundamental mathematical tools with profound implications in various fields. Their influence and application have been extensively documented in the literature, making them keystones in mathematical analysis and statistical modeling.
Ref. [1] provided an early and comprehensive analysis of the Kantorovich inequality, exploring its theoretical underpinnings and potential applications. Ref. [2] further extended this analysis, focusing on the inequality’s use in optimization problems and its connection to other mathematical principles. Ref. [3] then bridged the theoretical gap between these inequalities and their practical applications, particularly in linear statistical models. Their work highlighted how these inequalities could be used to enhance statistical analysis and modeling.
Ref. [4] further explored the applications of these inequalities in statistical models, demonstrating their versatility and importance in complex statistical analysis. Ref. [5] provided an in-depth examination of the Wielandt inequality, emphasizing its role in matrix analysis and its connection to other fundamental matrix inequalities. Ref. [6] revisited the Kantorovich inequality, offering a fresh perspective on its use in modern mathematical problems, especially those related to compressed sensing and signal processing. Ref. [7] brought a contemporary perspective, discussing how these inequalities remain relevant in modern statistical and mathematical challenges.
Despite the rich literature surrounding the Kantorovich and Wielandt inequalities, there is still ample room for exploration and generalization. Many existing studies focus on specific applications or theoretical aspects, leaving a gap for a more unified and generalized approach. This article aims to fill that gap, building upon the foundation laid by previous scholars and pushing the boundaries of these inequalities’ applications in probability and statistics.
In conclusion, the Kantorovich and Wielandt inequalities have been extensively studied and applied in various fields. However, there is still a need for a more comprehensive and generalized understanding of these inequalities, which this article aims to provide. By leveraging past research and adopting a novel approach, we hope to offer new insights and applications for these fundamental mathematical tools.

2. Notation and Definition

Notation: We first give some notation [8,9] used in this article. Let $\mathbb{R}$ and $\mathbb{C}$ be the sets of real and complex numbers, respectively. The symbols $M_{m,n}$ and $M_n$ represent the complex linear spaces of $m\times n$ matrices and of square matrices of order $n$, and $H_n$ represents the set of Hermitian matrices of order $n$. Furthermore, $H_n^+$ and $H_n^{++}$ designate the convex cones of positive semidefinite and positive definite matrices of order $n$, respectively. For a matrix $A\in M_{m,n}$, the notations $A'$ and $A^*$ represent the transpose and the conjugate transpose of $A$. Additionally, for a matrix $A\in H_n^{++}$, let $M_A$, $m_A$, and $\kappa(A)=M_A/m_A$ be the maximum eigenvalue, the minimum eigenvalue, and the condition number of $A$. For $A,B\in H_n$, $A>0$ and $A\ge 0$ indicate that $A$ is a positive definite and a positive semidefinite matrix, respectively, and the inequality $A\ge B$ means that $A-B$ is positive semidefinite. The symbol $I_n$ represents the identity matrix of order $n$, which is also written simply as $I$ when no confusion arises. For a vector $x\in\mathbb{C}^n$, the 2-norm of $x$ is defined as $\|x\|_2=(x^*x)^{1/2}$. For a random variable $\xi$ with a finite first-order moment, its mathematical expectation is denoted by $E\xi$. Furthermore, assuming $\xi,\eta$ are $p$-dimensional and $q$-dimensional random vectors, respectively, $\mathrm{cov}(\xi,\eta)=E(\xi-E\xi)(\eta-E\eta)'$ represents the covariance matrix of $\xi$ and $\eta$. In particular, $D(\xi)=\mathrm{cov}(\xi,\xi)$ is defined as the variance of $\xi$. Note that all the statistical applications in this paper refer to real-valued random variables.
Definition 1. 
If a linear functional $\Phi$ on $M_k$ satisfies $\Phi(A)\ge 0$ for any $A\in H_k^+$, then $\Phi$ is called a positive linear functional on $M_k$. Furthermore, $\Phi$ is called a strictly positive linear functional if $\Phi(A)>0$ for any $A\in H_k^{++}$.
Lemma 1. 
Let $\xi_{ij}$, $i,j=1,2,\ldots,n$, be random variables with finite first-order moments on a probability space $(\Omega,\Sigma,P)$ such that the matrix $\Gamma=[\xi_{ij}]_{n\times n}\in H_n^+$. It follows that the expectation matrix $A=E(\Gamma)=[E\xi_{ij}]_{n\times n}\in H_n^+$. Note that $[\xi_{ij}]_{n\times n}$ represents the matrix composed of the elements $\xi_{ij}$, $i=1,2,\ldots,n$, $j=1,2,\ldots,n$.
Proof. 
For any $w=(w_1,w_2,\ldots,w_n)^T\in\mathbb{C}^n$, since $\Gamma=[\xi_{ij}]_{n\times n}\in H_n^+$, then $w^*\Gamma w=\sum_{i=1}^n\sum_{j=1}^n\xi_{ij}w_i^*w_j\ge 0$. It follows that $w^*Aw=\sum_{i=1}^n\sum_{j=1}^nE\xi_{ij}\,w_i^*w_j=E\left(\sum_{i=1}^n\sum_{j=1}^n\xi_{ij}w_i^*w_j\right)\ge 0$. □
Lemma 2. 
Let $A=\begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & 0\\ a_{31} & 0 & a_{33}\end{pmatrix}\in H_3^+$ with $a_{11}a_{33}\ne 0$; then
$$\frac{|a_{12}|^2}{a_{11}a_{22}}\le 1-\frac{|a_{13}|^2}{a_{11}a_{33}}.$$
Proof. 
Lemma 2 can be obtained via $\det(A)\ge 0$. Since $A\in H_3^+$, we have $a_{21}=\overline{a_{12}}$, $a_{31}=\overline{a_{13}}$, and
$$\det(A)=a_{11}a_{22}a_{33}-a_{13}a_{22}a_{31}-a_{12}a_{33}a_{21}=a_{11}a_{22}a_{33}-|a_{13}|^2a_{22}-|a_{12}|^2a_{33}\ge 0.$$
Hence $|a_{12}|^2a_{33}\le a_{11}a_{22}a_{33}-|a_{13}|^2a_{22}$, and since $a_{11}a_{33}\ne 0$, dividing by $a_{11}a_{22}a_{33}$ gives
$$\frac{|a_{12}|^2}{a_{11}a_{22}}\le 1-\frac{|a_{13}|^2}{a_{11}a_{33}},$$
which completes the proof. □
Lemma 3 
([3]). Assume $A=\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}\in H_n^+$ and $0<mI\le A\le MI$; then
$$A_{12}A_{22}^{-1}A_{21}\le\left(\frac{M-m}{M+m}\right)^2A_{11}.$$
Lemma 4. 
Suppose $A=\begin{pmatrix}I_p & X^*\\ X & I_q\end{pmatrix}\in H_{p+q}$; then
$$\big(1-\|X\|_2\big)I_{p+q}\le A\le\big(1+\|X\|_2\big)I_{p+q}.$$
Proof. 
It suffices to prove that $m_A\ge 1-\|X\|_2$ and $M_A\le 1+\|X\|_2$. By the Rayleigh-Ritz theorem,
$$M_A=\max\big\{z^*Az: z\in\mathbb{C}^{p+q},\ \|z\|_2=1\big\},\qquad m_A=\min\big\{z^*Az: z\in\mathbb{C}^{p+q},\ \|z\|_2=1\big\},$$
where $z=\begin{pmatrix}u\\ v\end{pmatrix}$ with $u\in\mathbb{C}^p$ and $v\in\mathbb{C}^q$. Since $\|z\|_2=1$, then $u^*u+v^*v=1$, and it follows that
$$z^*Az=1+2\,\mathrm{Re}\big(u^*X^*v\big).$$
Combined with the Cauchy-Schwarz inequality, the arithmetic-geometric mean inequality, and $\|X\|_2^2=\sup\{\xi^*X^*X\xi:\xi\in\mathbb{C}^p,\ \xi^*\xi=1\}$, we have
$$\big|2\,\mathrm{Re}\big(u^*X^*v\big)\big|\le 2\sqrt{u^*X^*Xu}\sqrt{v^*v}\le 2\|X\|_2\sqrt{u^*u}\sqrt{v^*v}\le\|X\|_2\big(u^*u+v^*v\big)=\|X\|_2.$$
This completes the proof of Lemma 4. □
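The eigenvalue bounds of Lemma 4 are easy to confirm numerically. The following minimal sketch (the dimensions and the random matrix are illustrative choices, not taken from the paper) checks them with NumPy:

```python
# Numerical sanity check of Lemma 4: every eigenvalue of the block matrix
# A = [[I_p, X*], [X, I_q]] lies in [1 - ||X||_2, 1 + ||X||_2].
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 4
X = rng.standard_normal((q, p))          # X in M_{q,p}, so X^T is p x q
A = np.block([[np.eye(p), X.T],
              [X, np.eye(q)]])
s = np.linalg.norm(X, 2)                 # spectral norm ||X||_2
lam = np.linalg.eigvalsh(A)
assert lam.min() >= 1 - s - 1e-10 and lam.max() <= 1 + s + 1e-10
```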
Lemma 5 
([2,8]). If $A\in H_n^{++}$, $X\in M_{n,p}$, and $X^*X=I_p$, then
$$\big(X^*AX\big)^{-1}\le X^*A^{-1}X\le\frac{(M_A+m_A)^2}{4M_Am_A}\big(X^*AX\big)^{-1}.$$
Lemma 6. 
If $\Phi$ is a strictly positive linear functional on $M_n$ and $A\in H_n^{++}$, then
$$\Phi(A)\,\Phi\big(A^{-1}\big)\ge\Phi^2(I_n).$$
Proof. 
For any parameter $t>0$, since $tA+\frac{1}{t}A^{-1}\ge 2I$, then $t\Phi(A)+\frac{1}{t}\Phi(A^{-1})\ge 2\Phi(I)$, and it follows that
$$\Phi(I)\le\frac{1}{2}\min_{t>0}\left\{t\Phi(A)+\frac{1}{t}\Phi\big(A^{-1}\big)\right\}=\sqrt{\Phi(A)\,\Phi\big(A^{-1}\big)}.$$
This completes the proof. □
Lemma 7. 
Suppose $A,B\in M_n$ are invertible matrices, $x,y\in\mathbb{C}^n$, $x,y\ne 0$, and $x^*y=0$; then
$$\frac{|x^*AB^*y|^2}{x^*AA^*x\;y^*BB^*y}\le 1-\frac{|x^*AB^{-1}x|^2}{x^*AA^*x\;x^*(BB^*)^{-1}x}.$$
Proof. 
Since $x^*y=0$, we have $y^*BB^{-1}x=y^*x=0$ and $x^*(B^{-1})^*B^*y=x^*y=0$, so
$$\begin{pmatrix}x^*A\\ y^*B\\ x^*(B^{-1})^*\end{pmatrix}\begin{pmatrix}A^*x & B^*y & B^{-1}x\end{pmatrix}=\begin{pmatrix}x^*AA^*x & x^*AB^*y & x^*AB^{-1}x\\ y^*BA^*x & y^*BB^*y & 0\\ x^*(AB^{-1})^*x & 0 & x^*(BB^*)^{-1}x\end{pmatrix}\ge 0,$$
and the result can be obtained from Lemma 2. □

3. Some Generalizations on the Kantorovich Inequality

Firstly, we utilize the properties of mathematical expectations to derive the following Kantorovich type inequality.
Theorem 1. 
Let $\xi,\eta,\zeta$ be random variables on a probability space $(\Omega,\Sigma,P)$ such that there exist constants $m_\xi,M_\xi,m_\eta,M_\eta$ with $0<m_\xi\le\xi\le M_\xi$ and $0<m_\eta\le\eta\le M_\eta$. We then have
$$\left(\frac{\sqrt{\kappa(\xi)}+\sqrt{\kappa(\eta)}}{\sqrt{\kappa(\xi)\kappa(\eta)}+1}\right)^2\le\frac{E(\xi\eta\zeta)\,E(\zeta)}{E(\xi\zeta)\,E(\eta\zeta)}\le\left(\frac{\sqrt{\kappa(\xi)\kappa(\eta)}+1}{\sqrt{\kappa(\xi)}+\sqrt{\kappa(\eta)}}\right)^2 \qquad (1)$$
and
$$0\le E\left(\frac{1}{\xi}\right)-\frac{1}{E(\xi)}\le\frac{\big(\sqrt{M_\xi}-\sqrt{m_\xi}\big)^2}{M_\xi m_\xi}, \qquad (2)$$
where $\zeta\ge 0$, $0<E\zeta<\infty$, and $\kappa(\xi)=M_\xi/m_\xi$, $\kappa(\eta)=M_\eta/m_\eta$.
Proof. 
(1). By homogeneity, we can assume $E(\zeta)=1$. Inequality (1) holds with equality when $M_\xi=m_\xi$, so we only need to prove the case $M_\xi>m_\xi$.
Since
$$E[(M_\xi-\xi)\zeta]+E[(\xi-m_\xi)\zeta]=(M_\xi-m_\xi)E(\zeta)>0,$$
then at least one of the terms $E[(M_\xi-\xi)\zeta]$ and $E[(\xi-m_\xi)\zeta]$ must be strictly positive. Without loss of generality, we give the proof under $E[(\xi-m_\xi)\zeta]>0$; the other situation can be proved similarly.
Let $t=\frac{E[(M_\xi-\xi)\zeta]}{E[(\xi-m_\xi)\zeta]}$; obviously $t\ge 0$. Then
$$E(\xi\zeta)=\frac{M_\xi+m_\xi t}{t+1}. \qquad (3)$$
Note that $m_\eta E[(M_\xi-\xi)(M_\eta-\eta)\zeta]+M_\eta t\,E[(\xi-m_\xi)(\eta-m_\eta)\zeta]\ge 0$. Hence
$$\begin{aligned}&m_\eta\big[E(\xi\eta\zeta)+M_\eta E[(M_\xi-\xi)\zeta]-M_\xi E(\eta\zeta)\big]+M_\eta t\big[E(\xi\eta\zeta)-m_\eta E[(\xi-m_\xi)\zeta]-m_\xi E(\eta\zeta)\big]\\&\quad=m_\eta\big[E(\xi\eta\zeta)+M_\eta t\,E[(\xi-m_\xi)\zeta]-M_\xi E(\eta\zeta)\big]+M_\eta t\big[E(\xi\eta\zeta)-m_\eta E[(\xi-m_\xi)\zeta]-m_\xi E(\eta\zeta)\big]\\&\quad=(m_\eta+M_\eta t)\,E(\xi\eta\zeta)-(m_\eta M_\xi+M_\eta m_\xi t)\,E(\eta\zeta)\ge 0. \end{aligned} \qquad (4)$$
Equivalently, we have
$$\frac{E(\xi\eta\zeta)}{E(\eta\zeta)}\ge\frac{M_\eta m_\xi t+M_\xi m_\eta}{M_\eta t+m_\eta}. \qquad (5)$$
From Equations (3) and (5), it can be obtained that
$$\frac{E(\xi\eta\zeta)}{E(\xi\zeta)E(\eta\zeta)}\ge\frac{(t+1)\big(M_\eta m_\xi t+M_\xi m_\eta\big)}{(m_\xi t+M_\xi)(M_\eta t+m_\eta)}=\frac{\kappa(\eta)t^2+(\kappa(\xi)+\kappa(\eta))t+\kappa(\xi)}{\kappa(\eta)t^2+(\kappa(\xi)\kappa(\eta)+1)t+\kappa(\xi)}=\frac{\big(\sqrt{\kappa(\xi)}+\sqrt{\kappa(\eta)}\big)^2t+\big(\sqrt{\kappa(\eta)}\,t-\sqrt{\kappa(\xi)}\big)^2}{\big(\sqrt{\kappa(\xi)\kappa(\eta)}+1\big)^2t+\big(\sqrt{\kappa(\eta)}\,t-\sqrt{\kappa(\xi)}\big)^2}. \qquad (6)$$
Similarly, with (3) and
$$M_\eta E[(M_\xi-\xi)(\eta-m_\eta)\zeta]+m_\eta t\,E[(\xi-m_\xi)(M_\eta-\eta)\zeta]\ge 0,$$
the following result holds:
$$\frac{E(\xi\eta\zeta)}{E(\xi\zeta)E(\eta\zeta)}\le\frac{t^2+(\kappa(\xi)\kappa(\eta)+1)t+\kappa(\xi)\kappa(\eta)}{t^2+(\kappa(\xi)+\kappa(\eta))t+\kappa(\xi)\kappa(\eta)}=\frac{\big(\sqrt{\kappa(\xi)\kappa(\eta)}+1\big)^2t+\big(t-\sqrt{\kappa(\xi)\kappa(\eta)}\big)^2}{\big(\sqrt{\kappa(\xi)}+\sqrt{\kappa(\eta)}\big)^2t+\big(t-\sqrt{\kappa(\xi)\kappa(\eta)}\big)^2}. \qquad (7)$$
Note that $\kappa(\xi),\kappa(\eta)\ge 1$ and $(\kappa(\xi)-1)(\kappa(\eta)-1)=\kappa(\xi)\kappa(\eta)+1-\kappa(\xi)-\kappa(\eta)\ge 0$; thus
$$\kappa(\xi)\kappa(\eta)+1\ge\kappa(\xi)+\kappa(\eta). \qquad (8)$$
For $t=0$, it follows from inequalities (6) and (7) that $\frac{E(\xi\eta\zeta)}{E(\xi\zeta)E(\eta\zeta)}=1$, and by (8) we have $\Big(\frac{\sqrt{\kappa(\xi)}+\sqrt{\kappa(\eta)}}{\sqrt{\kappa(\xi)\kappa(\eta)}+1}\Big)^2\le 1\le\Big(\frac{\sqrt{\kappa(\xi)\kappa(\eta)}+1}{\sqrt{\kappa(\xi)}+\sqrt{\kappa(\eta)}}\Big)^2$; thus the result holds. For $t>0$, note that the function $f(x)=\frac{a+x}{b+x}$ is decreasing on $[0,+\infty)$ when $a\ge b$ and increasing on $[0,+\infty)$ when $a\le b$. Hence, by (6)–(8), the result holds.
(2). The left-hand side of inequality (2) holds via the Cauchy-Schwarz inequality, as $E\left(\frac{1}{\xi}\right)E(\xi)\ge 1$. Now we prove the right-hand side of inequality (2).
Let $x=M_\xi-E(\xi)$ and $y=E(\xi)-m_\xi$; thus $x\ge 0$, $y\ge 0$, and
$$x+y=M_\xi-m_\xi,\qquad E(\xi)=\frac{m_\xi x+M_\xi y}{x+y}. \qquad (9)$$
Since $M_\xi-\xi\ge 0$ and $\frac{1}{m_\xi}-\frac{1}{\xi}\ge 0$, then
$$E\left[(M_\xi-\xi)\left(\frac{1}{m_\xi}-\frac{1}{\xi}\right)\right]=\frac{x}{m_\xi}-M_\xi E\left(\frac{1}{\xi}\right)+1\ge 0. \qquad (10)$$
Similarly,
$$E\left[(\xi-m_\xi)\left(\frac{1}{\xi}-\frac{1}{M_\xi}\right)\right]=-\frac{y}{M_\xi}-m_\xi E\left(\frac{1}{\xi}\right)+1\ge 0. \qquad (11)$$
If $x=0$, then $E(\xi)=M_\xi$, and hence the result follows immediately from (10). The same conclusion can be derived if $y=0$. Now we prove that the result holds when $x>0$, $y>0$.
By calculating $\frac{y}{M_\xi}\times(10)+\frac{x}{m_\xi}\times(11)$, we have
$$E\left(\frac{1}{\xi}\right)\le\frac{M_\xi x+m_\xi y}{M_\xi m_\xi(x+y)}. \qquad (12)$$
From (9) and (12), we can get that
$$E\left(\frac{1}{\xi}\right)-\frac{1}{E(\xi)}\le\frac{M_\xi x+m_\xi y}{M_\xi m_\xi(x+y)}-\frac{x+y}{m_\xi x+M_\xi y}=\frac{(M_\xi-m_\xi)^2xy}{M_\xi m_\xi\big(m_\xi x^2+M_\xi y^2+(M_\xi+m_\xi)xy\big)}\le\frac{(M_\xi-m_\xi)^2xy}{M_\xi m_\xi\big(\sqrt{M_\xi}+\sqrt{m_\xi}\big)^2xy}=\frac{\big(\sqrt{M_\xi}-\sqrt{m_\xi}\big)^2}{M_\xi m_\xi}.$$
The last inequality holds since the arithmetic-geometric mean inequality gives $m_\xi x^2+M_\xi y^2+(M_\xi+m_\xi)xy\ge\big(2\sqrt{M_\xi m_\xi}+M_\xi+m_\xi\big)xy=\big(\sqrt{M_\xi}+\sqrt{m_\xi}\big)^2xy$. This completes the proof. □
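Both parts of Theorem 1 can be checked numerically on a finite probability space, where every expectation is a weighted sum. In the sketch below the atoms, weights, and the three random variables are synthetic, and the bounds $m_\xi$, $M_\xi$ are taken to be the sample minima and maxima:

```python
# Sanity check of Theorem 1 on a finite probability space with n atoms.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
prob = rng.dirichlet(np.ones(n))           # P({i}) = prob[i]
xi   = rng.uniform(1.0, 5.0, n)            # 0 < m_xi <= xi <= M_xi
eta  = rng.uniform(2.0, 3.0, n)
zeta = rng.uniform(0.0, 1.0, n)            # zeta >= 0, 0 < E(zeta)

E = lambda f: float(np.sum(prob * f))
k1 = xi.max() / xi.min()                   # kappa(xi)
k2 = eta.max() / eta.min()                 # kappa(eta)

# inequality (1)
mid = E(xi * eta * zeta) * E(zeta) / (E(xi * zeta) * E(eta * zeta))
lo = ((np.sqrt(k1) + np.sqrt(k2)) / (np.sqrt(k1 * k2) + 1)) ** 2
hi = ((np.sqrt(k1 * k2) + 1) / (np.sqrt(k1) + np.sqrt(k2))) ** 2
assert lo <= mid <= hi

# inequality (2)
M, m = xi.max(), xi.min()
gap = E(1.0 / xi) - 1.0 / E(xi)
assert 0.0 <= gap <= (np.sqrt(M) - np.sqrt(m)) ** 2 / (M * m)
```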
Below, we use Theorem 1 and the spectral decomposition of matrices to provide some Kantorovich type inequalities for positive definite matrices.
The basic matrix version of the Kantorovich inequality [2,6,8,10,11,12] can be stated as follows. Let $A\in H_n^{++}$, $x\in\mathbb{C}^n$, and $x\ne 0$; then
$$\frac{x^*Ax\;x^*A^{-1}x}{(x^*x)^2}\le\frac{(M_A+m_A)^2}{4M_Am_A}. \qquad (13)$$
Corollary 1. 
Let $A,B\in H_n^{++}$; then
$$\left(\frac{\sqrt{\kappa(A)}+\sqrt{\kappa(B)}}{\sqrt{\kappa(A)\kappa(B)}+1}\right)^2\le\frac{n\,\mathrm{tr}(AB)}{\mathrm{tr}(A)\,\mathrm{tr}(B)}\le\left(\frac{\sqrt{\kappa(A)\kappa(B)}+1}{\sqrt{\kappa(A)}+\sqrt{\kappa(B)}}\right)^2 \qquad (14)$$
and
$$\left(\frac{\sqrt{\kappa(A)}+\sqrt{\kappa(B)}}{\sqrt{\kappa(A)\kappa(B)}+1}\right)^2\le\frac{n\,\mathrm{tr}(A\circ B)}{\mathrm{tr}(A)\,\mathrm{tr}(B)}\le\left(\frac{\sqrt{\kappa(A)\kappa(B)}+1}{\sqrt{\kappa(A)}+\sqrt{\kappa(B)}}\right)^2, \qquad (15)$$
where $A\circ B$ denotes the Hadamard (entrywise) product of $A$ and $B$.
Proof. 
The proofs of the above two inequalities are quite similar; thus, we only give the details of (14). Let the spectral decompositions of $A,B$ be
$$A=\sum_{i=1}^k\lambda_iP_i,\qquad B=\sum_{j=1}^l\mu_jQ_j\qquad(1\le k,l\le n),$$
where $P_i,Q_j$ are orthogonal projection matrices, $\lambda_i,\mu_j$ are the eigenvalues of $A$ and $B$, respectively, and $\sum_{i=1}^kP_i=\sum_{j=1}^lQ_j=I_n$. Thus,
$$AB=\sum_{i=1}^k\sum_{j=1}^l\lambda_i\mu_jP_iQ_j,$$
and
$$\mathrm{tr}(AB)=\sum_{i=1}^k\sum_{j=1}^l\lambda_i\mu_j\,\mathrm{tr}(P_iQ_j).$$
Note that
$$\mathrm{tr}(P_iQ_j)\ge 0,\qquad \sum_{i=1}^k\sum_{j=1}^l\mathrm{tr}(P_iQ_j)=\mathrm{tr}\left(\sum_{i=1}^k\sum_{j=1}^lP_iQ_j\right)=n.$$
Let $\Omega=\{(i,j): i=1,\ldots,k,\ j=1,\ldots,l\}$ and let $\Sigma$ be the $\sigma$-field composed of all subsets of $\Omega$. For any $\mathcal{A}\in\Sigma$, define $P(\mathcal{A})=\frac{1}{n}\sum_{(i,j)\in\mathcal{A}}\mathrm{tr}(P_iQ_j)$, $\xi(i,j)=\lambda_i$, $\eta(i,j)=\mu_j$, $\zeta=1$. Hence it is easily verified that
$$E(\xi\eta\zeta)=\frac{1}{n}\sum_{i=1}^k\sum_{j=1}^l\lambda_i\mu_j\,\mathrm{tr}(P_iQ_j)=\frac{1}{n}\mathrm{tr}(AB),\qquad E(\xi\zeta)=\frac{1}{n}\sum_{i=1}^k\lambda_i\,\mathrm{tr}(P_i)=\frac{1}{n}\mathrm{tr}(A),\qquad E(\eta\zeta)=\frac{1}{n}\sum_{j=1}^l\mu_j\,\mathrm{tr}(Q_j)=\frac{1}{n}\mathrm{tr}(B),$$
and $\kappa(\xi)=\kappa(A)$, $\kappa(\eta)=\kappa(B)$. This completes the proof by Theorem 1. □
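As a numerical illustration, the following sketch checks (14), together with the Hadamard-product analogue (15), on random positive definite matrices (the construction of $A$ and $B$ is an arbitrary choice for this example):

```python
# Numerical check of Corollary 1 for random positive definite A and B.
import numpy as np

rng = np.random.default_rng(2)
n = 6

def rand_pd(lo, hi):
    Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
    return Q @ np.diag(rng.uniform(lo, hi, n)) @ Q.T

def kappa(S):
    e = np.linalg.eigvalsh(S)
    return e.max() / e.min()

A, B = rand_pd(0.5, 4.0), rand_pd(1.0, 3.0)
ka, kb = kappa(A), kappa(B)
lo = ((np.sqrt(ka) + np.sqrt(kb)) / (np.sqrt(ka * kb) + 1)) ** 2
hi = ((np.sqrt(ka * kb) + 1) / (np.sqrt(ka) + np.sqrt(kb))) ** 2

r_ord = n * np.trace(A @ B) / (np.trace(A) * np.trace(B))   # ordinary product
r_had = n * np.trace(A * B) / (np.trace(A) * np.trace(B))   # Hadamard product
assert lo <= r_ord <= hi and lo <= r_had <= hi
```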
Corollary 2. 
Let $A,B\in H_n^{++}$ and $x\in\mathbb{C}^n$, $x\ne 0$, satisfying $AB=BA$; then
$$\left(\frac{\sqrt{\kappa(A)}+\sqrt{\kappa(B)}}{\sqrt{\kappa(A)\kappa(B)}+1}\right)^2\le\frac{x^*x\;x^*ABx}{x^*Ax\;x^*Bx}\le\left(\frac{\sqrt{\kappa(A)\kappa(B)}+1}{\sqrt{\kappa(A)}+\sqrt{\kappa(B)}}\right)^2. \qquad (16)$$
Proof. 
By homogeneity in $x$, we may assume $x^*x=1$. Since $AB=BA$, there exists a unitary matrix $U$ such that
$$U^*AU=\mathrm{diag}\{\lambda_1,\lambda_2,\ldots,\lambda_n\},\qquad U^*BU=\mathrm{diag}\{\mu_1,\mu_2,\ldots,\mu_n\},\qquad U^*ABU=\mathrm{diag}\{\lambda_1\mu_1,\lambda_2\mu_2,\ldots,\lambda_n\mu_n\}.$$
Let $x=Uy$; then $x^*x=y^*y=1$, and
$$x^*Ax=\sum_{i=1}^n\lambda_i|y_i|^2,\qquad x^*Bx=\sum_{i=1}^n\mu_i|y_i|^2,\qquad x^*ABx=\sum_{i=1}^n\lambda_i\mu_i|y_i|^2.$$
Set $\Omega=\{1,2,\ldots,n\}$ and let $\Sigma$ be the $\sigma$-field composed of all subsets of $\Omega$. For any $\mathcal{A}\in\Sigma$, let $P(\mathcal{A})=\sum_{i\in\mathcal{A}}|y_i|^2$, $\xi(i)=\lambda_i$, $\eta(i)=\mu_i$, $\zeta=1$. We can easily get that
$$E(\xi\eta\zeta)=x^*ABx,\qquad E(\xi\zeta)=x^*Ax,\qquad E(\eta\zeta)=x^*Bx,$$
and $\kappa(\xi)=\kappa(A)$, $\kappa(\eta)=\kappa(B)$. This completes the proof via Theorem 1. □
Remark 1. 
Note that the basic matrix version of the Kantorovich inequality (13) can be seen as a special case of Corollary 2. Let $B=A^{-1}$ in (16), and note that $\kappa(A^{-1})=\kappa(A)$. From the left-hand side of (16), it follows that
$$\frac{x^*Ax\;x^*A^{-1}x}{(x^*x)^2}\le\frac{(\kappa(A)+1)^2}{4\kappa(A)}=\frac{(M_A+m_A)^2}{4M_Am_A}, \qquad (17)$$
which leads to the basic matrix version of the Kantorovich inequality (13).
Now we use inequality (2) and matrix spectral decomposition to derive a Kantorovich type inequality for positive linear functionals on matrix space.
Corollary 3. 
Suppose $\Phi$ is a strictly positive linear functional on $M_n$ and $A\in H_n^{++}$; then
$$0\le\frac{\Phi(A^{-1})}{\Phi(I)}-\frac{\Phi(I)}{\Phi(A)}\le\frac{\big(\sqrt{M_A}-\sqrt{m_A}\big)^2}{M_Am_A}. \qquad (18)$$
Proof. 
Let the spectral decomposition of $A$ be $A=\sum_{i=1}^k\lambda_iP_i$ $(1\le k\le n)$, where the $P_i$ are orthogonal projection matrices and the $\lambda_i$ are the eigenvalues of $A$. Thus, $A^{-1}=\sum_{i=1}^k\frac{1}{\lambda_i}P_i$.
Note that
$$\Phi(P_i)\ge 0,\qquad \sum_{i=1}^k\Phi(P_i)=\Phi\left(\sum_{i=1}^kP_i\right)=\Phi(I).$$
Let $\Omega=\{1,2,\ldots,k\}$ and let $\Sigma$ be the $\sigma$-field composed of all subsets of $\Omega$. For any $\mathcal{A}\in\Sigma$, let $P(\mathcal{A})=\frac{1}{\Phi(I)}\sum_{i\in\mathcal{A}}\Phi(P_i)$ and $\xi(i)=\lambda_i$. Hence it is easily verified that
$$E(\xi)=\frac{1}{\Phi(I)}\sum_{i=1}^k\lambda_i\Phi(P_i)=\frac{\Phi(A)}{\Phi(I)},\qquad E\left(\frac{1}{\xi}\right)=\frac{1}{\Phi(I)}\sum_{i=1}^k\frac{1}{\lambda_i}\Phi(P_i)=\frac{\Phi(A^{-1})}{\Phi(I)},$$
and
$$M_\xi=M_A,\qquad m_\xi=m_A.$$
This completes the proof by inequality (2) of Theorem 1. □
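Corollary 3 can be illustrated with the concrete strictly positive linear functional $\Phi(Y)=\mathrm{tr}(WY)$, where $W$ is a fixed positive definite weight matrix; $W$ and $A$ below are synthetic choices made only for this sketch:

```python
# Corollary 3 with Phi(Y) = tr(W Y), W > 0 fixed: a strictly positive
# linear functional on M_n.
import numpy as np

rng = np.random.default_rng(3)
n = 5

def rand_pd(lo, hi):
    Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
    return Q @ np.diag(rng.uniform(lo, hi, n)) @ Q.T

W = rand_pd(0.5, 2.0)
A = rand_pd(1.0, 9.0)
Phi = lambda Y: float(np.trace(W @ Y))

ev = np.linalg.eigvalsh(A)
M, m = ev.max(), ev.min()
gap = Phi(np.linalg.inv(A)) / Phi(np.eye(n)) - Phi(np.eye(n)) / Phi(A)
assert 0.0 <= gap <= (np.sqrt(M) - np.sqrt(m)) ** 2 / (M * m)
```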
Corollary 4. 
Let $\Phi$ be a strictly positive linear functional on $M_p$, and let $A\in H_n^{++}$, $X\in M_{n,p}$, $X^*X=I_p$. Then
$$\frac{\Phi\big((X^*A^{-1}X)^{-1}\big)\,\Phi\big((X^*AX)^{-1}\big)}{\Phi^2(I)}\ge\frac{\Phi\big((X^*A^{-1}X)^{-1}\big)}{\Phi(X^*AX)}\ge\frac{4M_Am_A}{(M_A+m_A)^2}, \qquad (19)$$
and
$$\frac{\Phi(X^*AX)-\Phi\big((X^*A^{-1}X)^{-1}\big)}{\Phi(I)}\le\big(\sqrt{M_A}-\sqrt{m_A}\big)^2. \qquad (20)$$
Proof. 
We first prove (19). By Lemma 6, $\Phi(X^*AX)\,\Phi\big((X^*AX)^{-1}\big)\ge\Phi^2(I)$, or equivalently $\Phi\big((X^*AX)^{-1}\big)\ge\frac{\Phi^2(I)}{\Phi(X^*AX)}$. Consequently, the left-hand inequality of (19) is obtained.
Replacing $A$ with $A^{-1}$ in the right-hand side of Lemma 5, we have
$$\big(X^*A^{-1}X\big)^{-1}\ge\frac{4M_{A^{-1}}m_{A^{-1}}}{\big(M_{A^{-1}}+m_{A^{-1}}\big)^2}\,X^*AX.$$
Obviously, $\frac{4M_{A^{-1}}m_{A^{-1}}}{(M_{A^{-1}}+m_{A^{-1}})^2}=\frac{4M_Am_A}{(M_A+m_A)^2}$, since $M_{A^{-1}}=1/m_A$ and $m_{A^{-1}}=1/M_A$. Substituting this into the above inequality, we have
$$\big(X^*A^{-1}X\big)^{-1}\ge\frac{4M_Am_A}{(M_A+m_A)^2}\,X^*AX.$$
That is, $\Phi\big((X^*A^{-1}X)^{-1}\big)\ge\frac{4M_Am_A}{(M_A+m_A)^2}\,\Phi(X^*AX)$, which leads to the right-hand inequality of (19).
Next we prove (20). Let $\Psi(Y)=\Phi(X^*YX)$, $Y\in M_n$; then $\Psi$ is a strictly positive linear functional on $M_n$ with $\Psi(I_n)=\Phi(X^*X)=\Phi(I_p)$. Hence, by Corollary 3, we have
$$\frac{\Psi(A^{-1})}{\Psi(I)}-\frac{\Psi(I)}{\Psi(A)}\le\frac{\big(\sqrt{M_A}-\sqrt{m_A}\big)^2}{M_Am_A}, \qquad (21)$$
or equivalently,
$$\frac{\Phi(X^*A^{-1}X)}{\Phi(I)}-\frac{\Phi(I)}{\Phi(X^*AX)}\le\frac{\big(\sqrt{M_A}-\sqrt{m_A}\big)^2}{M_Am_A}. \qquad (22)$$
If $A$ is replaced with $A^{-1}$ in (22), then, since $M_{A^{-1}}=1/m_A$ and $m_{A^{-1}}=1/M_A$, $\frac{\big(\sqrt{M_{A^{-1}}}-\sqrt{m_{A^{-1}}}\big)^2}{M_{A^{-1}}m_{A^{-1}}}=\big(\sqrt{M_A}-\sqrt{m_A}\big)^2$. Hence
$$\frac{\Phi(X^*AX)}{\Phi(I)}-\frac{\Phi(I)}{\Phi(X^*A^{-1}X)}\le\big(\sqrt{M_A}-\sqrt{m_A}\big)^2. \qquad (23)$$
By Lemma 6,
$$\Phi(X^*A^{-1}X)\,\Phi\big((X^*A^{-1}X)^{-1}\big)\ge\Phi^2(I). \qquad (24)$$
Therefore,
$$\frac{\Phi(I)}{\Phi(X^*A^{-1}X)}=\frac{\Phi(I)\,\Phi\big((X^*A^{-1}X)^{-1}\big)}{\Phi(X^*A^{-1}X)\,\Phi\big((X^*A^{-1}X)^{-1}\big)}\ge\frac{\Phi\big((X^*A^{-1}X)^{-1}\big)}{\Phi(I)}.$$
Combining (23) and (24), we have
$$\frac{\Phi(X^*AX)}{\Phi(I)}-\frac{\Phi\big((X^*A^{-1}X)^{-1}\big)}{\Phi(I)}\le\frac{\Phi(X^*AX)}{\Phi(I)}-\frac{\Phi(I)}{\Phi(X^*A^{-1}X)}\le\big(\sqrt{M_A}-\sqrt{m_A}\big)^2.$$
This completes the proof. □
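Both inequalities of Corollary 4 can be verified numerically in the same spirit; the sketch below takes $\Phi=\mathrm{tr}$ on $M_p$ (so $\Phi(I)=p$), with synthetic $A$ and $X$:

```python
# Numerical check of (19) and (20) with Phi = trace on M_p.
import numpy as np

rng = np.random.default_rng(4)
n, p = 7, 3
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
ev = rng.uniform(1.0, 6.0, n)
A = Q @ np.diag(ev) @ Q.T
M, m = ev.max(), ev.min()
X = np.linalg.qr(rng.standard_normal((n, p)))[0]     # X*X = I_p

S = X.T @ A @ X                                      # X*AX
T = np.linalg.inv(X.T @ np.linalg.inv(A) @ X)        # (X*A^{-1}X)^{-1}
Phi, PhiI = np.trace, float(p)                       # Phi(I_p) = p

# (19): Phi(T)Phi(S^{-1})/Phi(I)^2 >= Phi(T)/Phi(S) >= 4Mm/(M+m)^2
assert Phi(T) * Phi(np.linalg.inv(S)) / PhiI**2 >= Phi(T) / Phi(S)
assert Phi(T) / Phi(S) >= 4 * M * m / (M + m) ** 2
# (20): (Phi(S) - Phi(T))/Phi(I) <= (sqrt(M) - sqrt(m))^2
assert (Phi(S) - Phi(T)) / PhiI <= (np.sqrt(M) - np.sqrt(m)) ** 2
```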

4. Some Generalizations on the Wielandt Inequality

Firstly, we use Theorem 1 to provide a Wielandt type inequality regarding mathematical expectations. The following corollary is obtained by replacing $\eta$ and $\zeta$ in Theorem 1 with $1/\xi$ and $\eta^2$, respectively, and noting that $\kappa(1/\xi)=\kappa(\xi)$.
Corollary 5. 
Let $\xi,\eta$ be random variables on a probability space $(\Omega,\Sigma,P)$ with $0<E(\eta^2)<+\infty$, and suppose there exist constants $0<m\le M$ such that $m\le\xi\le M$; then
$$\frac{E(\xi\eta^2)\,E(\eta^2/\xi)}{E^2(\eta^2)}\le\frac{(M+m)^2}{4Mm}.$$
The basic matrix version of the Wielandt inequality [13,14] is stated as follows. Let $A\in H_n^{++}$ and $x,y\in\mathbb{C}^n$ with $x\ne 0$, $y\ne 0$, and $x^*y=0$. Then
$$|x^*Ay|^2\le\left(\frac{M_A-m_A}{M_A+m_A}\right)^2x^*Ax\;y^*Ay. \qquad (25)$$
Ref. [3] provided the following matrix form of the Wielandt inequality. Let $A\in H_n^{++}$, and let $X\in M_{n,p}$, $Y\in M_{n,q}$ be full column rank matrices such that $X^*Y=0$; then
$$X^*AY(Y^*AY)^{-1}Y^*AX\le\left(\frac{M_A-m_A}{M_A+m_A}\right)^2X^*AX.$$
By a variational method, [15] proved the following useful result. For $x,y\in\mathbb{C}^n$, define $\cos\theta=\frac{|x^*y|}{\sqrt{x^*x}\sqrt{y^*y}}$; then
$$|x^*Ay|^2\le\left(\frac{M_A(1+\cos\theta)-m_A(1-\cos\theta)}{M_A(1+\cos\theta)+m_A(1-\cos\theta)}\right)^2x^*Ax\;y^*Ay. \qquad (26)$$
Now, we give further generalizations of the Wielandt inequality.
Theorem 2. 
Suppose $A\in H_n^{++}$, $X\in M_{n,p}$, $Y\in M_{n,q}$ are such that $X^*X=I_p$, $Y^*Y=I_q$, and $\|X^*Y\|_2<1$; then
$$X^*AY(Y^*AY)^{-1}Y^*AX\le\left(\frac{M_A(1+\|X^*Y\|_2)-m_A(1-\|X^*Y\|_2)}{M_A(1+\|X^*Y\|_2)+m_A(1-\|X^*Y\|_2)}\right)^2X^*AX.$$
Proof. 
Let $B=\begin{pmatrix}X^*\\ Y^*\end{pmatrix}A\begin{pmatrix}X & Y\end{pmatrix}=\begin{pmatrix}X^*AX & X^*AY\\ Y^*AX & Y^*AY\end{pmatrix}\in H_{p+q}^{++}$. Since $m_AI_n\le A\le M_AI_n$, it follows that
$$m_A\begin{pmatrix}X^*\\ Y^*\end{pmatrix}\begin{pmatrix}X & Y\end{pmatrix}\le B\le M_A\begin{pmatrix}X^*\\ Y^*\end{pmatrix}\begin{pmatrix}X & Y\end{pmatrix}, \qquad (27)$$
and
$$\begin{pmatrix}X^*\\ Y^*\end{pmatrix}\begin{pmatrix}X & Y\end{pmatrix}=\begin{pmatrix}I_p & X^*Y\\ Y^*X & I_q\end{pmatrix}\in H_{p+q}. \qquad (28)$$
In view of Lemma 4, it can be shown that
$$\big(1-\|X^*Y\|_2\big)I_{p+q}\le\begin{pmatrix}I_p & X^*Y\\ Y^*X & I_q\end{pmatrix}\le\big(1+\|X^*Y\|_2\big)I_{p+q}. \qquad (29)$$
Hence, with (27)–(29), we have
$$m_A\big(1-\|X^*Y\|_2\big)I_{p+q}\le B\le M_A\big(1+\|X^*Y\|_2\big)I_{p+q}. \qquad (30)$$
This completes the proof with Lemma 3. □
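Since Theorem 2 asserts a Loewner ordering, a numerical check can confirm that the difference of the two sides has a nonnegative smallest eigenvalue. In the sketch below all data are synthetic; for generic random subspaces with $p+q\le n$, the condition $\|X^*Y\|_2<1$ holds almost surely:

```python
# Numerical check of Theorem 2 via the smallest eigenvalue of the gap.
import numpy as np

rng = np.random.default_rng(5)
n, p, q = 9, 2, 3
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
ev = rng.uniform(1.0, 8.0, n)
A = Q @ np.diag(ev) @ Q.T
M, m = ev.max(), ev.min()

X = np.linalg.qr(rng.standard_normal((n, p)))[0]     # X*X = I_p
Y = np.linalg.qr(rng.standard_normal((n, q)))[0]     # Y*Y = I_q
r = np.linalg.norm(X.T @ Y, 2)
assert r < 1

c = ((M * (1 + r) - m * (1 - r)) / (M * (1 + r) + m * (1 - r))) ** 2
lhs = X.T @ A @ Y @ np.linalg.inv(Y.T @ A @ Y) @ Y.T @ A @ X
gap = c * (X.T @ A @ X) - lhs
assert np.linalg.eigvalsh(gap).min() >= -1e-10
```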
Remark 2. 
For full column rank matrices $X\in M_{n,p}$, $Y\in M_{n,q}$, let $X_1=X(X^*X)^{-1/2}$, $Y_1=Y(Y^*Y)^{-1/2}$. Thus $X_1^*X_1=I_p$, $Y_1^*Y_1=I_q$, and $X_1^*Y_1=(X^*X)^{-1/2}X^*Y(Y^*Y)^{-1/2}$. From Theorem 2, the following corollary can be obtained.
Corollary 6. 
From Theorem 2, it can be shown that
$$X^*AY(Y^*AY)^{-1}Y^*AX\le\left(\frac{M_A(1+R)-m_A(1-R)}{M_A(1+R)+m_A(1-R)}\right)^2X^*AX, \qquad (31)$$
where $A\in H_n^{++}$ and $R:=\big\|(X^*X)^{-1/2}X^*Y(Y^*Y)^{-1/2}\big\|_2<1$.
Remark 3. 
Note that the result (26) of [15] can be seen as a special case of Corollary 6, obtained by setting $p=q=1$ in (31).
Remark 4. 
If $A\in H_n^{++}$ and $X\in M_{n,p}$, $Y\in M_{n,q}$ are both full column rank matrices such that $X^*Y=0$, then it follows from (31) that
$$X^*AY(Y^*AY)^{-1}Y^*AX\le\left(\frac{M_A-m_A}{M_A+m_A}\right)^2X^*AX. \qquad (32)$$
Corollary 7. 
Let $A,B$ be two invertible matrices of order $n$, and let $x,y\in\mathbb{C}^n$, $x,y\ne 0$, $x^*y=0$, be such that
$$\min_{\|u\|_2=1}\big|u^*AB^{-1}u\big|^2=\rho,\qquad \min_{\|u\|_2=1}\big|u^*BA^{-1}u\big|^2=\sigma.$$
Then
$$\frac{|x^*AB^*y|^2}{x^*AA^*x\;y^*BB^*y}\le 1-\rho\,\frac{m_{BB^*}}{M_{AA^*}}, \qquad (33)$$
$$\frac{|x^*BA^*y|^2}{x^*BB^*x\;y^*AA^*y}\le 1-\sigma\,\frac{m_{AA^*}}{M_{BB^*}}. \qquad (34)$$
Proof. 
Without loss of generality, we assume $\|x\|_2=\|y\|_2=1$. Since
$$\frac{|x^*AB^{-1}x|^2}{x^*AA^*x\;x^*(BB^*)^{-1}x}\ge\frac{\rho}{M_{AA^*}\cdot m_{BB^*}^{-1}}=\rho\,\frac{m_{BB^*}}{M_{AA^*}},$$
(33) can be obtained by Lemma 7. After exchanging $A$ and $B$, (34) holds. □

5. Applications in Statistics

5.1. An Application on Correlation Coefficient

For any real matrix $X$ of order $n\times p$, $X^*=X'$, where $X'$ denotes the transpose of the matrix $X$. Let $\rho_{\xi\eta}$ be the correlation coefficient between two random variables $\xi$ and $\eta$.
Suppose the covariance matrix of $X=(X_1,X_2,\ldots,X_n)'$ is denoted by $\mathrm{cov}(X)=\Sigma$, and let $a_1,a_2,\ldots,a_n$ be the corresponding orthonormalized eigenvectors of $\Sigma$. Then $a_1'X,a_2'X,\ldots,a_n'X$ are called the principal components of $X$. Note that the correlation coefficient between any two principal components should be zero in theory. However, in real data analysis this condition is difficult to satisfy, owing to rounding errors in the actual calculation of the orthonormalized eigenvectors. Hence, it is quite important in practice to estimate the correlation coefficient [16] $\rho_{uv}$, where $u=a'X$, $v=b'X$, $a,b\in\mathbb{R}^n$. Here the variables are assumed to be standardized, i.e., scaled and centered using unit scaling.
We now give a useful bound for the correlation coefficient $\rho_{uv}$.
Corollary 8. 
If $\frac{|a'b|}{\sqrt{a'a}\sqrt{b'b}}=\cos\theta<1$, then
$$|\rho_{uv}|\le\frac{M_\Sigma(1+\cos\theta)-m_\Sigma(1-\cos\theta)}{M_\Sigma(1+\cos\theta)+m_\Sigma(1-\cos\theta)}.$$
Proof. 
Since $\mathrm{cov}(u,v)=a'\Sigma b=b'\Sigma a$, $D(u)=a'\Sigma a$, and $D(v)=b'\Sigma b$, the conclusion holds by replacing $X,Y$ with $a,b$ in (31). □
Remark 5. 
The result of [3] can be easily obtained as a special case of Corollary 8. Suppose $a$ and $b$ are orthogonal; then $\cos\theta=0$ and the following result holds:
$$|\rho_{uv}|\le\frac{M_\Sigma-m_\Sigma}{M_\Sigma+m_\Sigma},$$
which is exactly the result of [3].
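A short numerical illustration of Corollary 8, with a synthetic covariance matrix and deliberately non-orthogonal coefficient vectors:

```python
# Checking the correlation bound of Corollary 8.
import numpy as np

rng = np.random.default_rng(6)
n = 6
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
ev = rng.uniform(0.5, 4.0, n)
Sigma = Q @ np.diag(ev) @ Q.T                  # synthetic covariance matrix
M, m = ev.max(), ev.min()

a, b = rng.standard_normal(n), rng.standard_normal(n)
cos_t = abs(a @ b) / np.sqrt((a @ a) * (b @ b))
assert cos_t < 1                               # holds a.s. for random a, b

rho = (a @ Sigma @ b) / np.sqrt((a @ Sigma @ a) * (b @ Sigma @ b))
bound = (M * (1 + cos_t) - m * (1 - cos_t)) / (M * (1 + cos_t) + m * (1 - cos_t))
assert abs(rho) <= bound
```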

5.2. An Application on Parameter Estimation

Consider the general linear regression model
$$Y=X\beta+\epsilon,$$
with $E(\epsilon)=0$, $\mathrm{cov}(\epsilon)=\sigma^2V$, $V>0$, where $X$ is an $n\times p$ real matrix of full column rank and $\beta\in\mathbb{R}^p$ is the parameter vector to be estimated. Without loss of generality, $X$ can be assumed orthonormal; that is, the above model can be rewritten as
$$Y=X(X'X)^{-1/2}(X'X)^{1/2}\beta+\epsilon,\qquad E(\epsilon)=0,\quad\mathrm{cov}(\epsilon)=\sigma^2V,\quad V>0.$$
Let $Z=X(X'X)^{-1/2}$ and $\gamma=(X'X)^{1/2}\beta$; hence $Z'Z=I_p$, which leads to the standard orthonormal model
$$Y=Z\gamma+\epsilon,\qquad E(\epsilon)=0,\quad\mathrm{cov}(\epsilon)=\sigma^2V,\quad V>0,\quad Z'Z=I_p.$$
If $V$ is known, the best linear unbiased estimator of the parameter $\gamma$ is $\gamma^*=(Z'V^{-1}Z)^{-1}Z'V^{-1}Y$. It follows that
$$\mathrm{cov}(\gamma^*)=\sigma^2\big(Z'V^{-1}Z\big)^{-1}.$$
If $V$ is unknown, the least squares estimator of the parameter $\gamma$ is $\hat{\gamma}=Z'Y$. It follows that
$$\mathrm{cov}(\hat{\gamma})=\sigma^2Z'VZ.$$
For any real vector $c\in\mathbb{R}^p$, the two estimators $c'\gamma^*$ and $c'\hat{\gamma}$ are unbiased. Their variances equal
$$D(c'\gamma^*)=\sigma^2c'\big(Z'V^{-1}Z\big)^{-1}c,\qquad D(c'\hat{\gamma})=\sigma^2c'Z'VZc.$$
How to compare the two unbiased estimators is quite important in estimation theory. We now give several indexes to compare the relative efficiency of these estimators.
$$RE_1(\hat{\gamma})=\frac{\mathrm{tr}(\mathrm{cov}(\gamma^*))}{\mathrm{tr}(\mathrm{cov}(\hat{\gamma}))},\qquad RE_2(\hat{\gamma})=\frac{1}{\sigma^2}\big[\mathrm{tr}(\mathrm{cov}(\hat{\gamma}))-\mathrm{tr}(\mathrm{cov}(\gamma^*))\big],\qquad RE_3(c'\hat{\gamma})=\frac{D(c'\gamma^*)}{D(c'\hat{\gamma})},$$
$$RE_4(c'\hat{\gamma})=\frac{1}{\sigma^2c'c}\big[D(c'\hat{\gamma})-D(c'\gamma^*)\big],\qquad RE_5(c'\hat{\gamma})=\frac{D(c'\hat{\gamma})-D(c'\gamma^*)}{D(c'\gamma^*)},$$
where the indexes $RE_1,RE_2$ are based on the comparison of the covariance matrices of $\gamma^*$ and $\hat{\gamma}$, and the indexes $RE_3,RE_4,RE_5$ are based on various comparisons of the variances of $c'\gamma^*$ and $c'\hat{\gamma}$. According to Corollary 4, we have the following general results for the above indexes of relative efficiency.
Corollary 9. 
For the indexes $RE_1,RE_2,RE_3,RE_4,RE_5$, we have
$$RE_1(\hat{\gamma})\ge\frac{4M_Vm_V}{(M_V+m_V)^2},\qquad RE_2(\hat{\gamma})\le p\big(\sqrt{M_V}-\sqrt{m_V}\big)^2,\qquad RE_3(c'\hat{\gamma})\ge\frac{4M_Vm_V}{(M_V+m_V)^2},$$
$$RE_4(c'\hat{\gamma})\le\big(\sqrt{M_V}-\sqrt{m_V}\big)^2,\qquad RE_5(c'\hat{\gamma})\le\frac{(M_V-m_V)^2}{4M_Vm_V}.$$
Proof. 
The first two inequalities can be obtained by setting $\Phi(Y)=\mathrm{tr}(Y)$ $(Y\in M_p)$ in Corollary 4. Similarly, the third and fourth inequalities can be proved by setting $\Phi(Y)=c'Yc$. Note that $RE_5=\frac{1}{RE_3}-1$, so the last inequality can then be inferred. □
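The five bounds can be checked directly by computing both covariance matrices in a synthetic orthonormal model (here $\sigma^2=1$, and $V$, $Z$, $c$ are illustrative choices):

```python
# Comparing BLUE and least squares covariances and checking Corollary 9.
import numpy as np

rng = np.random.default_rng(7)
n, p = 10, 3
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
ev = rng.uniform(0.5, 5.0, n)
V = Q @ np.diag(ev) @ Q.T                      # cov(eps) with sigma^2 = 1
M, m = ev.max(), ev.min()
Z = np.linalg.qr(rng.standard_normal((n, p)))[0]   # Z'Z = I_p

cov_star = np.linalg.inv(Z.T @ np.linalg.inv(V) @ Z)  # cov(gamma*)
cov_hat = Z.T @ V @ Z                                  # cov(gamma_hat)
c = rng.standard_normal(p)
D_star, D_hat = c @ cov_star @ c, c @ cov_hat @ c

assert np.trace(cov_star) / np.trace(cov_hat) >= 4 * M * m / (M + m) ** 2            # RE_1
assert np.trace(cov_hat) - np.trace(cov_star) <= p * (np.sqrt(M) - np.sqrt(m)) ** 2  # RE_2
assert D_star / D_hat >= 4 * M * m / (M + m) ** 2                                    # RE_3
assert (D_hat - D_star) / (c @ c) <= (np.sqrt(M) - np.sqrt(m)) ** 2                  # RE_4
assert (D_hat - D_star) / D_star <= (M - m) ** 2 / (4 * M * m)                       # RE_5
```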
It is quite interesting to ask whether these methods can be applied to partial least squares regression and to penalized regression techniques such as the LASSO. However, that discussion becomes quite complex at the moment, and it is left as a direction for our future research.

Author Contributions

Conceptualization, Y.Z. and X.C.; Methodology, Y.Z. and J.L.; Validation, X.C.; Investigation, Y.Z., X.G. and J.L.; Writing—original draft, Y.Z.; Writing—review & editing, X.G., J.L. and X.C.; Supervision, J.L. and X.C.; Funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of China (No. 12271270), the Natural Science Foundation of Jiangsu Province of China (No. BK20200108), the third-level training program of the sixth "333 Project" of Jiangsu Province, and the Zhongwu Youth Innovative Talent Program of Jiangsu University of Technology.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Gustafson, K. The geometrical meaning of the Kantorovich-Wielandt inequalities. Linear Algebra Appl. 1999, 296, 143–151.
2. Marshall, A.W.; Olkin, I. Matrix versions of Cauchy and Kantorovich inequalities. Aequationes Math. 1990, 40, 89–93.
3. Wang, S.G.; Cheung, I.W. A matrix version of the Wielandt inequality and its applications to statistics. Linear Algebra Appl. 1999, 296, 171–181.
4. Liu, S.Z.; Neudecker, H. A survey of Cauchy-Schwarz and Kantorovich-type matrix inequalities. Stat. Pap. 1999, 40, 55–73.
5. Zhang, F. Equivalence of the Wielandt inequality and the Kantorovich inequality. Linear Multilinear Algebra 2001, 48, 275–279.
6. Moradi, H.R.; Gumus, I.H.; Heydarbeygi, Z. A glimpse at the operator Kantorovich inequality. Linear Multilinear Algebra 2019, 67, 1–6.
7. Chen, X.P.; Liu, J.Z.; Chen, J.D. A new result on recovery sparse signals using orthogonal matching pursuit. Stat. Theory Relat. Fields 2022, 6, 220–226.
8. Bhatia, R. Positive Definite Matrices; Princeton University Press: Princeton, NJ, USA, 2007; pp. 141–176.
9. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2012; pp. 108–126.
10. Greub, W.; Rheinboldt, W. On a generalization of an inequality of L. V. Kantorovich. Proc. Am. Math. Soc. 1959, 10, 407–415.
11. Lin, M. On an operator Kantorovich inequality for positive linear maps. J. Math. Anal. Appl. 2013, 402, 127–132.
12. Sabancigil, P.; Kara, M.; Mahmudov, N.I. Higher order Kantorovich-type Szász-Mirakjan operators. J. Inequal. Appl. 2022, 2022, 1–15.
13. Wang, L.T.; Yang, H. Matrix Euclidean norm Wielandt inequalities and their applications to statistics. Stat. Pap. 2012, 53, 521–530.
14. Liu, S.Z.; Lu, C.Y.; Puntanen, S. Matrix trace Wielandt inequalities with statistical applications. J. Stat. Plan. Inference 2009, 139, 2254–2260.
15. Yan, Z.Z. A unified version of Cauchy-Schwarz and Wielandt inequalities. Linear Algebra Appl. 2008, 428, 2079–2084.
16. Liu, S.Z. Efficiency comparisons between the OLSE and the BLUE in a singular linear model. J. Stat. Plan. Inference 2000, 84, 191–200.