Article

Optimized Tail Bounds for Random Matrix Series

by Xianjie Gao 1,*, Mingliang Zhang 2 and Jinming Luo 3

1 Department of Basic Sciences, Shanxi Agricultural University, Jinzhong 030801, China
2 School of Mathematics and Statistics, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
3 School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.
Entropy 2024, 26(8), 633; https://doi.org/10.3390/e26080633
Submission received: 24 June 2024 / Revised: 22 July 2024 / Accepted: 26 July 2024 / Published: 26 July 2024
(This article belongs to the Special Issue Random Matrix Theory and Its Innovative Applications)

Abstract

Random matrix series are a significant component of random matrix theory, offering rich theoretical content and broad application prospects. In this paper, we propose modified versions of tail bounds for random matrix series, including matrix Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) series. Unlike existing studies, our results depend on the intrinsic dimension instead of the ambient dimension. In some cases, the intrinsic dimension is much smaller than the ambient dimension, which makes the modified versions suitable for high-dimensional or infinite-dimensional settings. In addition, we obtain expectation bounds for random matrix series based on the intrinsic dimension.

1. Introduction

Random matrix theory is a significant branch of mathematics that delves into the properties and behavior of random matrices. Its applications span various fields, including wireless communications [1], combinatorial optimization [2], matrix low-rank approximation [3], neural networks [4,5], and deep learning [6]. Random matrices also have a wide range of applications in physics, entropy, and information science. They can provide comprehensive descriptions and analyses when dealing with multiple interacting elements, high-dimensional systems, and complex statistical relationships. Random matrices can capture the complex interactions between multiple particles or multiple physical processes, and when dealing with high-dimensional physical systems with a large number of degrees of freedom, they provide a natural and effective representation [7,8]. Random matrices can be used to calculate the entropy of complex systems and thus measure the degree of chaos and uncertainty in a system [9,10]. In the field of information science, random matrices can be used for performance optimization and signal processing in communication systems [11]. Random matrix theory provides a powerful theoretical basis for dealing with problems in these fields. In particular, random matrix series are an important research topic within random matrix theory, with wide application and research value.
The study of random matrix theory comprises two branches: asymptotic theory and non-asymptotic theory. There have been several notable asymptotic results in random matrix theory, including Wigner’s semicircle law [12], the Marchenko–Pastur law [13], and the Bai–Yin law [14]. While these asymptotic statements can offer precise limiting results as the matrix dimension approaches infinity, they do not specify the rate at which these probability terms converge to their limits. In response to this challenge, non-asymptotic approaches to analyzing these probability terms have emerged.
Ahlswede and Winter [15] illustrated the application of the Golden–Thompson inequality [16,17] in extending the Laplace transform method to the matrix scenario to derive tail bounds for sums of random matrices. Tropp [18] utilized a corollary of Lieb's theorem [19] to achieve a significant improvement over the Ahlswede–Winter outcome. To address the notable limitation that these results depend on the ambient dimension of the matrix, so that the bounds become excessively loose in scenarios involving high-dimensional matrices, Hsu et al. [20] presented a tighter analogue of matrix Bernstein's inequality. Minsker [21] extended Bernstein's concentration inequalities for random matrices by enhancing the results in [20] through the introduction of the concept of effective rank. Zhang et al. [22] introduced dimension-free tail bounds for the largest singular value of sums of random matrices.
The matrix series of the form $\sum_k x_k A_k$ has played a crucial role in recent studies [23,24,25], where $x_k$ represents a random variable and $A_k$ is a fixed matrix. The variable $x_k$ can encompass various types of random variables, including Gaussian, Bernoulli, infinitely divisible random variables, and more. Tropp [18] utilized Gaussian series to study the key characteristics of matrix tail bounds. Zhang et al. [26] studied the tail inequalities of the largest eigenvalue of a matrix infinitely divisible (i.d.) series and applied them to optimization problems and compressed sensing.

1.1. Related Works

Consider the sum $\sum_{k=1}^n \alpha_k a_k$, where $a_1, a_2, \ldots, a_n$ are real numbers and $\alpha_1, \alpha_2, \ldots, \alpha_n$ are independent standard Gaussian variables. There is the probability inequality
$$\mathbb{P}\Big\{ \sum_{k=1}^n \alpha_k a_k \ge \sqrt{2\delta^2 t} \Big\} \le e^{-t}, \quad \text{where } \delta^2 := \sum_{k=1}^n a_k^2. \qquad (1)$$
Let $\{A_k\}_{k=1}^n$ be a finite sequence of fixed Hermitian matrices with dimension $d$. Tropp [18] gave the following result for any $t \ge 0$:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge \sqrt{2\eta^2 t} \Big\} \le d \cdot e^{-t}, \quad \text{where } \eta^2 := \Big\| \sum_k A_k^2 \Big\|. \qquad (2)$$
A significant distinction between (1) and (2) is the presence of the matrix dimension factor d in the latter. Hsu et al. [20] obtained the following tail bound:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge \sqrt{2\eta^2 t} \Big\} \le \frac{\operatorname{tr}(\Xi)}{\lambda_{\max}(\Xi)} \cdot \frac{t}{e^t - t - 1}, \quad \text{where } \Xi = \sum_{k=1}^n A_k^2. \qquad (3)$$
We observe that the right-hand side of (3) is the product of two terms; the result is tighter when both terms are small. Compared with (2), we know that $\operatorname{tr}(\Xi)/\lambda_{\max}(\Xi) \le d$, but $t\,(e^t - t - 1)^{-1} > e^{-t}$ for $t > 0$. That is, one term in (3) is smaller than its counterpart in (2) while the other is larger; in other words, both results have their respective limitations.
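To make this trade-off concrete, the following short sketch (our own illustration, not part of the original analysis) evaluates the two $t$-dependent factors numerically; the dimensional factors simply satisfy $\operatorname{tr}(\Xi)/\lambda_{\max}(\Xi) \le d$.

```python
import numpy as np

# Illustration (our own numbers): the t-dependent factors in bounds (2) and (3).
# The factor t/(e^t - t - 1) from (3) always exceeds e^{-t} from (2) for t > 0,
# while the dimensional factor tr(Xi)/lambda_max(Xi) in (3) is at most d.
for t in [1.0, 3.0, 5.0, 10.0]:
    print(f"t = {t:5.1f}:  e^-t = {np.exp(-t):.3e},"
          f"  t/(e^t - t - 1) = {t / (np.exp(t) - t - 1):.3e}")
```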
Let $\{\beta_k\}_{k=1}^n$ be a finite sequence of independent sub-Gaussian random variables. Then, the following tail bound holds:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\} \le d \cdot e^{-t^2/(4c^2\eta^2)}, \qquad (4)$$
where c is an absolute constant.
Let $\{\gamma_k\}_{k=1}^n$ be a finite sequence of independent infinitely divisible random variables. Let $B_1, \ldots, B_n$ be fixed $d$-dimensional Hermitian matrices with $\lambda_{\max}(B_k) \le 1$, $k = 1, \ldots, n$. For any $0 < t < \rho\, h(M-)$, Zhang et al. [26] deduced the following results:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le d \exp\Big( -\rho \int_0^{t/\rho} h^{-1}(s)\, ds \Big), \qquad (5)$$
where $h(M-) := \lim_{s \to M-} h(s)$ and $h^{-1}(s)$ is the inverse of $h(s)$. For any $t \ge \rho\, h(M-)$,
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le d \exp\big( \rho\,\phi(M) - M t \big), \qquad (6)$$
where $\rho := \lambda_{\max}\big(\sum_{k=1}^{n} B_k^2\big)$.
In addition, Tropp [27] gave the expectation bound for the matrix Gaussian series,
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \le \sqrt{2\eta^2 \log d}, \qquad (7)$$
and Zhang et al. [26] also proposed an expectation bound for infinitely divisible matrix series under some given conditions.
However, the significant drawback of the above results lies in their reliance on the ambient dimension of the matrix: the bounds tend to be very loose when the matrices have high dimension. To address this problem, we optimize the existing theory. Tighter tail bounds for random matrices yield more precise and reliable probability estimates, which allows the behavior of random matrices to be characterized more accurately and helps to improve the accuracy, efficiency, and reliability of both theory and applications.

1.2. Overview of Main Results

With the aim of overcoming the limitations of the existing theory and of complementing and refining existing random matrix theory, we put forward optimized tail and expectation bounds for random matrix series in this paper, including matrix Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) series. This makes the modified versions potentially applicable to high-dimensional or infinite-dimensional matrix settings. Taking the matrix Gaussian series as an example, we obtain the tighter conclusion
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge \sqrt{2\omega^2 t} \Big\} \le 2\tilde{d} \cdot e^{-t} \quad \text{for } t > \omega,$$
and
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \le \sqrt{2}\,\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big).$$
The quantities $\tilde{d}$ and $\omega^2$ will be introduced in detail later in the paper.
The rest of this paper is organized as follows. Section 2 introduces some preliminary knowledge on the intrinsic dimension and Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) distributions. Section 3 gives tail and expectation bounds based on the intrinsic dimension bounds for Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) matrix series. The last section concludes the paper.

2. Notations and Preliminaries

In this section, some preliminary knowledge will be provided about the intrinsic dimension of the matrix, and also about Gaussian (or Rademacher), sub-Gaussian, infinitely divisible distributions, and matrix series.

2.1. The Intrinsic Dimension

Existing tail bounds on random matrix series depend on the ambient dimension of the matrix. We introduce the concept of the intrinsic dimension, which is much smaller than the ambient dimension in some cases (see also [27]).
Definition 1.
For a positive-semidefinite matrix $S$, the intrinsic dimension is defined as
$$\operatorname{intdim}(S) = \frac{\operatorname{tr}(S)}{\|S\|}.$$
It can be seen from the definition that the intrinsic dimension is not significantly affected by changes in the size of the matrix; indeed, $1 \le \operatorname{intdim}(S) \le \operatorname{rank}(S) \le d$ for any nonzero positive-semidefinite $S$ of dimension $d$. In fact, when the eigenvalues of $S$ decay rapidly, the intrinsic dimension is much smaller than the ambient dimension.
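As a simple illustration (our own example with an arbitrarily chosen spectrum), the following sketch computes the intrinsic dimension of a diagonal positive-semidefinite matrix whose eigenvalues decay geometrically; the intrinsic dimension stays near 2 no matter how large the ambient dimension is.

```python
import numpy as np

# Sketch (our own example): intdim(S) = tr(S)/||S|| for a PSD matrix whose
# eigenvalues decay geometrically, compared with the ambient dimension d.
d = 1000
S = np.diag(0.5 ** np.arange(d))              # eigenvalues 1, 1/2, 1/4, ...

intdim = np.trace(S) / np.linalg.norm(S, 2)   # spectral norm = largest eigenvalue
print(f"ambient dimension: {d}, intrinsic dimension: {intdim:.4f}")  # approx. 2
```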

2.2. Several Distributions

In this section, we briefly introduce three random distributions and their moment generating functions, including Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) distributions.
The Gaussian distribution is a very important continuous distribution in probability theory and statistics, and is often used to represent real-valued random variables with unknown distribution. Given a standard Gaussian variable $\alpha$, the moment generating function (mgf) is given by
$$\mathbb{E}\, e^{\theta \alpha} = e^{\theta^2/2}, \quad \theta \in \mathbb{R}.$$
The Rademacher distribution is a discrete probability distribution in which the random variable takes the value $1$ or $-1$, each with probability $1/2$. Given a Rademacher variable $\xi$, the moment generating function satisfies
$$\mathbb{E}\, e^{\theta \xi} \le e^{\theta^2/2}, \quad \theta \in \mathbb{R}.$$
The class of sub-Gaussian distributions has strong tail decay and includes many distributions, such as the uniform distribution and all bounded random variables. Given a centered sub-Gaussian random variable $\beta$, it holds that
$$\mathbb{E}\, e^{\theta \beta} \le e^{c^2 \theta^2}, \quad \theta \in \mathbb{R},$$
where c is an absolute constant.
Infinitely divisible (i.d.) distributions form a large class of probability distributions that play an important role in probability theory and limit theorems. A random variable $\gamma$ has an i.d. distribution if, for any $n \in \mathbb{N}_+$, there exist independent and identically distributed (i.i.d.) random variables $\gamma_1, \ldots, \gamma_n$ such that $\gamma$ has the same distribution as $\gamma_1 + \cdots + \gamma_n$.
Among discrete distributions, the Poisson, negative binomial, and geometric distributions are infinitely divisible. Among continuous distributions, the Cauchy, Lévy, stable, and Gamma distributions are examples of infinitely divisible distributions.
A real-valued random variable $\gamma$ is i.d. if and only if there exists a triplet $(b, \sigma^2, \nu)$ such that the characteristic function of $\gamma$ is given by
$$\mathbb{E}\{ e^{i\theta\gamma} \} = \exp\Big( i b \theta - \frac{\sigma^2 \theta^2}{2} + \int_{\mathbb{R}} \big( e^{i\theta u} - 1 - i\theta u\, \mathbf{1}_{\{|u|<1\}} \big)\, \nu(du) \Big), \quad \theta \in \mathbb{R},$$
where $b \in \mathbb{R}$, $\sigma \ge 0$, and $\nu$ is a Lévy measure. This necessary and sufficient condition is the Lévy–Khintchine theorem.
Let $\gamma$ be an i.d. random variable with triplet $(b, \sigma^2, \nu)$, and suppose that $\mathbb{E}\,\gamma = 0$. Let $M := \sup\{ \theta > 0 : \mathbb{E}\{ e^{\theta|\gamma|} \} < +\infty \}$. For any $\lambda \ge 1$ and $0 < \theta < M$,
$$\mathbb{E}\, e^{\lambda \theta \gamma} \le \exp\Big( \frac{\sigma^2 \theta^2 \lambda^2}{2} + \lambda^2 \int_{\mathbb{R}} \big( e^{\theta|u|} - \theta|u| - 1 \big)\, \nu(du) \Big).$$
The proof can be found in [26].

2.3. Random Matrix Series

Given $n$ fixed matrices $A_1, A_2, \ldots, A_n$, a random matrix series is represented as $\sum_{k=1}^n x_k A_k$, where $x_1, x_2, \ldots, x_n$ are independent random variables. The quantities of interest are the tail probability $\mathbb{P}\{ \lambda_{\max}(\sum_k x_k A_k) \ge t \}$ and the expectation $\mathbb{E}\,\lambda_{\max}(\sum_k x_k A_k)$.
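As a minimal sketch of these two quantities (our own illustration; the coefficient matrices, the threshold $t$, and the Monte Carlo size are arbitrary choices), the following code forms a matrix Gaussian series and estimates the tail probability and the expectation of its largest eigenvalue by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed Hermitian coefficient matrices A_1, ..., A_n.
d, n = 20, 50
A = [(B + B.T) / 2 for B in 0.05 * rng.standard_normal((n, d, d))]

def lambda_max_sample():
    """One realization of lambda_max(sum_k x_k A_k) with standard Gaussian x_k."""
    x = rng.standard_normal(n)
    Y = sum(xk * Ak for xk, Ak in zip(x, A))
    return np.linalg.eigvalsh(Y)[-1]

samples = np.array([lambda_max_sample() for _ in range(2000)])
t = 2.0
print("estimated P{lambda_max >= t}:", np.mean(samples >= t))
print("estimated E[lambda_max]:     ", samples.mean())
```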

3. Intrinsic Dimension Bounds for Matrix Series

In this section, we present tail bounds for random matrix series based on intrinsic dimension bounds, and also obtain the expectation bounds.

3.1. Matrix Gaussian (or Rademacher) Series with Intrinsic Dimension

This section presents the tail and expectation bounds for matrix Gaussian (or Rademacher) series with an intrinsic dimension.
Theorem 1.
Consider a finite sequence $\{A_k : k = 1, \ldots, n\}$ of fixed Hermitian matrices with the same dimension $d$, and let $\{\alpha_k\}$ be a finite sequence of independent Gaussian (or Rademacher) variables. Introduce the matrix $M := \sum_k A_k^2$. Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
Then, it holds that
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot e^{-t^2/(2\omega^2)} \quad \text{for } t > \omega.$$
Compared with the previous results in (2) and (3), the bound in Theorem 1 improves upon their respective shortcomings and is tighter. Therefore, our bound is more applicable in the case of high-dimensional matrices.
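To illustrate the gain numerically (our own sketch with hypothetical rank-one coefficient matrices), note that with $M := \sum_k A_k^2$ we have $\omega^2 = \eta^2$, so the exponential factors in (2) and in Theorem 1 coincide and the comparison reduces to the prefactors $d$ versus $2\tilde{d}$:

```python
import numpy as np

# Sketch (our illustration): ambient prefactor d in (2) versus the intrinsic
# prefactor 2*intdim(M) in Theorem 1, for rank-one Hermitian coefficients
# whose norms decay geometrically.
d, n = 500, 60
rng = np.random.default_rng(1)

A = []
for k in range(n):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    A.append(0.5 ** k * np.outer(v, v))       # Hermitian, rank one, norm 0.5^k

M = sum(Ak @ Ak for Ak in A)                  # M = sum_k A_k^2
omega2 = np.linalg.norm(M, 2)                 # omega^2 = ||M||
d_tilde = np.trace(M) / omega2                # intrinsic dimension of M

print("ambient prefactor d:          ", d)            # 500
print("intrinsic prefactor 2*intdim: ", 2 * d_tilde)  # roughly 2 to 3
```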
Theorem 2.
Given a matrix Gaussian (or Rademacher) series $\sum_k \alpha_k A_k$, it holds that
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \le \sqrt{2}\,\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big).$$
Compared with the previous result in (7), the bound in Theorem 2 depends on the intrinsic dimension of the matrix and is more applicable in the case of high-dimensional matrices.
The proofs of Theorems 1 and 2 are similar to the proofs for the matrix sub-Gaussian series given below; we omit them here.
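As a rough numerical comparison (our own sketch with hypothetical parameter values $\omega = 1$ and $\tilde{d} = 2$), the following evaluates the ambient-dimension expectation bound (7) against the bound of Theorem 2; because of the additive constant in Theorem 2, the improvement only appears once the ambient dimension is genuinely large.

```python
import numpy as np

# Sketch (hypothetical parameter values): compare the expectation bounds
#   (7):        sqrt(2 * omega^2 * log d)
#   Theorem 2:  sqrt(2) * omega * (2 + sqrt(log(1 + d_tilde)))
omega, d_tilde = 1.0, 2.0
for d in [1e3, 1e6, 1e9]:
    old = np.sqrt(2 * omega**2 * np.log(d))
    new = np.sqrt(2) * omega * (2 + np.sqrt(np.log(1 + d_tilde)))
    print(f"d = {d:.0e}:  bound (7) = {old:.3f},  Theorem 2 = {new:.3f}")
```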

3.2. Matrix Sub-Gaussian Series with Intrinsic Dimension

This section presents the tail and expectation bounds for matrix sub-Gaussian series with an intrinsic dimension.
Theorem 3.
Consider a finite sequence $\{A_k : k = 1, \ldots, n\}$ of fixed Hermitian matrices with the same dimension $d$, and let $\{\beta_k\}$ be a finite sequence of independent centered sub-Gaussian variables. Introduce the matrix $M := \sum_k A_k^2$. Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
Then, it holds that
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot e^{-t^2/(4c^2\omega^2)} \quad \text{for } t > 2c\omega,$$
where $c$ is an absolute constant.
Before proving this theorem, we first introduce a proposition from [27] that serves as a key step in the proof.
Proposition 1.
Let $Y$ be a random Hermitian matrix. Let $\psi : \mathbb{R} \to \mathbb{R}_+$ be a nonnegative function that is nondecreasing on $[0, \infty)$. For each $t \ge 0$,
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \frac{1}{\psi(t)}\, \mathbb{E}\operatorname{tr}\psi(Y).$$
Proof. 
Let the sum
$$Y = \sum_k \beta_k A_k.$$
Fix a number $\theta > 0$, and define the function $\psi(t) = \max\{0, e^{\theta t} - 1\}$ for $t \in \mathbb{R}$. For $t \ge 0$, Proposition 1 states that
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \frac{1}{\psi(t)}\, \mathbb{E}\operatorname{tr}\psi(Y) = \frac{1}{e^{\theta t} - 1}\, \mathbb{E}\operatorname{tr}\big( e^{\theta Y} - I \big).$$
Introduce the matrix $M := \sum_k A_k^2$. According to the mgf bound for a sub-Gaussian random variable given in Section 2.2 and the transfer rule (for a real-valued function $f$, if $f(a) \le g(a)$ for $a \in I$, then $f(A) \preceq g(A)$ whenever the eigenvalues of $A$ lie in $I$), it can be seen that
$$\mathbb{E}\operatorname{tr} e^{\theta Y} \le \operatorname{tr}\exp\Big( \theta^2 c^2 \sum_k A_k^2 \Big) = \operatorname{tr}\exp\big( g(\theta) \cdot M \big), \quad \text{where } g(\theta) = c^2 \theta^2.$$
Introduce the function $\varphi(a) = e^a - 1$, and observe that
$$\mathbb{E}\operatorname{tr}\big( e^{\theta Y} - I \big) \le \operatorname{intdim}(M) \cdot \varphi\big( g(\theta)\, \|M\| \big).$$
Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
We have
$$\mathbb{E}\operatorname{tr}\big( e^{\theta Y} - I \big) \le \tilde{d} \cdot \varphi\big( g(\theta)\, \omega^2 \big) \le \tilde{d} \cdot e^{g(\theta) \cdot \omega^2}.$$
Next, combine this bound with the probability bound above to obtain
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \tilde{d} \cdot \frac{e^{\theta t}}{e^{\theta t} - 1} \cdot e^{-\theta t + g(\theta)\,\omega^2} \le \tilde{d} \cdot \Big( 1 + \frac{1}{\theta t} \Big) \cdot e^{-\theta t + g(\theta)\,\omega^2}.$$
We use the following formula to control the fraction:
$$\frac{e^a}{e^a - 1} = 1 + \frac{1}{e^a - 1} \le 1 + \frac{1}{a} \quad \text{for } a > 0.$$
We select $\theta = t/(2c^2\omega^2)$ to obtain
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\} \le \tilde{d}\,\Big( 1 + \frac{2c^2\omega^2}{t^2} \Big)\, e^{-t^2/(4c^2\omega^2)}.$$
Invoking the assumption that $t > 2c\omega$, so that $1 + 2c^2\omega^2/t^2 < 3/2 \le 2$, yields the conclusion. □
Since the large deviation inequality concerns the case where $t$ is large, the restriction $t > 2c\omega$ is reasonable.
Theorem 4.
Given a matrix sub-Gaussian series $\sum_k \beta_k A_k$, it holds that
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \le 2c\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big).$$
Proof. 
Fix a number $\mu > 2c\omega$. Then
$$\begin{aligned}
\mathbb{E}\,\lambda_{\max}\Big(\sum_k \beta_k A_k\Big) &= \int_0^\infty \mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\}\, dt \\
&\le \int_0^\mu \mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\}\, dt + 2\tilde{d} \int_\mu^\infty e^{-t^2/(4c^2\omega^2)}\, dt \\
&\le \mu + 2\tilde{d} \int_\mu^\infty e^{-t^2/(4c^2\omega^2)}\, dt \\
&\le \mu + 2\tilde{d} \cdot 2c\omega \int_\mu^\infty \frac{t}{2c^2\omega^2}\, e^{-t^2/(4c^2\omega^2)}\, dt \\
&= \mu + 2\tilde{d} \cdot 2c\omega\, e^{-\mu^2/(4c^2\omega^2)}.
\end{aligned}$$
Select $\mu = 2c\omega\sqrt{\log(1 + \tilde{d})}$; then
$$\begin{aligned}
\mathbb{E}\,\lambda_{\max}\Big(\sum_k \beta_k A_k\Big) &\le 2c\omega\sqrt{\log(1 + \tilde{d})} + 2\tilde{d} \cdot 2c\omega\, e^{-\log(1 + \tilde{d})} \\
&= 2c\omega\sqrt{\log(1 + \tilde{d})} + \frac{2\tilde{d}}{1 + \tilde{d}} \cdot 2c\omega \\
&\le 2c\omega\sqrt{\log(1 + \tilde{d})} + 2 \cdot 2c\omega \\
&= 2c\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big). \quad □
\end{aligned}$$

3.3. Matrix Infinitely Divisible Series with Intrinsic Dimension

This section presents the tail and expectation bounds for matrix i.d. series with an intrinsic dimension.
Theorem 5.
Consider a finite sequence $\{B_k : k = 1, \ldots, n\}$ of fixed Hermitian matrices with the same dimension $d$ and $\lambda_{\max}(B_k) \le 1$, and let $\{\gamma_k\}$ be a finite sequence of independent centered i.d. random variables with triplet $(b, \sigma^2, \nu)$ such that $\mathbb{E}\, e^{\theta|\gamma|} < +\infty$ for some $\theta > 0$. Introduce the matrix $M := \sum_k B_k^2$. Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
Then, it holds that
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot \exp\Big( -\omega^2 \int_0^{t/\omega^2} h^{-1}(s)\, ds \Big) \quad \text{for } h(M-)\,\omega^2 > t > \frac{1}{h^{-1}(t/\omega^2)},$$
where $h(M-)$ is the left limit of $h$ at $M$, with
$$M := \sup\{ \theta > 0 : \mathbb{E}\, e^{\theta|\gamma|} < +\infty \},$$
and $h^{-1}$ is the inverse of
$$h(s) = \sigma^2 s + \int_{\mathbb{R}} |u| \big( e^{s|u|} - 1 \big)\, \nu(du), \quad 0 < s < M.$$
For any $t \ge h(M-)\,\omega^2$, we have
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le 2\tilde{d} \cdot \exp\big( \omega^2 \phi(M) - M t \big),$$
where
$$\phi(\theta) := \frac{\sigma^2 \theta^2}{2} + \int_{\mathbb{R}} \big( e^{\theta|u|} - \theta|u| - 1 \big)\, \nu(du).$$
Compared with the previous result in [26], our results depend on the intrinsic dimensions of the matrix and are more applicable for the case of high-dimensional matrices.
Proof. 
Let the sum
$$Y = \sum_k \gamma_k B_k.$$
Introduce the matrix $M := \sum_k B_k^2$. Similar to the proof above, according to the mgf bound for an i.d. random variable given in Section 2.2 and the transfer rule, we can obtain
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \tilde{d} \cdot \frac{e^{\theta t}}{e^{\theta t} - 1} \cdot e^{-\theta t + \phi(\theta)\,\omega^2} \le \tilde{d} \cdot \Big( 1 + \frac{1}{\theta t} \Big) \cdot e^{-\theta t + \phi(\theta)\,\omega^2}.$$
Next, we minimize the right-hand side of this inequality with respect to $\theta$. Since $\mathbb{E}\, e^{\theta\gamma} < +\infty$ for all $0 < \theta < M$, $\phi(\theta)$ is infinitely differentiable on $(0, M)$, with
$$\phi'(\theta) = h(\theta) = \sigma^2 \theta + \int_{\mathbb{R}} |u| \big( e^{\theta|u|} - 1 \big)\, \nu(du) > 0,$$
and
$$\phi''(\theta) = \sigma^2 + \int_{\mathbb{R}} |u|^2 e^{\theta|u|}\, \nu(du) > 0.$$
Since $\phi(0) = h(0) = h^{-1}(0) = 0$, we have
$$\phi\big( h^{-1}(t/\omega^2) \big) = \int_0^{h^{-1}(t/\omega^2)} h(s)\, ds = \int_0^{t/\omega^2} s\, d h^{-1}(s) = \frac{t}{\omega^2} \cdot h^{-1}(t/\omega^2) - \int_0^{t/\omega^2} h^{-1}(s)\, ds.$$
We select $\theta = h^{-1}(t/\omega^2)$ to obtain
$$\min_{0 < \theta < M}\big( \omega^2 \cdot \phi(\theta) - \theta \cdot t \big) = \omega^2 \cdot \phi\big( h^{-1}(t/\omega^2) \big) - t \cdot h^{-1}(t/\omega^2) = -\omega^2 \cdot \int_0^{t/\omega^2} h^{-1}(s)\, ds.$$
Invoking the assumption that $t > 1/h^{-1}(t/\omega^2)$, so that $\theta t > 1$ and hence $1 + 1/(\theta t) < 2$, we have
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot \exp\Big( -\omega^2 \cdot \int_0^{t/\omega^2} h^{-1}(s)\, ds \Big).$$
Moreover, when $t \ge h(M-)\,\omega^2$, according to the convexity of $\omega^2\phi(\theta) - \theta t$ with respect to $\theta > 0$ and the monotonicity of $h^{-1}(s)$ ($s > 0$), the solution of the optimization problem is $\theta = M$. Thus, for any $t \ge h(M-)\,\omega^2$, we have
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le 2\tilde{d} \cdot \exp\big( \omega^2 \phi(M) - M t \big). \quad □$$
Given some specific settings of the measure ν , we can obtain the following corollary.
Corollary 1.
Assume $\nu$ has bounded support, i.e., there exists a positive constant $a < \infty$ such that $\nu\big( (-\infty, -a) \cup (a, \infty) \big) = 0$ and $\nu([-a, a]) \ne 0$. Let
$$R = \inf\{ a > 0 : \nu(\{ u : |u| > a \}) = 0 \}.$$
It follows that $R < \infty$. Then, for any $t > 0$,
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le 2\tilde{d} \cdot \exp\Big( -\frac{\omega^2(\sigma^2 + V)}{R^2} \cdot Q\Big( \frac{R t}{\omega^2(\sigma^2 + V)} \Big) \Big),$$
where $V := \int_{\mathbb{R}} |u|^2\, \nu(du)$, and
$$Q(s) := (1 + s) \cdot \log(1 + s) - s.$$
Proof. 
Since $\operatorname{supp}(\nu) \subseteq [-R, R]$, it holds that $\mathbb{E}\, e^{\theta|\gamma|} < +\infty$ for any $\theta > 0$. Thus, we have
$$\begin{aligned}
h(\theta) &= \sigma^2 \theta + \int_{\mathbb{R}} |u| \big( e^{\theta|u|} - 1 \big)\, \nu(du)
= \sigma^2 \theta + \int_{|u| \le R} |u|^2 \sum_{k=1}^{\infty} \frac{\theta^k |u|^{k-1}}{k!}\, \nu(du) \\
&\le \sigma^2 \theta + \int_{|u| \le R} |u|^2 \sum_{k=1}^{\infty} \frac{\theta^k R^{k-1}}{k!}\, \nu(du)
= \sigma^2 \theta + V\, \frac{e^{\theta R} - 1}{R}
\le (\sigma^2 + V)\, \frac{e^{\theta R} - 1}{R}.
\end{aligned}$$
Denote $p(\theta) := (\sigma^2 + V)\, \frac{e^{\theta R} - 1}{R}$, with inverse function $p^{-1}(s) = \frac{1}{R} \cdot \log\big( 1 + \frac{R s}{\sigma^2 + V} \big)$ ($s > 0$). Since $h(\theta)$ and $p(\theta)$ ($\theta > 0$) are strictly increasing and $h(\theta) \le p(\theta)$, their inverse functions satisfy $p^{-1}(s) \le h^{-1}(s)$ for all $s > 0$. Combining the first tail bound of Theorem 5 with this estimate, we obtain, for any $t > 1/h^{-1}(t/\omega^2)$,
$$\begin{aligned}
\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\}
&\le 2\tilde{d} \cdot \exp\Big( -\omega^2 \int_0^{t/\omega^2} h^{-1}(s)\, ds \Big) \\
&\le 2\tilde{d} \cdot \exp\Big( -\omega^2 \int_0^{t/\omega^2} \frac{1}{R} \log\Big( 1 + \frac{R s}{\sigma^2 + V} \Big)\, ds \Big) \\
&= 2\tilde{d} \cdot \exp\Big( -\frac{\omega^2(\sigma^2 + V)}{R^2} \cdot Q\Big( \frac{R t}{\omega^2(\sigma^2 + V)} \Big) \Big),
\end{aligned}$$
where $Q(s)$ is defined in the statement of the corollary. This completes the proof. □
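The following sketch (our own illustration; the values of $\tilde{d}$, $\omega^2$, $\sigma^2$, $V$, and $R$ are hypothetical) evaluates the Bennett-type bound of Corollary 1 as a function of $t$.

```python
import numpy as np

def corollary1_bound(t, d_tilde, omega2, sigma2, V, R):
    """Bennett-type tail bound of Corollary 1 for hypothetical parameters."""
    Q = lambda s: (1 + s) * np.log(1 + s) - s
    scale = omega2 * (sigma2 + V) / R**2
    return 2 * d_tilde * np.exp(-scale * Q(R * t / (omega2 * (sigma2 + V))))

# Hypothetical parameters.
d_tilde, omega2, sigma2, V, R = 3.0, 1.0, 0.5, 0.25, 1.0
for t in [1.0, 2.0, 4.0, 8.0]:
    b = corollary1_bound(t, d_tilde, omega2, sigma2, V, R)
    print(f"t = {t:4.1f}:  tail bound = {b:.3e}")
```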
Given a matrix i.d. series $\sum_k \gamma_k B_k$, it holds that
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) = \int_0^{\infty} \mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) \ge t \Big\}\, dt.$$
In other words, whenever the tail bound is integrable, this formula can be used to obtain an expectation bound based on the intrinsic dimension for matrix i.d. series.
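For instance, a crude numerical version of this integral (our own sketch with the same hypothetical parameters as above, capping the integrand at 1 since any probability is at most 1) reads:

```python
import numpy as np

# Sketch (hypothetical parameters): expectation bound from integrating a tail
# bound, E[lambda_max] <= int_0^inf min(1, tail_bound(t)) dt, using the
# Bennett-type bound of Corollary 1.
d_tilde, omega2, sigma2, V, R = 3.0, 1.0, 0.5, 0.25, 1.0

def tail_bound(t):
    Q = lambda s: (1 + s) * np.log(1 + s) - s
    scale = omega2 * (sigma2 + V) / R**2
    return 2 * d_tilde * np.exp(-scale * Q(R * t / (omega2 * (sigma2 + V))))

ts = np.linspace(0.0, 20.0, 2001)
dt = ts[1] - ts[0]
expectation_bound = np.sum(np.minimum(1.0, tail_bound(ts))) * dt  # Riemann sum
print(f"expectation bound: {expectation_bound:.3f}")
```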
Compared with existing studies, our results are based on the intrinsic dimension of the matrix. The tail and expectation bounds are tighter than the previous results. Therefore, our bounds are more applicable for the case of high-dimensional matrices.
In addition, by using the Hermitian dilation, our results can also be extended to the scenario of non-Hermitian random matrix series. For a general random matrix series $\sum_k x_k C_k$, the identity $\big\| \sum_k x_k C_k \big\| = \lambda_{\max}\big( \sum_k x_k \varphi(C_k) \big)$ holds, where
$$\varphi(C_k) := \begin{bmatrix} 0 & C_k \\ C_k^* & 0 \end{bmatrix}.$$
Thus, we may invoke each of the above theorems to obtain tail and expectation bounds for the norm of a general random matrix series.
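A small sketch (our own illustration) of the Hermitian dilation and the norm identity $\lambda_{\max}(\varphi(C)) = \|C\|$ that it provides:

```python
import numpy as np

def dilation(C):
    """Hermitian dilation phi(C) = [[0, C], [C^*, 0]]."""
    m, n = C.shape
    top = np.hstack([np.zeros((m, m)), C])
    bottom = np.hstack([C.conj().T, np.zeros((n, n))])
    return np.vstack([top, bottom])

rng = np.random.default_rng(2)
C = rng.standard_normal((3, 5))               # a non-Hermitian, non-square matrix
lam = np.linalg.eigvalsh(dilation(C))[-1]     # largest eigenvalue of the dilation
print(np.isclose(lam, np.linalg.norm(C, 2)))  # True: lambda_max(phi(C)) = ||C||
```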

4. Conclusions

In this paper, we propose optimized tail and expectation bounds for random matrix series, including matrix Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) series. Different from existing studies, our results depend on the intrinsic dimension rather than the ambient dimension, and are therefore more suitable for the case of high-dimensional matrices.
In future work, we will use the obtained results to study tail bounds and expectation bounds for other eigenvalues of random matrix series.

Author Contributions

Conceptualization, X.G., M.Z. and J.L.; methodology, X.G., M.Z. and J.L.; validation, X.G., M.Z. and J.L.; resources, X.G.; writing—original draft preparation, X.G.; writing—review and editing, M.Z. and J.L.; supervision, X.G., M.Z. and J.L.; project administration, X.G.; funding acquisition, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (12101378); Shanxi Provincial Research Foundation for Basic Research, China (20210302124548).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We are grateful to the anonymous reviewers and the editors for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tulino, A.M.; Verdú, S. Random matrix theory and wireless communications. Found. Trends Commun. 2004, 1, 1–182. [Google Scholar] [CrossRef]
  2. Naor, A.; Regev, O.; Vidick, T. Efficient rounding for the noncommutative Grothendieck inequality. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, Palo Alto, CA, USA, 1–4 June 2013; pp. 71–80. [Google Scholar]
  3. Gittens, A.; Mahoney, M.W. Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res. 2016, 17, 3977–4041. [Google Scholar]
  4. Louart, C.; Liao, Z.; Couillet, R. A random matrix approach to neural networks. Ann. Appl. Probab. 2018, 28, 1190–1248. [Google Scholar] [CrossRef]
  5. Wang, Z.; Zhu, Y. Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks. Ann. Appl. Probab. 2024, 34, 1896–1947. [Google Scholar] [CrossRef]
  6. Martin, C.H.; Mahoney, M.W. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. J. Mach. Learn. Res. 2021, 22, 1–73. [Google Scholar]
  7. Wigner, E.P. Random matrices in physics. SIAM Rev. 1967, 9, 1–23. [Google Scholar] [CrossRef]
  8. Guhr, T.; Müller-Groeling, A.; Weidenmüller, H.A. Random-matrix theories in quantum physics: Common concepts. Phys. Rep. 1998, 299, 189–425. [Google Scholar] [CrossRef]
  9. Bufetov, A.; Mkrtchyan, S.; Shcherbina, M.; Soshnikov, A. Entropy and the Shannon-McMillan-Breiman theorem for beta random matrix ensembles. J. Stat. Phys. 2013, 152, 1–14. [Google Scholar] [CrossRef]
  10. Calabrese, P.; Le Doussal, P.; Majumdar, S.N. Random matrices and entanglement entropy of trapped Fermi gases. Phys. Rev. A 2015, 91, 012303. [Google Scholar] [CrossRef]
  11. Collins, B.; Nechita, I. Random matrix techniques in quantum information theory. J. Math. Phys. 2016, 57. [Google Scholar] [CrossRef]
  12. Wigner, E.P. On the distribution of the roots of certain symmetric matrices. Ann. Math. 1958, 67, 325–327. [Google Scholar] [CrossRef]
  13. Marchenko, V.A.; Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Mat. Sb. 1967, 114, 507–536. [Google Scholar]
  14. Bai, Z.D.; Yin, Y.Q. Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 1993, 21, 1275–1294. [Google Scholar] [CrossRef]
  15. Ahlswede, R.; Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inf. Theory 2002, 48, 569–579. [Google Scholar] [CrossRef]
  16. Golden, S. Lower bounds for the Helmholtz function. Phys. Rev. 1965, 137, B1127. [Google Scholar] [CrossRef]
  17. Thompson, C.J. Inequality with applications in statistical mechanics. J. Math. Phys. 1965, 6, 1812–1813. [Google Scholar] [CrossRef]
  18. Tropp, J.A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 2012, 12, 389–434. [Google Scholar] [CrossRef]
  19. Lieb, E.H. Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 1973, 11, 267–288. [Google Scholar] [CrossRef]
  20. Hsu, D.; Kakade, S.M.; Zhang, T. Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electron. Commun. Probab. 2012, 17, 1–13. [Google Scholar] [CrossRef]
  21. Minsker, S. On some extensions of Bernstein's inequality for self-adjoint operators. Stat. Probab. Lett. 2017, 127, 111–119. [Google Scholar] [CrossRef]
  22. Zhang, C.; Du, L.; Tao, D. LSV-based tail inequalities for sums of random matrices. Neural Comput. 2016, 29, 247–262. [Google Scholar] [CrossRef] [PubMed]
  23. Zhao, L.; Liao, S.; Wang, Y.; Li, Z.; Tang, J.; Yuan, B. Theoretical properties for neural networks with weight matrices of low displacement rank. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 4082–4090. [Google Scholar]
  24. Choromanski, K.; Sindhwani, V. Recycling randomness with structure for sublinear time kernel expansions. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2502–2510. [Google Scholar]
  25. Cheng, Y.; Yu, F.X.; Feris, R.S.; Kumar, S.; Choudhary, A.; Chang, S.F. An exploration of parameter redundancy in deep networks with circulant projections. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2857–2865. [Google Scholar]
  26. Zhang, C.; Gao, X.; Hsieh, M.H.; Hang, H.; Tao, D. Matrix infinitely divisible series: Tail inequalities and their applications. IEEE Trans. Inf. Theory 2019, 66, 1099–1117. [Google Scholar] [CrossRef]
  27. Tropp, J.A. An Introduction to Matrix Concentration Inequalities; Foundations and Trends® in Machine Learning: Hanover, MA, USA, 2015; Volume 8, pp. 1–230. [Google Scholar]