Article

Optimized Tail Bounds for Random Matrix Series

by Xianjie Gao 1,*, Mingliang Zhang 2 and Jinming Luo 3

1 Department of Basic Sciences, Shanxi Agricultural University, Jinzhong 030801, China
2 School of Mathematics and Statistics, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
3 School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.
Entropy 2024, 26(8), 633; https://doi.org/10.3390/e26080633
Submission received: 24 June 2024 / Revised: 22 July 2024 / Accepted: 26 July 2024 / Published: 26 July 2024
(This article belongs to the Special Issue Random Matrix Theory and Its Innovative Applications)

Abstract

Random matrix series are a significant component of random matrix theory, offering rich theoretical content and broad application prospects. In this paper, we propose modified versions of tail bounds for random matrix series, including matrix Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) series. Unlike existing studies, our results depend on the intrinsic dimension instead of the ambient dimension. In some cases, the intrinsic dimension is much smaller than the ambient dimension, which makes the modified versions suitable for high-dimensional or infinite-dimensional settings. In addition, we obtain expectation bounds for random matrix series based on the intrinsic dimension.

1. Introduction

Random matrix theory is a significant branch of mathematics that delves into the properties and behavior of random matrices. Its applications span various fields, including wireless communications [1], combinatorial optimization [2], matrix low-rank approximation [3], neural networks [4,5], and deep learning [6]. Random matrices also have a wide range of applications in physics, entropy, and information science. They can provide comprehensive descriptions and analyses when dealing with multiple interacting elements, high-dimensional systems, and complex statistical relationships. Random matrices can capture the complex interactions between multiple particles or multiple physical processes, and when dealing with high-dimensional physical systems with a large number of degrees of freedom, they provide a natural and effective representation [7,8]. Random matrices can be used to calculate the entropy of complex systems and thus measure the degree of chaos and uncertainty in a system [9,10]. In the field of information science, random matrices can be used for performance optimization and signal processing in communication systems [11]. Random matrix theory provides a powerful theoretical basis for dealing with problems in these fields. In particular, random matrix series are an important research topic within random matrix theory, with wide application and research value.
The study of random matrix theory comprises two branches: asymptotic theory and non-asymptotic theory. There have been several notable asymptotic results in random matrix theory, including Wigner’s semicircle law [12], the Marchenko–Pastur law [13], and the Bai–Yin law [14]. While these asymptotic statements can offer precise limiting results as the matrix dimension approaches infinity, they do not specify the rate at which these probability terms converge to their limits. In response to this challenge, non-asymptotic approaches to analyzing these probability terms have emerged.
Ahlswede and Winter [15] illustrated the application of the Golden–Thompson inequality [16,17] in extending the Laplace transform method to the matrix scenario to derive tail bounds for sums of random matrices. Tropp [18] utilized a corollary of Lieb's theorem [19] to achieve a significant improvement over the Ahlswede–Winter outcome. To address the notable limitation that these results depend on the ambient dimension of the matrix, so that the bounds become excessively loose in scenarios involving high-dimensional matrices, Hsu et al. [20] presented a tighter analogue of matrix Bernstein's inequality. Minsker [21] extended Bernstein's concentration inequalities for random matrices by enhancing the results in [20] through the introduction of the concept of effective rank. Zhang et al. [22] introduced dimension-free tail bounds for the largest singular value of sums of random matrices.
The matrix series of the form $\sum_k x_k A_k$ has played a crucial role in recent studies [23,24,25], where $x_k$ represents a random variable and $A_k$ is a fixed matrix. The variable $x_k$ can encompass various types of random variables, including Gaussian, Bernoulli, infinitely divisible random variables, and more. Tropp [18] utilized Gaussian series to study the key characteristics of matrix tail bounds. Zhang et al. [26] studied the tail inequalities of the largest eigenvalue of a matrix infinitely divisible (i.d.) series and applied them to optimization problems and compressed sensing.

1.1. Related Works

Consider the sum $\sum_{k=1}^n \alpha_k a_k$, where $a_1, a_2, \ldots, a_n$ are real numbers and $\alpha_1, \alpha_2, \ldots, \alpha_n$ are independent standard Gaussian variables. There is the probability inequality
$$\mathbb{P}\Big\{ \sum_{k=1}^n \alpha_k a_k \ge \sqrt{2\delta^2 t} \Big\} \le e^{-t}, \quad \text{where } \delta^2 := \sum_{k=1}^n a_k^2. \qquad (1)$$
Let $\{A_k\}_{k=1}^n$ be a finite sequence of fixed Hermitian matrices with dimension $d$. Tropp [18] gave the following result for any $t \ge 0$:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge \sqrt{2\eta^2 t} \Big\} \le d \cdot e^{-t}, \quad \text{where } \eta^2 := \Big\| \sum_k A_k^2 \Big\|. \qquad (2)$$
A significant distinction between (1) and (2) is the presence of the matrix dimension factor d in the latter. Hsu et al. [20] obtained the following tail bound:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge \sqrt{2\eta^2 t} \Big\} \le \frac{\operatorname{tr}(\Xi)}{\lambda_{\max}(\Xi)} \cdot \frac{t}{e^t - t - 1}, \quad \text{where } \Xi = \sum_{k=1}^n A_k^2. \qquad (3)$$
We observe that the right-hand side of (3) is the product of two terms; the result is tighter when both terms are small. Compared with (2), we know that $\operatorname{tr}(\Xi)/\lambda_{\max}(\Xi) \le d$, but $t\,(e^t - t - 1)^{-1} > e^{-t}$ for $t > 0$. That is, one term in (3) is smaller than its counterpart in (2) while the other is larger; in other words, both results have their respective limitations.
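To make this trade-off concrete, the following short sketch (our own illustration, not part of the original analysis) evaluates the two $t$-dependent factors numerically; the dimensional factors simply satisfy $\operatorname{tr}(\Xi)/\lambda_{\max}(\Xi) \le d$.

```python
import numpy as np

# Illustration (our own numbers): the t-dependent factors in bounds (2) and (3).
# The factor t/(e^t - t - 1) from (3) always exceeds e^{-t} from (2) for t > 0,
# while the dimensional factor tr(Xi)/lambda_max(Xi) in (3) is at most d.
for t in [1.0, 3.0, 5.0, 10.0]:
    print(f"t = {t:5.1f}:  e^-t = {np.exp(-t):.3e},"
          f"  t/(e^t - t - 1) = {t / (np.exp(t) - t - 1):.3e}")
```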
Let $\{\beta_k\}_{k=1}^n$ be a finite sequence of independent sub-Gaussian random variables. Then, the following tail bound holds:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\} \le d \cdot e^{-t^2/(4c^2\eta^2)}, \qquad (4)$$
where c is an absolute constant.
Let $\{\gamma_k\}_{k=1}^n$ be a finite sequence of independent infinitely divisible random variables. Let $B_1, \ldots, B_n$ be fixed $d$-dimensional Hermitian matrices with $\lambda_{\max}(B_k) \le 1$, $k = 1, \ldots, n$. For any $0 < t < \rho\, h(M-)$, Zhang et al. [26] deduced the following results:
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le d \exp\Big( -\rho \int_0^{t/\rho} h^{-1}(s)\, ds \Big), \qquad (5)$$
where $h(M-) := \lim_{s \to M-} h(s)$ and $h^{-1}(s)$ is the inverse of $h(s)$. For any $t \ge \rho\, h(M-)$,
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le d \exp\big( \rho\,\phi(M) - M t \big), \qquad (6)$$
where $\rho := \lambda_{\max}\big(\sum_{k=1}^{n} B_k^2\big)$.
In addition, Tropp [27] gave the expectation bound for the matrix Gaussian series,
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \le \sqrt{2\eta^2 \log d}, \qquad (7)$$
and Zhang et al. [26] also proposed an expectation bound for infinitely divisible matrix series under some given conditions.
However, the significant drawback of the above results lies in their reliance on the ambient dimension of the matrix: the bounds tend to be very loose when the matrices have high dimension. To address this problem, we optimize the existing theory. Tighter tail bounds for random matrices yield more precise and reliable probability estimates, which allows the behavior of random matrices to be characterized more accurately and helps to improve the accuracy, efficiency, and reliability of both theory and applications.

1.2. Overview of Main Results

With the aim of overcoming the limitations of the existing theory and of complementing and refining existing random matrix theory, we put forward optimized tail and expectation bounds for random matrix series in this paper, including matrix Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) series. This makes the modified versions potentially applicable to high-dimensional or infinite-dimensional matrix settings. Taking the matrix Gaussian series as an example, we obtain the tighter conclusion
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge \sqrt{2\omega^2 t} \Big\} \le 2\tilde{d} \cdot e^{-t} \quad \text{for } t > \omega,$$
and
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \le \sqrt{2}\,\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big).$$
The quantities $\tilde{d}$ and $\omega^2$ will be introduced in detail later in the paper.
The rest of this paper is organized as follows. Section 2 introduces some preliminary knowledge on the intrinsic dimension and Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) distributions. Section 3 gives tail and expectation bounds based on the intrinsic dimension bounds for Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) matrix series. The last section concludes the paper.

2. Notations and Preliminaries

In this section, some preliminary knowledge will be provided about the intrinsic dimension of the matrix, and also about Gaussian (or Rademacher), sub-Gaussian, infinitely divisible distributions, and matrix series.

2.1. The Intrinsic Dimension

Existing tail bounds on random matrix series depend on the ambient dimension of the matrix. We introduce the concept of the intrinsic dimension, which is much smaller than the ambient dimension in some cases (see also [27]).
Definition 1.
For a positive-semidefinite matrix $S$, the intrinsic dimension is defined as
$$\operatorname{intdim}(S) = \frac{\operatorname{tr}(S)}{\|S\|}.$$
It can be seen from the definition that the intrinsic dimension is not significantly affected by changes in the size of the matrix; indeed, $1 \le \operatorname{intdim}(S) \le \operatorname{rank}(S) \le d$ for any nonzero positive-semidefinite $S$ of dimension $d$. In fact, when the eigenvalues of $S$ decay rapidly, the intrinsic dimension is much smaller than the ambient dimension.
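As a simple illustration (our own example with an arbitrarily chosen spectrum), the following sketch computes the intrinsic dimension of a diagonal positive-semidefinite matrix whose eigenvalues decay geometrically; the intrinsic dimension stays near 2 no matter how large the ambient dimension is.

```python
import numpy as np

# Sketch (our own example): intdim(S) = tr(S)/||S|| for a PSD matrix whose
# eigenvalues decay geometrically, compared with the ambient dimension d.
d = 1000
S = np.diag(0.5 ** np.arange(d))              # eigenvalues 1, 1/2, 1/4, ...

intdim = np.trace(S) / np.linalg.norm(S, 2)   # spectral norm = largest eigenvalue
print(f"ambient dimension: {d}, intrinsic dimension: {intdim:.4f}")  # approx. 2
```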

2.2. Several Distributions

In this section, we briefly introduce three random distributions and their moment generating functions, including Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) distributions.
The Gaussian distribution is a very important continuous distribution in probability theory and statistics, and is often used to represent real-valued random variables with unknown distribution. Given a standard Gaussian variable $\alpha$, the moment generating function (mgf) is given by
$$\mathbb{E}\, e^{\theta \alpha} = e^{\theta^2/2}, \quad \theta \in \mathbb{R}.$$
The Rademacher distribution is a discrete probability distribution in which the random variable takes the value $1$ or $-1$, each with probability $1/2$. Given a Rademacher variable $\xi$, the moment generating function satisfies
$$\mathbb{E}\, e^{\theta \xi} \le e^{\theta^2/2}, \quad \theta \in \mathbb{R}.$$
The class of sub-Gaussian distributions has strong tail decay and includes many distributions, such as the uniform distribution and all bounded random variables. Given a centered sub-Gaussian random variable $\beta$, it holds that
$$\mathbb{E}\, e^{\theta \beta} \le e^{c^2 \theta^2}, \quad \theta \in \mathbb{R},$$
where c is an absolute constant.
Infinitely divisible (i.d.) distributions form a large class of probability distributions that play an important role in probability theory and limit theorems. A random variable $\gamma$ has an i.d. distribution if, for any $n \in \mathbb{N}_+$, there exist independent and identically distributed (i.i.d.) random variables $\gamma_1, \ldots, \gamma_n$ such that $\gamma$ has the same distribution as $\gamma_1 + \cdots + \gamma_n$.
Among discrete distributions, the Poisson, negative binomial, and geometric distributions are infinitely divisible. Among continuous distributions, the Cauchy, Lévy, stable, and Gamma distributions are examples of infinitely divisible distributions.
A real-valued random variable $\gamma$ is i.d. if and only if there exists a triplet $(b, \sigma^2, \nu)$ such that the characteristic function of $\gamma$ is given by
$$\mathbb{E}\{ e^{i\theta\gamma} \} = \exp\Big( i b \theta - \frac{\sigma^2 \theta^2}{2} + \int_{\mathbb{R}} \big( e^{i\theta u} - 1 - i\theta u\, \mathbf{1}_{\{|u|<1\}} \big)\, \nu(du) \Big), \quad \theta \in \mathbb{R},$$
where $b \in \mathbb{R}$, $\sigma \ge 0$, and $\nu$ is a Lévy measure. This necessary and sufficient condition is the Lévy–Khintchine theorem.
Let $\gamma$ be an i.d. random variable with triplet $(b, \sigma^2, \nu)$, and suppose that $\mathbb{E}\,\gamma = 0$. Let $M := \sup\{ \theta > 0 : \mathbb{E}\{ e^{\theta|\gamma|} \} < +\infty \}$. For any $\lambda \ge 1$ and $0 < \theta < M$,
$$\mathbb{E}\, e^{\lambda \theta \gamma} \le \exp\Big( \frac{\sigma^2 \theta^2 \lambda^2}{2} + \lambda^2 \int_{\mathbb{R}} \big( e^{\theta|u|} - \theta|u| - 1 \big)\, \nu(du) \Big).$$
The proof can be found in [26].

2.3. Random Matrix Series

Given $n$ fixed matrices $A_1, A_2, \ldots, A_n$, a random matrix series is represented as $\sum_{k=1}^n x_k A_k$, where $x_1, x_2, \ldots, x_n$ are independent random variables. The quantities of interest are the tail probability $\mathbb{P}\{ \lambda_{\max}(\sum_k x_k A_k) \ge t \}$ and the expectation $\mathbb{E}\,\lambda_{\max}(\sum_k x_k A_k)$.
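As a minimal sketch of these two quantities (our own illustration; the coefficient matrices, the threshold $t$, and the Monte Carlo size are arbitrary choices), the following code forms a matrix Gaussian series and estimates the tail probability and the expectation of its largest eigenvalue by simulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed Hermitian coefficient matrices A_1, ..., A_n.
d, n = 20, 50
A = [(B + B.T) / 2 for B in 0.05 * rng.standard_normal((n, d, d))]

def lambda_max_sample():
    """One realization of lambda_max(sum_k x_k A_k) with standard Gaussian x_k."""
    x = rng.standard_normal(n)
    Y = sum(xk * Ak for xk, Ak in zip(x, A))
    return np.linalg.eigvalsh(Y)[-1]

samples = np.array([lambda_max_sample() for _ in range(2000)])
t = 2.0
print("estimated P{lambda_max >= t}:", np.mean(samples >= t))
print("estimated E[lambda_max]:     ", samples.mean())
```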

3. Intrinsic Dimension Bounds for Matrix Series

In this section, we present tail bounds for random matrix series based on intrinsic dimension bounds, and also obtain the expectation bounds.

3.1. Matrix Gaussian (or Rademacher) Series with Intrinsic Dimension

This section presents the tail and expectation bounds for matrix Gaussian (or Rademacher) series with an intrinsic dimension.
Theorem 1.
Consider a finite sequence $\{A_k : k = 1, \ldots, n\}$ of fixed Hermitian matrices with the same dimension $d$, and let $\{\alpha_k\}$ be a finite sequence of independent Gaussian (or Rademacher) variables. Introduce the matrix $M := \sum_k A_k^2$. Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
Then, it holds that
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot e^{-t^2/(2\omega^2)} \quad \text{for } t > \omega.$$
Compared with the previous results in (2) and (3), the bound in Theorem 1 improves upon their respective shortcomings and is tighter. Therefore, our bound is more applicable in the case of high-dimensional matrices.
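To illustrate the gain numerically (our own sketch with hypothetical rank-one coefficient matrices), note that with $M := \sum_k A_k^2$ we have $\omega^2 = \eta^2$, so the exponential factors in (2) and in Theorem 1 coincide and the comparison reduces to the prefactors $d$ versus $2\tilde{d}$:

```python
import numpy as np

# Sketch (our illustration): ambient prefactor d in (2) versus the intrinsic
# prefactor 2*intdim(M) in Theorem 1, for rank-one Hermitian coefficients
# whose norms decay geometrically.
d, n = 500, 60
rng = np.random.default_rng(1)

A = []
for k in range(n):
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    A.append(0.5 ** k * np.outer(v, v))       # Hermitian, rank one, norm 0.5^k

M = sum(Ak @ Ak for Ak in A)                  # M = sum_k A_k^2
omega2 = np.linalg.norm(M, 2)                 # omega^2 = ||M||
d_tilde = np.trace(M) / omega2                # intrinsic dimension of M

print("ambient prefactor d:          ", d)            # 500
print("intrinsic prefactor 2*intdim: ", 2 * d_tilde)  # roughly 2 to 3
```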
Theorem 2.
Given a matrix Gaussian (or Rademacher) series $\sum_k \alpha_k A_k$, it holds that
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \alpha_k A_k\Big) \le \sqrt{2}\,\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big).$$
Compared with the previous result in (7), the bound in Theorem 2 depends on the intrinsic dimension of the matrix and is more applicable in the case of high-dimensional matrices.
The proofs of Theorems 1 and 2 are similar to the proofs for the matrix sub-Gaussian series given below; we omit them here.
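As a rough numerical comparison (our own sketch with hypothetical parameter values $\omega = 1$ and $\tilde{d} = 2$), the following evaluates the ambient-dimension expectation bound (7) against the bound of Theorem 2; because of the additive constant in Theorem 2, the improvement only appears once the ambient dimension is genuinely large.

```python
import numpy as np

# Sketch (hypothetical parameter values): compare the expectation bounds
#   (7):        sqrt(2 * omega^2 * log d)
#   Theorem 2:  sqrt(2) * omega * (2 + sqrt(log(1 + d_tilde)))
omega, d_tilde = 1.0, 2.0
for d in [1e3, 1e6, 1e9]:
    old = np.sqrt(2 * omega**2 * np.log(d))
    new = np.sqrt(2) * omega * (2 + np.sqrt(np.log(1 + d_tilde)))
    print(f"d = {d:.0e}:  bound (7) = {old:.3f},  Theorem 2 = {new:.3f}")
```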

3.2. Matrix Sub-Gaussian Series with Intrinsic Dimension

This section presents the tail and expectation bounds for matrix sub-Gaussian series with an intrinsic dimension.
Theorem 3.
Consider a finite sequence $\{A_k : k = 1, \ldots, n\}$ of fixed Hermitian matrices with the same dimension $d$, and let $\{\beta_k\}$ be a finite sequence of independent centered sub-Gaussian variables. Introduce the matrix $M := \sum_k A_k^2$. Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
Then, it holds that
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot e^{-t^2/(4c^2\omega^2)} \quad \text{for } t > 2c\omega,$$
where $c$ is an absolute constant.
Before proving this theorem, we first introduce a proposition from [27] that serves as a key step in the proof.
Proposition 1.
Let $Y$ be a random Hermitian matrix. Let $\psi : \mathbb{R} \to \mathbb{R}_+$ be a nonnegative function that is nondecreasing on $[0, \infty)$. For each $t \ge 0$,
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \frac{1}{\psi(t)}\, \mathbb{E}\operatorname{tr}\psi(Y).$$
Proof. 
Let the sum
$$Y = \sum_k \beta_k A_k.$$
Fix a number $\theta > 0$, and define the function $\psi(t) = \max\{0, e^{\theta t} - 1\}$ for $t \in \mathbb{R}$. For $t \ge 0$, Proposition 1 states that
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \frac{1}{\psi(t)}\, \mathbb{E}\operatorname{tr}\psi(Y) = \frac{1}{e^{\theta t} - 1}\, \mathbb{E}\operatorname{tr}\big( e^{\theta Y} - I \big).$$
Introduce the matrix $M := \sum_k A_k^2$. According to the mgf bound for a sub-Gaussian random variable given in Section 2.2 and the transfer rule (for a real-valued function $f$, if $f(a) \le g(a)$ for $a \in I$, then $f(A) \preceq g(A)$ whenever the eigenvalues of $A$ lie in $I$), it can be seen that
$$\mathbb{E}\operatorname{tr} e^{\theta Y} \le \operatorname{tr}\exp\Big( \theta^2 c^2 \sum_k A_k^2 \Big) = \operatorname{tr}\exp\big( g(\theta) \cdot M \big), \quad \text{where } g(\theta) = c^2 \theta^2.$$
Introduce the function $\varphi(a) = e^a - 1$, and observe that
$$\mathbb{E}\operatorname{tr}\big( e^{\theta Y} - I \big) \le \operatorname{intdim}(M) \cdot \varphi\big( g(\theta)\, \|M\| \big).$$
Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
We have
$$\mathbb{E}\operatorname{tr}\big( e^{\theta Y} - I \big) \le \tilde{d} \cdot \varphi\big( g(\theta)\, \omega^2 \big) \le \tilde{d} \cdot e^{g(\theta) \cdot \omega^2}.$$
Next, combine this bound with the probability bound above to obtain
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \tilde{d} \cdot \frac{e^{\theta t}}{e^{\theta t} - 1} \cdot e^{-\theta t + g(\theta)\,\omega^2} \le \tilde{d} \cdot \Big( 1 + \frac{1}{\theta t} \Big) \cdot e^{-\theta t + g(\theta)\,\omega^2}.$$
We use the following formula to control the fraction:
$$\frac{e^a}{e^a - 1} = 1 + \frac{1}{e^a - 1} \le 1 + \frac{1}{a} \quad \text{for } a > 0.$$
We select $\theta = t/(2c^2\omega^2)$ to obtain
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\} \le \tilde{d}\,\Big( 1 + \frac{2c^2\omega^2}{t^2} \Big)\, e^{-t^2/(4c^2\omega^2)}.$$
Invoking the assumption that $t > 2c\omega$, so that $1 + 2c^2\omega^2/t^2 < 3/2 \le 2$, yields the conclusion. □
Since the large deviation inequality concerns the case where $t$ is large, the restriction $t > 2c\omega$ is reasonable.
Theorem 4.
Given a matrix sub-Gaussian series $\sum_k \beta_k A_k$, it holds that
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \le 2c\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big).$$
Proof. 
Fix a number $\mu > 2c\omega$. Then
$$\begin{aligned}
\mathbb{E}\,\lambda_{\max}\Big(\sum_k \beta_k A_k\Big) &= \int_0^\infty \mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\}\, dt \\
&\le \int_0^\mu \mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \beta_k A_k\Big) \ge t \Big\}\, dt + 2\tilde{d} \int_\mu^\infty e^{-t^2/(4c^2\omega^2)}\, dt \\
&\le \mu + 2\tilde{d} \int_\mu^\infty e^{-t^2/(4c^2\omega^2)}\, dt \\
&\le \mu + 2\tilde{d} \cdot 2c\omega \int_\mu^\infty \frac{t}{2c^2\omega^2}\, e^{-t^2/(4c^2\omega^2)}\, dt \\
&= \mu + 2\tilde{d} \cdot 2c\omega\, e^{-\mu^2/(4c^2\omega^2)}.
\end{aligned}$$
Select $\mu = 2c\omega\sqrt{\log(1 + \tilde{d})}$; then
$$\begin{aligned}
\mathbb{E}\,\lambda_{\max}\Big(\sum_k \beta_k A_k\Big) &\le 2c\omega\sqrt{\log(1 + \tilde{d})} + 2\tilde{d} \cdot 2c\omega\, e^{-\log(1 + \tilde{d})} \\
&= 2c\omega\sqrt{\log(1 + \tilde{d})} + \frac{2\tilde{d}}{1 + \tilde{d}} \cdot 2c\omega \\
&\le 2c\omega\sqrt{\log(1 + \tilde{d})} + 2 \cdot 2c\omega \\
&= 2c\omega\Big( 2 + \sqrt{\log\big(1 + \tilde{d}\big)} \Big). \quad □
\end{aligned}$$

3.3. Matrix Infinitely Divisible Series with Intrinsic Dimension

This section presents the tail and expectation bounds for matrix i.d. series with an intrinsic dimension.
Theorem 5.
Consider a finite sequence $\{B_k : k = 1, \ldots, n\}$ of fixed Hermitian matrices with the same dimension $d$ and $\lambda_{\max}(B_k) \le 1$, and let $\{\gamma_k\}$ be a finite sequence of independent centered i.d. random variables with triplet $(b, \sigma^2, \nu)$ such that $\mathbb{E}\, e^{\theta|\gamma|} < +\infty$ for some $\theta > 0$. Introduce the matrix $M := \sum_k B_k^2$. Define the following parameters:
$$\tilde{d} = \operatorname{intdim}(M) \quad \text{and} \quad \omega^2 = \|M\|.$$
Then, it holds that
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot \exp\Big( -\omega^2 \int_0^{t/\omega^2} h^{-1}(s)\, ds \Big) \quad \text{for } h(M-)\,\omega^2 > t > \frac{1}{h^{-1}(t/\omega^2)},$$
where $h(M-)$ is the left limit of $h$ at $M$, with
$$M := \sup\{ \theta > 0 : \mathbb{E}\, e^{\theta|\gamma|} < +\infty \},$$
and $h^{-1}$ is the inverse of
$$h(s) = \sigma^2 s + \int_{\mathbb{R}} |u| \big( e^{s|u|} - 1 \big)\, \nu(du), \quad 0 < s < M.$$
For any $t \ge h(M-)\,\omega^2$, we have
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le 2\tilde{d} \cdot \exp\big( \omega^2 \phi(M) - M t \big),$$
where
$$\phi(\theta) := \frac{\sigma^2 \theta^2}{2} + \int_{\mathbb{R}} \big( e^{\theta|u|} - \theta|u| - 1 \big)\, \nu(du).$$
Compared with the previous result in [26], our results depend on the intrinsic dimensions of the matrix and are more applicable for the case of high-dimensional matrices.
Proof. 
Let the sum
$$Y = \sum_k \gamma_k B_k.$$
Introduce the matrix $M := \sum_k B_k^2$. Similar to the proof above, according to the mgf bound for an i.d. random variable given in Section 2.2 and the transfer rule, we can obtain
$$\mathbb{P}\{ \lambda_{\max}(Y) \ge t \} \le \tilde{d} \cdot \frac{e^{\theta t}}{e^{\theta t} - 1} \cdot e^{-\theta t + \phi(\theta)\,\omega^2} \le \tilde{d} \cdot \Big( 1 + \frac{1}{\theta t} \Big) \cdot e^{-\theta t + \phi(\theta)\,\omega^2}.$$
Next, we minimize the right-hand side of this inequality with respect to $\theta$. Since $\mathbb{E}\, e^{\theta\gamma} < +\infty$ for all $0 < \theta < M$, $\phi(\theta)$ is infinitely differentiable on $(0, M)$, with
$$\phi'(\theta) = h(\theta) = \sigma^2 \theta + \int_{\mathbb{R}} |u| \big( e^{\theta|u|} - 1 \big)\, \nu(du) > 0,$$
and
$$\phi''(\theta) = \sigma^2 + \int_{\mathbb{R}} |u|^2 e^{\theta|u|}\, \nu(du) > 0.$$
Since $\phi(0) = h(0) = h^{-1}(0) = 0$, we have
$$\phi\big( h^{-1}(t/\omega^2) \big) = \int_0^{h^{-1}(t/\omega^2)} h(s)\, ds = \int_0^{t/\omega^2} s\, d h^{-1}(s) = \frac{t}{\omega^2} \cdot h^{-1}(t/\omega^2) - \int_0^{t/\omega^2} h^{-1}(s)\, ds.$$
We select $\theta = h^{-1}(t/\omega^2)$ to obtain
$$\min_{0 < \theta < M}\big( \omega^2 \cdot \phi(\theta) - \theta \cdot t \big) = \omega^2 \cdot \phi\big( h^{-1}(t/\omega^2) \big) - t \cdot h^{-1}(t/\omega^2) = -\omega^2 \cdot \int_0^{t/\omega^2} h^{-1}(s)\, ds.$$
Invoking the assumption that $t > 1/h^{-1}(t/\omega^2)$, so that $\theta t > 1$ and hence $1 + 1/(\theta t) < 2$, we have
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) \ge t \Big\} \le 2\tilde{d} \cdot \exp\Big( -\omega^2 \cdot \int_0^{t/\omega^2} h^{-1}(s)\, ds \Big).$$
Moreover, when $t \ge h(M-)\,\omega^2$, according to the convexity of $\omega^2\phi(\theta) - \theta t$ with respect to $\theta > 0$ and the monotonicity of $h^{-1}(s)$ ($s > 0$), the solution of the optimization problem is $\theta = M$. Thus, for any $t \ge h(M-)\,\omega^2$, we have
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le 2\tilde{d} \cdot \exp\big( \omega^2 \phi(M) - M t \big). \quad □$$
Given some specific settings of the measure ν , we can obtain the following corollary.
Corollary 1.
Assume $\nu$ has bounded support, i.e., there exists a positive constant $a < \infty$ such that $\nu\big( (-\infty, -a) \cup (a, \infty) \big) = 0$ and $\nu([-a, a]) \ne 0$. Let
$$R = \inf\{ a > 0 : \nu(\{ u : |u| > a \}) = 0 \}.$$
It follows that $R < \infty$. Then, for any $t > 0$,
$$\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\} \le 2\tilde{d} \cdot \exp\Big( -\frac{\omega^2(\sigma^2 + V)}{R^2} \cdot Q\Big( \frac{R t}{\omega^2(\sigma^2 + V)} \Big) \Big),$$
where $V := \int_{\mathbb{R}} |u|^2\, \nu(du)$, and
$$Q(s) := (1 + s) \cdot \log(1 + s) - s.$$
Proof. 
Since $\operatorname{supp}(\nu) \subseteq [-R, R]$, it holds that $\mathbb{E}\, e^{\theta|\gamma|} < +\infty$ for any $\theta > 0$. Thus, we have
$$\begin{aligned}
h(\theta) &= \sigma^2 \theta + \int_{\mathbb{R}} |u| \big( e^{\theta|u|} - 1 \big)\, \nu(du)
= \sigma^2 \theta + \int_{|u| \le R} |u|^2 \sum_{k=1}^{\infty} \frac{\theta^k |u|^{k-1}}{k!}\, \nu(du) \\
&\le \sigma^2 \theta + \int_{|u| \le R} |u|^2 \sum_{k=1}^{\infty} \frac{\theta^k R^{k-1}}{k!}\, \nu(du)
= \sigma^2 \theta + V\, \frac{e^{\theta R} - 1}{R}
\le (\sigma^2 + V)\, \frac{e^{\theta R} - 1}{R}.
\end{aligned}$$
Denote $p(\theta) := (\sigma^2 + V)\, \frac{e^{\theta R} - 1}{R}$, with inverse function $p^{-1}(s) = \frac{1}{R} \cdot \log\big( 1 + \frac{R s}{\sigma^2 + V} \big)$ ($s > 0$). Since $h(\theta)$ and $p(\theta)$ ($\theta > 0$) are strictly increasing and $h(\theta) \le p(\theta)$, their inverse functions satisfy $p^{-1}(s) \le h^{-1}(s)$ for all $s > 0$. Combining the first tail bound of Theorem 5 with this estimate, we obtain, for any $t > 1/h^{-1}(t/\omega^2)$,
$$\begin{aligned}
\mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) > t \Big\}
&\le 2\tilde{d} \cdot \exp\Big( -\omega^2 \int_0^{t/\omega^2} h^{-1}(s)\, ds \Big) \\
&\le 2\tilde{d} \cdot \exp\Big( -\omega^2 \int_0^{t/\omega^2} \frac{1}{R} \log\Big( 1 + \frac{R s}{\sigma^2 + V} \Big)\, ds \Big) \\
&= 2\tilde{d} \cdot \exp\Big( -\frac{\omega^2(\sigma^2 + V)}{R^2} \cdot Q\Big( \frac{R t}{\omega^2(\sigma^2 + V)} \Big) \Big),
\end{aligned}$$
where $Q(s)$ is defined in the statement of the corollary. This completes the proof. □
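The following sketch (our own illustration; the values of $\tilde{d}$, $\omega^2$, $\sigma^2$, $V$, and $R$ are hypothetical) evaluates the Bennett-type bound of Corollary 1 as a function of $t$.

```python
import numpy as np

def corollary1_bound(t, d_tilde, omega2, sigma2, V, R):
    """Bennett-type tail bound of Corollary 1 for hypothetical parameters."""
    Q = lambda s: (1 + s) * np.log(1 + s) - s
    scale = omega2 * (sigma2 + V) / R**2
    return 2 * d_tilde * np.exp(-scale * Q(R * t / (omega2 * (sigma2 + V))))

# Hypothetical parameters.
d_tilde, omega2, sigma2, V, R = 3.0, 1.0, 0.5, 0.25, 1.0
for t in [1.0, 2.0, 4.0, 8.0]:
    b = corollary1_bound(t, d_tilde, omega2, sigma2, V, R)
    print(f"t = {t:4.1f}:  tail bound = {b:.3e}")
```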
Given a matrix i.d. series $\sum_k \gamma_k B_k$, it holds that
$$\mathbb{E}\,\lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) = \int_0^{\infty} \mathbb{P}\Big\{ \lambda_{\max}\Big(\sum_k \gamma_k B_k\Big) \ge t \Big\}\, dt.$$
In other words, whenever the tail bound is integrable, this formula can be used to obtain an expectation bound based on the intrinsic dimension for matrix i.d. series.
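For instance, a crude numerical version of this integral (our own sketch with the same hypothetical parameters as above, capping the integrand at 1 since any probability is at most 1) reads:

```python
import numpy as np

# Sketch (hypothetical parameters): expectation bound from integrating a tail
# bound, E[lambda_max] <= int_0^inf min(1, tail_bound(t)) dt, using the
# Bennett-type bound of Corollary 1.
d_tilde, omega2, sigma2, V, R = 3.0, 1.0, 0.5, 0.25, 1.0

def tail_bound(t):
    Q = lambda s: (1 + s) * np.log(1 + s) - s
    scale = omega2 * (sigma2 + V) / R**2
    return 2 * d_tilde * np.exp(-scale * Q(R * t / (omega2 * (sigma2 + V))))

ts = np.linspace(0.0, 20.0, 2001)
dt = ts[1] - ts[0]
expectation_bound = np.sum(np.minimum(1.0, tail_bound(ts))) * dt  # Riemann sum
print(f"expectation bound: {expectation_bound:.3f}")
```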
Compared with existing studies, our results are based on the intrinsic dimension of the matrix. The tail and expectation bounds are tighter than the previous results. Therefore, our bounds are more applicable for the case of high-dimensional matrices.
In addition, by using the Hermitian dilation, our results can also be extended to the scenario of non-Hermitian random matrix series. For a general random matrix series $\sum_k x_k C_k$, the identity $\big\| \sum_k x_k C_k \big\| = \lambda_{\max}\big( \sum_k x_k \varphi(C_k) \big)$ holds, where
$$\varphi(C_k) := \begin{bmatrix} 0 & C_k \\ C_k^* & 0 \end{bmatrix}.$$
Thus, we may invoke each of the above theorems to obtain tail and expectation bounds for the norm of a general random matrix series.
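A small sketch (our own illustration) of the Hermitian dilation and the norm identity $\lambda_{\max}(\varphi(C)) = \|C\|$ that it provides:

```python
import numpy as np

def dilation(C):
    """Hermitian dilation phi(C) = [[0, C], [C^*, 0]]."""
    m, n = C.shape
    top = np.hstack([np.zeros((m, m)), C])
    bottom = np.hstack([C.conj().T, np.zeros((n, n))])
    return np.vstack([top, bottom])

rng = np.random.default_rng(2)
C = rng.standard_normal((3, 5))               # a non-Hermitian, non-square matrix
lam = np.linalg.eigvalsh(dilation(C))[-1]     # largest eigenvalue of the dilation
print(np.isclose(lam, np.linalg.norm(C, 2)))  # True: lambda_max(phi(C)) = ||C||
```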

4. Conclusions

In this paper, we propose optimized tail and expectation bounds for random matrix series, including matrix Gaussian (or Rademacher), sub-Gaussian, and infinitely divisible (i.d.) series. Different from existing studies, our results depend on the intrinsic dimension rather than the ambient dimension, and are therefore more suitable for the case of high-dimensional matrices.
In future work, we will use the obtained results to study tail bounds and expectation bounds for other eigenvalues of random matrix series.

Author Contributions

Conceptualization, X.G., M.Z. and J.L.; methodology, X.G., M.Z. and J.L.; validation, X.G., M.Z. and J.L.; resources, X.G.; writing—original draft preparation, X.G.; writing—review and editing, M.Z. and J.L.; supervision, X.G., M.Z. and J.L.; project administration, X.G.; funding acquisition, X.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (12101378); Shanxi Provincial Research Foundation for Basic Research, China (20210302124548).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We are grateful to the anonymous reviewers and the editors for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tulino, A.M.; Verdú, S. Random matrix theory and wireless communications. Found. Trends Commun. 2004, 1, 1–182. [Google Scholar] [CrossRef]
  2. Naor, A.; Regev, O.; Vidick, T. Efficient rounding for the noncommutative Grothendieck inequality. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, Palo Alto, CA, USA, 1–4 June 2013; pp. 71–80. [Google Scholar]
  3. Gittens, A.; Mahoney, M.W. Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res. 2016, 17, 3977–4041. [Google Scholar]
  4. Louart, C.; Liao, Z.; Couillet, R. A random matrix approach to neural networks. Ann. Appl. Probab. 2018, 28, 1190–1248. [Google Scholar] [CrossRef]
  5. Wang, Z.; Zhu, Y. Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks. Ann. Appl. Probab. 2024, 34, 1896–1947. [Google Scholar] [CrossRef]
  6. Martin, C.H.; Mahoney, M.W. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. J. Mach. Learn. Res. 2021, 22, 1–73. [Google Scholar]
  7. Wigner, E.P. Random matrices in physics. SIAM Rev. 1967, 9, 1–23. [Google Scholar] [CrossRef]
  8. Guhr, T.; Müller-Groeling, A.; Weidenmüller, H.A. Random-matrix theories in quantum physics: Common concepts. Phys. Rep. 1998, 299, 189–425. [Google Scholar] [CrossRef]
  9. Bufetov, A.; Mkrtchyan, S.; Shcherbina, M.; Soshnikov, A. Entropy and the Shannon-McMillan-Breiman theorem for beta random matrix ensembles. J. Stat. Phys. 2013, 152, 1–14. [Google Scholar] [CrossRef]
  10. Calabrese, P.; Le Doussal, P.; Majumdar, S.N. Random matrices and entanglement entropy of trapped Fermi gases. Phys. Rev. A 2015, 91, 012303. [Google Scholar] [CrossRef]
  11. Collins, B.; Nechita, I. Random matrix techniques in quantum information theory. J. Math. Phys. 2016, 57. [Google Scholar] [CrossRef]
  12. Wigner, E.P. On the distribution of the roots of certain symmetric matrices. Ann. Math. 1958, 67, 325–327. [Google Scholar] [CrossRef]
  13. Marchenko, V.A.; Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Mat. Sb. 1967, 114, 507–536. [Google Scholar]
  14. Bai, Z.D.; Yin, Y.Q. Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 1993, 21, 1275–1294. [Google Scholar] [CrossRef]
  15. Ahlswede, R.; Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inf. Theory 2002, 48, 569–579. [Google Scholar] [CrossRef]
  16. Golden, S. Lower bounds for the Helmholtz function. Phys. Rev. 1965, 137, B1127. [Google Scholar] [CrossRef]
  17. Thompson, C.J. Inequality with applications in statistical mechanics. J. Math. Phys. 1965, 6, 1812–1813. [Google Scholar] [CrossRef]
  18. Tropp, J.A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 2012, 12, 389–434. [Google Scholar] [CrossRef]
  19. Lieb, E.H. Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 1973, 11, 267–288. [Google Scholar] [CrossRef]
  20. Hsu, D.; Kakade, S.M.; Zhang, T. Tail inequalities for sums of random matrices that depend on the intrinsic dimension. Electron. Commun. Probab. 2012, 17, 1–13. [Google Scholar] [CrossRef]
  21. Minsker, S. On some extensions of Bernstein's inequality for self-adjoint operators. Stat. Probab. Lett. 2017, 127, 111–119. [Google Scholar] [CrossRef]
  22. Zhang, C.; Du, L.; Tao, D. LSV-based tail inequalities for sums of random matrices. Neural Comput. 2016, 29, 247–262. [Google Scholar] [CrossRef] [PubMed]
  23. Zhao, L.; Liao, S.; Wang, Y.; Li, Z.; Tang, J.; Yuan, B. Theoretical properties for neural networks with weight matrices of low displacement rank. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 4082–4090. [Google Scholar]
  24. Choromanski, K.; Sindhwani, V. Recycling randomness with structure for sublinear time kernel expansions. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2502–2510. [Google Scholar]
  25. Cheng, Y.; Yu, F.X.; Feris, R.S.; Kumar, S.; Choudhary, A.; Chang, S.F. An exploration of parameter redundancy in deep networks with circulant projections. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2857–2865. [Google Scholar]
  26. Zhang, C.; Gao, X.; Hsieh, M.H.; Hang, H.; Tao, D. Matrix infinitely divisible series: Tail inequalities and their applications. IEEE Trans. Inf. Theory 2019, 66, 1099–1117. [Google Scholar] [CrossRef]
  27. Tropp, J.A. An Introduction to Matrix Concentration Inequalities; Foundations and Trends® in Machine Learning: Hanover, MA, USA, 2015; Volume 8, pp. 1–230. [Google Scholar]