1. Introduction
Random matrix theory (RMT) dates back to the work of Wishart in multivariate statistics [1], which was devoted to the joint distribution of the entries of sample covariance matrices. The next RMT milestone was the work of Wigner [2] in the middle of the last century, in which the modelling of the Hamiltonian of excited heavy nuclei using a large dimensional random matrix was proposed, thereby replacing the study of the energy levels of nuclei with the study of the distribution of the eigenvalues of a random matrix. Wigner studied the eigenvalues of random Hermitian matrices with centred, independent and identically distributed elements (such matrices were later named Wigner matrices) and proved that the density of the empirical spectral distribution function of the eigenvalues of such matrices converges to the semicircle law as the matrix dimension increases. Later, this convergence was named Wigner’s semicircle law and Wigner’s results were generalised in various directions.
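For a Wigner matrix whose entries have variance $\sigma^{2}$, the limiting density referred to here has the standard semicircle form (stated with our own symbols, since no notation has been fixed at this point):
\[
\rho_{\mathrm{sc}}(x)=\frac{1}{2\pi\sigma^{2}}\sqrt{4\sigma^{2}-x^{2}}\;\mathbb{1}\{|x|\le 2\sigma\}.
\]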
The breakthrough work of Marchenko and Pastur [3] gave impetus to new progress in the study of sample covariance matrices. Under quite general conditions, they found an explicit form of the limiting density of the expected empirical spectral distribution function of sample covariance matrices. Later, this convergence was named the Marchenko–Pastur law.
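In its standard form (unit-variance entries and dimension-to-sample-size ratio $y\in(0,1]$; the symbols below are our own), the Marchenko–Pastur density reads
\[
f_{y}(\lambda)=\frac{1}{2\pi y\,\lambda}\sqrt{(b-\lambda)(\lambda-a)}\;\mathbb{1}\{a\le\lambda\le b\},
\qquad a=(1-\sqrt{y})^{2},\quad b=(1+\sqrt{y})^{2},
\]
and for $y>1$ an additional atom of mass $1-1/y$ appears at the origin.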
Sample covariance matrices are of great practical importance for the problems of multivariate statistical analysis, particularly for the method of principal component analysis (PCA). In recent years, many studies have appeared that connect RMT with other rapidly developing areas, such as the theory of wireless communication and deep learning. For example, the spectral density of sample covariance matrices is used in calculations that relate to multiple input multiple output (MIMO) channel capacity [4]. An important object of study for neural networks is the loss surface. The geometry and critical points of this surface can be predicted using the Hessian of the loss function. A number of works devoted to deep networks have suggested the application of various RMT models for Hessian approximation, thereby allowing the use of RMT results to reach specific conclusions about the nature of the critical points of the surface.
Another area of application for sample covariance matrices is graph theory. An example is the bipartite random graph, the vertices of which can be divided into two groups such that the vertices within each group are not connected to each other. The block of the adjacency matrix that links the two groups is rectangular and, in general, not symmetric, so the study of its singular values leads to sample covariance matrices.
If we assume that the probability of having graph edges tends to zero as the number of vertices n increases to infinity, we arrive at the concept of sparse random matrices. The behaviour of the eigenvalues and eigenvectors of a sparse random matrix depends significantly on its sparsity, and results that are obtained for non-sparse matrices cannot be applied. Sparse sample covariance matrices also have applications in random graph models [5] and deep learning problems [6].
Sparse Wigner matrices have been considered in a number of papers (see [7,8,9,10]), in which many results have been obtained. With the symmetrisation of sample covariance matrices, it is possible to apply these results when the observation matrices are square. However, when the sample size is greater than the observation dimension, the spectral limit distribution has a singularity at zero, which requires a different approach. The spectral limit distribution of sparse sample covariance matrices with a sparsity of (where is arbitrarily small) was studied in [11,12]. In particular, a local law was proven under the assumption that the matrix elements satisfied the moment conditions . In this paper, we considered a case with a sparsity of for and assumed that the matrix element moments satisfied the conditions and for .
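As a purely illustrative sketch of the objects discussed above (all parameter choices, normalisations and variable names here are our own assumptions, not those of the paper), one can generate a sparse sample covariance matrix numerically and compare its spectrum with the Marchenko–Pastur density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: dimension m, sample size n, sparsity probability p.
n, m = 4000, 2000            # aspect ratio y = m / n = 0.5
p = n ** (-0.4)              # each entry survives with probability p

# Sparse observation matrix: Bernoulli mask times i.i.d. standard normal entries.
X = rng.binomial(1, p, size=(m, n)) * rng.standard_normal((m, n))

# Eigenvalues of the normalised sample covariance matrix (1 / (n p)) X X^T.
eigenvalues = np.linalg.eigvalsh(X @ X.T / (n * p))

# Marchenko-Pastur density for ratio y = m / n <= 1 and unit variance.
y = m / n
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2

def mp_density(lam):
    lam = np.asarray(lam, dtype=float)
    out = np.zeros_like(lam)
    inside = (lam > a) & (lam < b)
    out[inside] = np.sqrt((b - lam[inside]) * (lam[inside] - a)) / (
        2 * np.pi * y * lam[inside])
    return out

# Rough comparison of the empirical spectral histogram with the limiting density.
hist, edges = np.histogram(eigenvalues, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max deviation from the MP density:", np.max(np.abs(hist - mp_density(centers))))
```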
2. Main Results
We let , where . We considered the independent and identically distributed zero mean random variables , and with and and an independent set of the independent Bernoulli random variables , and with . In addition, we supposed that as . In what follows, we omitted the index n from when this would not cause confusion.
We considered a sequence of random matrices:
We denoted the singular values of by and defined the symmetrised empirical spectral distribution function (ESD) of the sample covariance matrix as: where stands for the indicator of the event A.
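For later reference, and anticipating the definitions given just below, we record the standard form of the limiting objects involved, under the usual normalisation with unit-variance entries and $y=m/n\le 1$ (the symbols $y$, $a$, $b$, $G_{y}$ and $S_{y}$ are our own naming for this illustration and may differ from the notation of the paper). The symmetrised Marchenko–Pastur distribution $G_{y}$ has the density
\[
g_{y}(x)=\frac{1}{2\pi y\,|x|}\sqrt{\bigl(b^{2}-x^{2}\bigr)\bigl(x^{2}-a^{2}\bigr)}\;\mathbb{1}\{a\le|x|\le b\},
\qquad a=1-\sqrt{y},\quad b=1+\sqrt{y},
\]
and its Stieltjes transform $S_{y}(z)=\int_{\mathbb{R}}(x-z)^{-1}\,dG_{y}(x)$, $\operatorname{Im}z>0$, satisfies the quadratic equation
\[
1+\Bigl(z-\frac{1-y}{z}\Bigr)S_{y}(z)+y\,S_{y}^{2}(z)=0 .
\]
An equation of this type, perturbed by an error term, is derived for the Stieltjes transform of the empirical distribution in Section 3.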
We let and be the symmetrised Marchenko–Pastur distribution function with the density: where and . We assumed that for . We denoted the Stieltjes transformation of the distribution function by and the Stieltjes transformation of the distribution function by , and we obtained:
In this paper, we proved the so-called local Marchenko–Pastur law for sparse covariance matrices. We let:
For a constant , we defined the value . We assumed that the sparsity probability was and that the moments of the matrix elements satisfied the following conditions:
Condition : for and , we have
Condition : for , we have
Condition : a constant exists, such that for all and , we have
We introduced the quantity
with a positive constant
. We then introduced the region:
For constants
and
V, we defined the region:
Next, we introduced some notations. We let:
We introduced the quantity:
and put:
We stated the improved bounds for
and put:
Theorem 1. Assume that the conditions – are satisfied. Then, for any , the positive constants , and exist, such that for :
We also proved the following result.
Theorem 2. Under the conditions of Theorem 1 and for , the positive constants , , and exist, such that for :
2.1. Organisation
The paper is organised as follows. In Section 3, we state Theorems 3–5 and several corollaries. In Section 4, the delocalisation is considered. In Section 5, we prove the corollaries that were stated in Section 3. Section 6 is devoted to the proof of Theorems 3–5. In Section 7, we state and prove some auxiliary results.
2.2. Notation
We use C for large universal constants, which may be different from line to line. and denote the Stieltjes transformations of the symmetrised Marchenko–Pastur distribution and the spectral distribution function, respectively. denotes the resolvent matrix. We let , , and . We consider the -algebras , generated by the elements of (with the exception of the rows from and the columns from ). We write instead of and instead of for brevity. The symbol denotes the matrix , from which the rows with numbers in and the columns with numbers in were deleted. In a similar way, we denote all objects in terms of , so that the resolvent matrix is , the ESD Stieltjes transformation is , , etc. The symbol denotes the conditional expectation with respect to the -algebra and denotes the conditional expectation with respect to the -algebra . We let and .
3. Main Equation and Its Error Term Estimation
Note that is the ESD of the block matrix: where is a matrix with zero elements.
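In the standard linearisation used for sample covariance matrices (written here with our own symbols and normalisation, which are assumptions), the block matrix and its resolvent are
\[
\mathbf{V}=\begin{pmatrix}\mathbf{O}&\mathbf{X}\\ \mathbf{X}^{*}&\mathbf{O}\end{pmatrix},
\qquad
\mathbf{R}(z)=(\mathbf{V}-z\mathbf{I})^{-1},\qquad\operatorname{Im}z>0,
\]
and the Schur complement formula expresses each diagonal entry of the resolvent through the resolvent $\mathbf{R}^{(j)}$ of the matrix with the $j$-th row and column removed:
\[
R_{jj}(z)=\Bigl(-z-\sum_{k,l\neq j}V_{jk}\,R^{(j)}_{kl}(z)\,V_{lj}\Bigr)^{-1}.
\]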
We let
be the resolvent matrix of
:
By applying the Schur complement, we obtained:
For the diagonal elements of
, we could write:
for
and:
for
. The correction terms
for
and
for
were defined as:
and
By summing Equation (4) ( and ), we obtained the self-consistent equation:
with the error term:
We let
be positive constant
V, depending on
. The exact values of these constants were defined as below. For
, we defined
as:
The function was defined in (2). For a given , we considered the event:
and the event:
For any value, the constant existed, such that: It could be , for example. In what follows, we assumed that and V were chosen so that (6) was satisfied and we wrote:
In this section, we demonstrate the following results.
Theorem 3. Under the condition , the positive constants , and exist, such that for :
Remark 1. Theorem 3 was auxiliary. was the perturbation of the main equation in the Stieltjes transformation of the limit distribution. The size of was responsible for the stability of the solution of the perturbed equation. We were interested in the estimates of that were uniform in the domain and had an order of (such estimates were needed for the proof of the delocalisation of Theorem 6). It was important to know to what extent the estimates depended on both and . The estimates behaved differently in the bulk and at the ends of the support of the limit distribution (the introduced functions and were responsible for the behaviour of the estimates, depending on the real part of the argument: in the bulk or at the ends of the support of the limit distribution). For estimation, there were two regimes: for , we used the inequality (10) and for , we used the inequality (18).
Corollary 1. Under the conditions of Theorem 3, the following inequalities hold:
Corollary 2. Under the conditions of Theorem 3 and in the domain: for any , a constant C exists that depends on Q, such that: Moreover, for satisfying and and for , a constant C exists that depends on Q, such that:
Corollary 3. Under the conditions of Theorem 3, for , a constant C that depends on Q exists, such that:
Theorem 4. Under the conditions of Theorem 1, for , the positive constants and exist, such that for : Moreover, for , the positive constants , and exist, such that for satisfying and :
To prove the main result, we needed to estimate the entries of the resolvent matrix.
Theorem 5. Under the condition and for and , the constants , , , and exist, such that for and , we have:
Corollary 4. Under the conditions of Theorem 5, for and , a constant H exists, such that for :
4. Delocalisation
In this section, we demonstrate some applications of the main result. We let
and
be orthogonal matrices from the SVD of matrix
s.t.:
where
and
. Here and in what follows,
denotes a
matrix with zero entries. The eigenvalues of matrix
are denoted by
(
for
,
for
and
for
). We let
be the eigenvector of matrix
, corresponding to eigenvalue
, where
.
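Informally, delocalisation means that no single coordinate of an eigenvector carries a macroscopic share of its mass; a typical statement of this kind (schematic only, not the precise formulation of Theorem 6 below) is
\[
\max_{j}\,\|\mathbf{u}_{j}\|_{\infty}^{2}\le\frac{C\log^{c}n}{n}
\qquad\text{with high probability.}
\]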
We proved the following result.
Theorem 6. Under the conditions –, for , the positive constants and exist, such that: Moreover, for , we have: Proof. First, we noted that according to [
13] based on [
14] and Theorem 1,
exists, such that:
Furthermore, by Lemma 11, we obtained:
where
We chose
. Then, by Corollary 4, we obtained:
We obtained the bounds for in a similar way. Thus, the theorem was proven. □
5. Proof of the Corollaries
5.1. The Proof of Corollary 4
Proof.
Combining this inequality with
, we found that:
By applying Theorem 5, we obtained what was required.
Thus, the corollary was proven. □
5.2. The Proof of Corollary 2
Proof.
We considered the domain
. We noted that for
, we obtained:
and
First, we considered the case
. This inequality implied that:
From there, it followed that:
Furthermore, for the case
, we obtained
. We used the inequality:
By Chebyshev’s inequality, we obtained:
By applying Corollary 1, we obtained:
where
First, we noted that for
:
Moreover, for
:
From there, it followed that:
Using these estimations, we could show that:
By choosing
and
, we obtained:
Then, we considered the case
. In this case:
By applying the inequality
and Corollary 1, we obtained:
It was then simple to show that:
Thus, the first inequality was proven. The proof of the second inequality was similar to the proof of the first. We had to use the inequality:
which was valid on the real line, instead of
, which held in the domain
. Moreover, we noted that for any
z value, we obtained:
Thus, the corollary was proven. □
5.3. Proof of Corollary 3
Proof. We noted that for
:
We split the interval
into subintervals by
, such that for
:
We noted that the event
implied the event
. From there, for
,
, we obtained:
□
6. Proof of the Theorems
6.1. Proof of Theorem 1
Proof. The second term in the RHS of the last inequality was bounded by Corollary 3. For
z (such that
), we used the inequality:
the inequality:
and the Markov inequality. We could write:
We recalled that in the case
:
In the case
and using Corollary 1, we obtained:
First, we considered the case
. By our definition of
, we obtained:
This inequality completed the proof for .
We then considered
. We used inequality
and Corollary 1 to obtain:
By choosing a sufficiently large K value, we obtained the proof. Thus, the theorem was proven. □
6.2. Proof of Theorem 2
Proof. The proof of Theorem 2 was similar to the proof of Theorem 1. We only noted that inequality:
held for all
. □
6.3. The Proof of Theorem 5
Proof.
Using the definition of the Stieltjes transformation, we obtained:
and
It is also well known that for
:
and
We considered the following event for
and
:
For
,
and
u, we obtained:
We introduced the events:
In what follows, we used .
Equations (4) and (5) and Lemma 10 yielded that for and for that satisfied , the following inequalities held:
and
,
We noted that for and under appropriate and , we obtained .
We considered the off-diagonal elements of the resolvent matrix. It could be shown that for
:
for
:
and
where
Inequalities (21) and (22) implied that: for and and that: for and . Equations (23)–(25) produced:
for
and:
for
. Similarly, we obtained:
and
We noted that for
, we obtained:
Using Rosenthal’s inequality, we found that:
for
and that:
for
. We noted that:
Using Chebyshev’s inequality, we obtained:
By applying the triangle inequality to the results of Lemmas 1–3 (which describe the moment properties of the entries of the resolvent matrix), we arrived at the inequality:
When we set
,
and
and took into account that
and
, then we obtained:
Moreover, the constant
c could be made arbitrarily large. We could obtain similar estimates for the quantities of
,
,
,
,
. Inequalities (27) and (28) implied:
The last inequalities produced:
We noted that
for
. So, by choosing
c large enough, we obtained:
This completed the proof of the theorem. □
6.4. The Proof of Theorem 3
Proof.
First, we noted that for
, a constant
exists, such that:
Without a loss of generality, we could assume that
. We recalled that:
We considered the smoothing of the indicator
:
We noted that:
where, as before:
To estimate , we used the approach developed in [15], which refers back to Stein’s method. We let:
The equality:
implied that a constant
C exists that depends on
in the definition of
, such that:
By the definition of
, we could rewrite the last inequality as:
We obtained:
and this yielded:
Inequality (30) implied that for :
where
We noted that by the Jensen inequality, for
:
We represented
in the form:
where
Since
, we found:
From there, it was easy to obtain:
6.4.1. Estimation of
Using the representation of
, we could write:
where
In the case
, we obtained:
Furthermore, in the case
and
, we obtained:
For
, we could write:
Using this, we concluded that:
By applying Lemmas 2 and 3, we obtained:
By combining inequalities (34) and (35), and Young’s inequality, we obtained:
where
Hölder’s inequality and (35) produced:
6.4.2. Estimation of
Using Hölder’s inequality and Cauchy’s inequality, we obtained:
By applying Lemmas 2, 3 and 5, we obtained:
6.4.3. Estimation of
Using Taylor’s formula, we obtained:
where
is uniformly distributed across the interval
and the random variables are independent from each other. Since
yields
, we found that:
Taking into account the inequality:
we obtained:
By applying Hölder’s inequality, we obtained:
Jensen’s inequality produced:
To estimate
, we had to obtain the bounds for:
Using Cauchy’s inequality, we obtained:
where
6.4.4. Estimation of
Lemma 2 produced:
and, in turn, Lemma 3 produced:
By summing the obtained estimates, we arrived at the following inequality:
6.4.5. Estimation of
We considered
. Since
and
, we obtained:
Then, we returned to the estimation of
. Equality (
41) implied:
We could rewrite this as:
First, we found that:
and
It was straightforward to see that:
Further, since:
we could write:
By combining the estimates that were obtained for
, we concluded that:
Inequalities (38) and (39) implied the bounds:
Then, Inequality (42) yielded:
We rewrote this as:
where
6.4.6. Estimation of
Using Inequalities (40) and (41) and , we obtained:
By applying:
we obtained:
The last inequality produced:
By applying Lemma 5, we obtained:
Finally, using Lemma 6, we obtained:
By combining Inequalities (29), (31), (32), (33), (36), (37) and (43) and applying Young’s inequality, we obtained the proof. □
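The form of Young’s inequality used repeatedly above is the standard one: for non-negative $a$, $b$ and exponents $p,q>1$ with $1/p+1/q=1$,
\[
ab\le\frac{a^{p}}{p}+\frac{b^{q}}{q}.
\]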
6.5. The Proof of Theorem 4
Proof.
We considered the case
, where
This implied that the constant
exists, depending on
, such that:
First, we considered the case
. Without a loss of generality, we assumed that
, where
is the constant in the definition of
. This meant that
. Furthermore:
and
Using Theorem 3, we obtained:
The analysis of for .
By combining all of these estimations and using:
we obtained:
For
(such that
), we could write:
Then, we considered
. In this case, we used the inequality:
In what follows, we assumed that .
The bound of for .
By the definition of
, we obtained:
We could obtain from this that, for sufficiently small
values:
We noted that
. This immediately implied that:
We noted that for
, we obtained:
and
From there, it followed that:
Simple calculations showed that:
Similarly, simple calculations showed that:
It was straightforward to check that:
By applying the Markov inequality for
, we obtained:
On the other hand, when
, we used the inequality:
By applying the Markov inequality, we obtained:
We noted that
for
and that for
:
We chose
, such that:
It was enough to put
. We let
. For
, we defined:
and
. We noted that
and that:
We started with
. We noted that:
From there, it followed that:
By repeating this procedure and using the union bound, we obtained the proof.
Thus, Theorem 4 was proven. □
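A standard device behind the repetition and union bound used at the end of this proof is the multiscale (lattice plus continuity) argument: the range of $v$ is covered by polynomially many points $v_{k}$, and the elementary bound for Stieltjes transforms
\[
|m(u+iv)-m(u+iv')|\le\frac{|v-v'|}{v\,v'}
\]
extends an estimate from the lattice points to all intermediate $v$ at the price of a union bound; this sketch is offered only as orientation for the abbreviated argument above.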
7. Auxiliary Lemmas
Lemma 1. Under the conditions of the theorem, for and , we have:
Proof. For simplicity, we only considered the case
and
. We noted that:
By applying Schur’s formula, we obtained:
The second inequality was proven in a similar way. □
Lemma 2. Under the conditions of Theorem 5, for all , the following inequalities are valid: In addition, for , we have: and for , we have: Proof. For simplicity, we only considered the case
and
. The first two inequalities were obvious. We only considered
. By applying Rosenthal’s inequality, for
, we obtained:
We recalled that:
and under the conditions of the theorem:
By substituting the last inequality into Inequality (44), we obtained:
The second inequality could be proven similarly. □
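For convenience, the form of Rosenthal’s inequality used above is the standard one: for independent centred random variables $\xi_{1},\dots,\xi_{N}$ and $p\ge2$,
\[
\mathbf{E}\Bigl|\sum_{k=1}^{N}\xi_{k}\Bigr|^{p}
\le C(p)\Bigl(\Bigl(\sum_{k=1}^{N}\mathbf{E}\,\xi_{k}^{2}\Bigr)^{p/2}+\sum_{k=1}^{N}\mathbf{E}\,|\xi_{k}|^{p}\Bigr),
\]
where the constant $C(p)$ depends only on $p$ and can be taken of order $(Cp)^{p}$.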
Lemma 3. Under the conditions of the theorem, for all , the following inequalities are valid: In addition, for , we have: and for , we have:
Proof. It sufficed to apply the inequality from Corollary 1 of [16]. □
We recalled the notation:
Lemma 4. Under the conditions of the theorem, the following bounds are valid: Proof. We considered the equality:
Further, we noted that for a sufficiently small
value, a constant
H existed, such that:
We introduced the events:
Further, we considered
. We obtained:
Next, the following inequality held:
Under the condition
and the inequality
, we obtained the bounds:
By applying Lemmas 2 and 3, for the first term on the right side of (48), we obtained:
This completed the proof of Inequality (45).
Furthermore, by using representation (47), we obtained:
By applying Lemmas 2 and 3, we obtained:
By applying Young’s inequality, we obtained the required proof. Thus, the lemma was proven. □
Lemma 5. Under the conditions of the theorem, we have: Proof. We set
. Using Schur’s complement formula:
Since
was measurable with respect to
, we could write:
We introduced the notation:
Similarly, for the moment of
, we obtained the following estimate:
From the above estimates and Lemma 4, we concluded that:
Thus, the lemma was proven. □
Lemma 6. Under the conditions of the theorem, for , we have: Proof. We used the representation:
We noted that by using Rosenthal’s inequality:
Similarly, for the second moment of
, we obtained the following estimate:
From the estimates above and Lemma 4, we concluded that:
To finish the proof, we applied Inequalities (45) and (46). Thus, the lemma was proven. □
Lemma 7. For , the following inequality holds: Proof. It was easy to show that for
:
The last expression was not positive for
. From the negativity of the real part, it followed that:
This implied the required proof. Thus, the lemma was proven. □
Lemma 8. There is an absolute constant , such that for : and that for to satisfy and , the following inequality is valid: Proof. We changed the variables by setting:
and
In this notation, we could rewrite the main equation in the form:
Then, it sufficed to repeat the proof of Lemma B.1 from [17]. We noted that this lemma implied that Inequality (50) held for all w with (and, therefore, for all z) and that Inequality (49) satisfied for w. From this, we concluded that Inequality (49) held for , such that for a sufficiently small constant .
Thus, the lemma was proven. □
Lemma 9. For , we have: Proof. Using this, we could write:
From there, it followed that:
Thus, the lemma was proven. □
Lemma 10. A positive absolute constant B exists, such that: Proof. First, we considered
. Then, for
:
In the case
, we obtained:
We then considered the case :
To prove the second inequality, we considered the equality:
Thus, the lemma was proven. □
We let
be a rectangular
matrix with
. We let
be the singular values of matrix
. The diagonal matrix with
was denoted by
. We let
be an
matrix with zero entries. We put
and
. We let
and
be orthogonal (Hermitian) matrices, such that the singular value decomposition held:
Furthermore, we let be the identity of an matrix and . We introduced the matrices and . We noted that and . We introduced the matrix . We considered the matrix . We then obtained the following:
Proof. The proof followed direct calculations. It was straightforward to see that:
□
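The construction above is the standard singular value decomposition of the Hermitisation. For a square matrix $\mathbf{X}=\mathbf{U}\mathbf{D}\mathbf{V}^{*}$ (we state the square case only; this is a standard fact, given here for orientation),
\[
\begin{pmatrix}\mathbf{O}&\mathbf{X}\\ \mathbf{X}^{*}&\mathbf{O}\end{pmatrix}
=\frac{1}{2}
\begin{pmatrix}\mathbf{U}&\mathbf{U}\\ \mathbf{V}&-\mathbf{V}\end{pmatrix}
\begin{pmatrix}\mathbf{D}&\mathbf{O}\\ \mathbf{O}&-\mathbf{D}\end{pmatrix}
\begin{pmatrix}\mathbf{U}&\mathbf{U}\\ \mathbf{V}&-\mathbf{V}\end{pmatrix}^{*},
\]
so the eigenvalues of the Hermitisation are $\pm s_{j}$ and the corresponding eigenvectors are $\tfrac{1}{\sqrt{2}}\bigl(\mathbf{u}_{j}^{\top},\pm\mathbf{v}_{j}^{\top}\bigr)^{\top}$.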
8. Conclusions
In this work, we obtained results by assuming that the conditions – were fulfilled. The condition was of a technical nature. In our investigation of the asymptotic behaviour of the Stieltjes transformation in the bulk of the spectrum, this restriction could be eliminated. However, this would be a technically cumbersome task that requires separate consideration.