Random covariance matrices: Universality of local statistics of eigenvalues up to the edge
Ke Wang
Introduction
The goal of this paper is to extend the Four Moment theorem established by Tao and Vu for iid covariance matrices from the bulk of spectrum to the edge. Let us first specify the matrix ensembles that will be studied.
Consider a random matrix , where is an integer parameter such that and for some . We say that the matrix ensemble obeys condition if the random variables are jointly independent, have mean zero and variance , and obey the moment condition for some constant independent of .
Given such a random matrix , we form the covariance matrix . This (non-negative) matrix has rank and the first eigenvalues are . We order its remaining eigenvalues as
The empirical spectral distribution (ESD) of the matrix (which is Hermitian and thus has real eigenvalues) is a one-dimensional function
where we use to denote the cardinality of a set .
The first fundamental result concerning the asymptotic limiting behavior of ESD for large covariance matrices is the due to (see also ).
(Marchenko-Pastur Law) Assume a random matrix obeys condition with , and such that , the empirical spectral distribution of the matrix converges in distribution to the Marchenko-Pastur Law with a density function
We introduce the notation of frequent events, depending on , in increasing order of likelihood.
holds asymptotically almost surely if .
holds with high probability if for some constant (independent of ).
holds with overwhelming probability if for every constant (or equivalently, that ).
holds almost surely if .
We say that two complex random variables match to order for some integer if one has for all with .
Our main result is the following Four Moment theorem, which extends the result (Theorem 6) in to the edge of the spectrum. The proof is analogous to the proofs in , and and will be presented in Section 5.
For sufficiently small and sufficiently large ( will suffice) the following holds for every . Let and be two random matrices satisfying condition C1 with the indicated constant , and assume that for each that and match to order 4. Let be the associated covariance matrices. Assume also that for some .
Then for any , and for sufficiently large depending on , we have
If and only match to order 3 rather 4, then the conclusion (1.2) still holds provided that one strengthens (1.1) to
The next theorem is an extension of Theorem 17 in , which is used in the proof of Theorem 1.5 and is of independent interest as well. The proof is delayed to Section 5.
Let be a random matrix obeying condition C1. We say M obeys the gap property if for every and every , one has with high probability.
Let be a random matrix satisfying condition C1. Then obeys the gap property.
When , the singular value statistics around turn out to be different since the density function has a singularity at . The hard edge is not really an edge, which makes it easier to deal with. In this paper, we will focus on the edge case when .
We consider as an asymptotic parameter tending to infinity. We use , , , or to denote the bound for all sufficiently large and for some constant . Notations such as mean that the hidden constant depend on another constant . or means that as ; the rate of decay here will be allowed to depend on other parameters. We write for . We view vectors as column vectors. The Euclidean norm of a vector is defined as .
This paper is organized as follows: in Section 2, we prove a variant of universality result regarding the smallest singular value as an application of the Four Moment theorem. In Section 3, we mention a few basic results from linear algebra and probability. In Section 4, we provide the proofs of two technical lemmas, which are the major content of this paper. Finally, in Section 5, we give the proofs of the Gap theorem (Theorem 1.7) and Four Moment theorem (Theorem 1.5). The argument draws heavily from those in , and , thus we only focus on the changes needed to complete the proofs.
Acknowledgments: The author would like to thank Van H. Vu for useful discussion and his guidance through to the completion of this paper.
Applications
In a similar way as (Section 1.3), equipped with the Four Moment theorem, we can obtain universality results for large classes of random matrices. Let us demonstrate through some examples, focusing on the results for the lower edge of the spectrum. Recall denotes the smallest singular value of .
For the case when , the limiting distribution for Gaussian models was computed by Edelman . Recently a universality result has been established by Tao and Vu for the entries with bounded sufficiently high moments.
Let be a random covariance matrix, where tends to infinity as and . Let be independent for all .
where denote the Tracy-Widom distributions.
In a same way as the authors proving (, Theorem 9) and (, Theorem 11), one can get the following (also see Figure 1 for numerical simulations):
The conclusions of Theorem 2.1 can be extended to the case when tends to infinity as and , and when obeying condition C1 with sufficiently large constant , and have vanishing third moment.
Recently, Ben Arous and Péché proved universality at the edge for random matrices with i.i.d. entries of Gaussian divisible distribution. And with the matching theorem (Corollary 30, ), we can drop the third moment condition whereas is assumed to be supported on at least three points.
The conclusion (2.2) of Theorem 2.1 can be extended to the case when tends to infinity as and , and when obeying condition C1 with sufficiently large constant , and are supported on at least three points.
General Tools
In this section, we collect some basic tools from linear algebra and probability that will be used repeatedly in the sequel.
We start with the Cauchy interlacing law and the Weyl inequalities.
If is an Hermitian matrix, and is an minor, then for all .
If is a matrix, and is an minor, then for all .
If , if is a matrix, and is a minor, then for all , with the understanding that . (For , one can of course use the transpose of (ii) instead.)
If are Hermitian matrices, then for all .
If are matrices, then for all .
The following formula for an entry of a singular vector, in terms of the singular values and singular vectors of a minor, is very useful:
The next lemma is the well-known Cauchy interlacing identities:
Let be a Hermitian matrix, and
Let be the eigenvalues of and be the eigenvalues of . Suppose that is not orthogonal to any of the unit eigenvectors of . Then we have
From this lemma, one immediately gets an interlacing identity for singular values:
Assume the notations in Lemma 3.3, then for every ,
with eigenvalue .
Since we have and
(3.1) follows. Similarly, to show (3.2), apply Lemma 3.5 to the matrix
By Schur’s complement, it has the following alternate representation:
Let be a Hermitian matrix, and let be a complex number not in the spectrum of . Then we have
2. Tools from probability theory
We will rely frequently on the next concentration of measure result for projections of random vectors.
for some . Let be independent complex random variables with mean zero, variance one, and for some . For each , let be the complex random variable
(Upper tail bound on ) For , we have for some absolute constant .
(Lower tail bound on ) For any , one has .
The same claim holds if one of the is assumed to have variance instead of for some absolute constant .
Main technical lemmas
Recall in the proofs of the Four Moment Theorem and the Gap Theorem as in , and , a crucial input was the Delocalization Theorem of Erdős, Schlein, and Yau (, and ). The material in this section is analogous to Section 3 of . We will first extend the concentration of ESD result to the edge of the spectrum and use this concentration theorem to show the delocalization of singular vectors. The proof of the delocalization result in the edge of spectrum is significantly different from that in the bulk of spectrum as in . However, similar to , the Cauchy interlacing identities for singular values in Theorem 3.5 will help us to deal with this problem.
First observe that if obeys condition C1 for some constant , then by Markov’s inequality and the union bound, one has for all with probability . By a truncation technique (see for details) and Lemma 3.2, one may assume that
We will derive the eigenvalue concentration theorem (up to the edge) which is an analogue of Theorem 19 in :
Suppose that for some . Assume . Let obey condition for some and the probability distribution of be continuous. Assume further almost surely for some for all , where (which can depend on ). Then for any interval of length , one has with overwhelming probability(uniformly in ) that the number of eigenvalues of in obeys the concentration estimate
As a consequence of Theorem 4.1, one can deduce the following delocalization theorem:
Let the hypothesis be as in Theorem 4.1, then with overwhelming probability, all the unit left and right singular vectors of have all coefficients uniformly of size at most .
The continuity hypothesis in the above theorems, which guarantees the singular values are almost surely simple, is only a technical one. In practice we are able to eliminate this hypothesis by a limiting argument using Lemma 3.2.
with overwhelming probability. The delocalization of left singular vectors can be proved similarly.
The “bulk” case is treated in . Now we consider the edge case when or (say). Using the Marchenko-Pastur law, we have with overwhelming probability that
By Lemma 3.3, it suffices to show with overwhelming probability that
From Lemma 3.7, we conclude that with overwhelming probability for each (and hence for all , by the union bound). Then it is enough to show that with overwhelming probability
By the Cauchy-Schwarz inequality, it thus suffices to show that
with overwhelming probability for some . Noticed that , we thus need to show
with overwhelming probability for some , which is equivalent to prove that
with overwhelming probability for some .
In the interlacing identity in Lemma 3.5, we have
By Lemma 3.7, one gets with overwhelming probability. And since , one has
with overwhelming probability. In order to show (4.2),we will evaluate
for some , where .
The Machenko-Pastur law implies for every .
Let be a large constant to be chosen later. From Theorem 4.1, we have that (by taking )
with overwhelming probability for any interval of length , where . For such an interval, we see from Lemma 3.7 that with overwhelming probability
Set . If (say), then
for all in the above sum, and since , we get
We now partition the real line into intervals of length , and sum (4.7) over all intervals with . Bounding crudely by , we see that . Similarly, one has
Finally, Riemann integration of the principal value integral
If , using the formula for the Stieltjes transform, one obtains from residue calculus that
When , . (4.2) follows by comparing (4.8) and (4.9).
If , we have
When , . Then (4.2) follows by comparing (4.10) and (4.11).
By the concentration theorem 4.1 and the Cauchy interlacing law, the interval with will contribute at most eigenvalues and we can set accordingly. The proof is now complete.
2. Proof of Theorem 4.1:
We first have a crude upper bound on the number of eigenvalues of on an interval. The proof can be found in Section 5.2, .
with overwhelming probability, where is the number of eigenvalues in the interval .
The strategy is to compare the Stieltjes transform of the ESD of matrix
with the Stieltjes transform of Marchenko-Pastur Law
And thanks to the next proposition, one gets control on ESD through control on the Stieltjes transforms.
(Lemma 29, ) Let , and . Suppose that one has the bound
with (uniformly) overwhelming probability for all with and . Then for any interval in with , one has
By Proposition 4.5, our objective is to show
with (uniformly) overwhelming probability for all with and
Since is the unique solution to the equation
in the upper half plane (see ), we investigate a similar equation for .
The entries of are independent of each other and of , and have mean and variance . Noticed is a unit vector. By linearity of expectation we have
is the Stieltjes transform for the ESD of . From the Cauchy interlacing law, we can get
In fact a similar estimate holds for itself:
For , holds with (uniformly) overwhelming probability for all with and .
Let . Let be the space spanned by for and be the orthogonal projection onto . Thus
By Lemma 3.7, we conclude with overwhelming probability
Let , where and . We will use two auxiliary parameters , in later estimation.
First, for those such that , the function has magnitude . From Proposition 4.4, , the contribution for these ,
For the contribution of the remaining indices, we subdivide them as
for , and then sum over .
For each such interval, the function has magnitude and fluctuates by at most . Say is the set of all ’s in this interval, by Proposition 4.4, . Together with bounds (4.14), (4.15), the contribution for these on such an interval,
Summing over (taking into account that ), we will get
Recall has an explicit expression
where we take the branch of with cut at that is asymptotically as .
From (4.13) and Proposition 4.6, we have with overwhelming probability that
where we used Lemma 3.7 to obtain that with overwhelming probability.
By assumption , when is large enough,
In (4.16), for the error term , one has either or . In the latter case, we get . In the first case, we impose a Taylor expansion on (4.16),
Completing a perfect square for in the above identity, one can solve the equation for ,
If , by a Taylor expansion on the right hand side of (4.17), we have . Therefore, or . If , from (4.17) and the explicit formula for , we still have .
To summarize the above discussion, one has, with overwhelming probability, either
We may assume the above trichotomy holds for all with and where .
When , from and , we have and are both and therefore holds in this case. By continuity, we conclude that either (4.18) holds in the domain of interest or there exists some in the domain such that (4.18) and (4.19) or (4.18) and (4.20) hold together.
On the other hand, (4.18) or (4.20) cannot hold at the same time. Otherwise, . However, from and , one can see that is bounded from below, which implies a contradiction.
Similarly, (4.18) or (4.19) cannot both hold except when . Otherwise, we can conclude that . From the explicit formula of ,
One can conclude , which contradicts our assertion. Actually, if , (4.18) and (4.19) are equivalent.
In conclusion, (4.18) holds with overwhelming probability in the domain of interest.
Gap theorem and Four Moment theorem
In this section, we complete the proofs of the main results, Theorem 1.5 and Theorem 1.7. The proofs follow closely those in (as well as in , ), so we shall focus on the changes needed to that argument. We assume substantial familiarity with the materials in , , and will cite from them repeatedly.
It is convenient to use the augmented matrix
which is a Hermitian matrix with eigenvalues and zeros. In this way, we can import the results obtained in , and to the model discussed in this paper.
As mentioned in the beginning of Section 5, one can assume that
almost surely for all . We also assume that the distributions of are continuous to ensure the singular values are almost surely simple.
Let us first state a weaker version the Four Moment Theorem as we assume gap properties for the matrices considered:
For sufficiently small and sufficiently large ( will suffice) the following holds for every . Let and be two random matrices satisfying condition C1 with the indicated constant , and assume that for each that and match to order 4. Let be the associated covariance matrices. Assume also that obey the gap property and for some .
Then for any , and for sufficiently large depending on , we have
If and only match to order 3 rather 4, then the conclusion (5.3) still holds provided that one strengthens (5.2) to
The Four Moment theorem follows directly from Theorem 1.7 and Theorem 5.1. The next two sections are devoted to the proofs of Theorem 1.7 and Theorem 5.1.
The key technical step (also used in proving Theorem 1.7) is the truncated Four Moment Theorem, which follows by applying [12, Proposition 6.1 and Proposition 6.2] (or see [10, Proposition 35]) to the argumented matrix M. The proof is omitted here.
For sufficiently small and sufficiently large ( will suffice) the following holds for every . Let and be two random matrices satisfying condition C1 with the indicated constant , and assume that for each that and match to order 4. Assume also that and for some .
Then for any , and for sufficiently large depending on , we have
If and only match to order 3 rather 4, then the conclusion (5.5) still holds provided that one strengthens (5.4) to
for any , provided that is sufficiently small depending on .
As in the arguments in Section 6 in , we use the qualities for ,
The gap property (up to the edge) on M ensures an upper bound on . The proof repeats exactly the proof of Lemma 32 in .
If M satisfies the gap property, then for any (independent of n), and any , one has with high probability.
where is a smooth cutoff to the region which equals on . From Propositon 5.3, we have
for some , and a similar relation holds for . The proof is complete by using the above relations and Theorem 5.2.
2. Proof of the Gap Theorem
We first have a gap theorem under additional exponential decay hypothesis on the ensembles of . The proof is presented in Section 5.3.
Let be a random matrix obeying condition C1, and the entries satisfy exponential decay in the sense that for all for all and some constants . Then obeys the gap property.
The next observation is the following matching lemma (See Lemma 33 in ), which together with Theorem 5.2, ensures us to remove the exponential decay hypothesis in Theorem 5.4.
Now consider the matrix in Theorem 1.7. By the matching lemma, we can find a random matrix such that satisfies the exponential decay hypothesis and matches to third order for each . By Theorem 5.4, the matrix obeys the gap property. Similarly as in Section 6, , let be a smooth cutoff to the region . Then by Proposition 5.3, , which, by using Theorem 5.2, implies that for some independent of . Hence, also satisfies the gap property.
3. Proof of Theorem 5.4:
The proof follows closely to that discussed in Section 5, . We shall mainly mention the corresponding changes. Interested readers can find the detailed proofs in . First, in order to operate an induction argument, we need to treat the edge case separately.
By symmetry, it suffices to show for . In the interlacing identity (Lemma 3.5),
From Theorem 4.2 and Lemma 3.8, one can conclude that with high probability. Therefore, with high probability. The conclusion follows from the Cauchy interlacing law. ∎
For the general case for the gap theorem, we write instead of and define , as in , we introduce the regularized gap
where is a large constant to be determined later. To show Theorem 5.4, it is enough to show that
By repeating the arguments in Section 3.5, , the proof relies on the following two key propositions. The idea is to propagate a narrow gap for backwards in until one can use Theorem 4.1 to control the occurrence of the gap.
. Suppose and is such that
for some (which can depend on ), and that
Let be the row of , and let be an orthonormal system of right singular vectors of associated to . Then one of the following statement holds:
(Macroscopic spectral concentration) There exists with such that
(Small inner products) There exists with such that
(Large singular value) For some one has
(Large inner product) There exists such that
(Large inner product near ) There exists with such that
Apply Lemma 5.3 in to the augmented matrix
Noticed is with the rightmost column and bottom column(which is and zeros) removed. The eigenvalues of are and , and an orthonormal eigenbasis includes the vectors \left(\begin{array}[]{c}u_{j}(M_{p,n})\\ v_{j}(M_{p,n})\end{array}\right) for . (The ”Large coefficient” event in Lemma 5.3, ) cannot occur as has zero diagonals.) ∎
The next proposition claims that the events (i)-(vi) occurs with small probability.
. Suppose that and and set for some sufficiently small fixed . Then
The events (i), (iii), (iv), (v) in Proposition 5.7 all fail with high probability.
There is a constant such that all the coefficients of the right singular vectors for are of magnitude at most with overwhelming probability. Conditioning to be a matrix with this property, the events (ii) and (vi) occur with a conditional probability of at most .
Furthermore, there is a constant (depending on ) such that if and is conditioned as in (b), then (ii) and (vi) in fact occur with a conditional probability of at most .
The proof of the above proposition repeats the proof of Proposition 53 in with the major difference being that Theorem 4.1 and Theorem 4.2 are applied instead of Theorem 60 and Proposition 62 in .