Random matrices: Universality of local eigenvalue statistics up to the edge
Terence Tao, Van Vu
Introduction
In our recent paper , a universality result (the Four Moment Theorem) was established for the eigenvalue spacings in the bulk of the spectrum of random Hermitian matrices. (See for an extended discussion of the universality phenomenon, and for further references on universality results in the context of Wigner Hermitian matrices.)
The main purpose of this paper is to extend this universality result to the edge of the spectrum as well.
To recall the Four Moment Theorem, we need some notation.
A random Hermitian matrix is said to obey condition C0 if
The are independent (but not necessarily identically distributed) for . For , they have mean zero and variance ; for , they have mean zero and variance for some fixed independent of .
(Uniform exponential decay) There exist constants such that
for all and .
Examples of random Hermitian matrices obeying Condition C0 include the GUE and GOE ensembles, or the random symmetric Bernoulli ensemble in which each of the are equal to with equal probability . In GOE one has , but in the other two cases one has . The arguments in the previous paper were largely phrased for the case , but it is not difficult to see that the arguments extend without difficulty to other values of (the main point being that a modification of the variance of a single entry of a row vector does not significantly affect the Talagrand concentration inequality, [27, Lemma 43], or Lemma 2.1 below.).
Given an Hermitian matrix , we denote its eigenvalues as
It will be convenient to introduce the following notation for frequent events depending on , in increasing order of likelihood:
holds asymptotically almost surely ifSee Section 1.19 for our conventions on asymptotic notation. .
holds with high probability if for some constant .
holds with overwhelming probability if for every constant (or equivalently, that ).
holds almost surely if .
We say that two complex random variables and match to order if
for all such that .
The first main result can now be stated as follows:
If and only match to order rather than , then there is a positive constant independent of such that the conclusion (3) still holds provided that one strengthens (2) to
Informally, this theorem asserts that the distribution of any bounded number of eigenvalues in the bulk of the spectrum of a random Hermitian matrix obeying condition C0 depends only on the first four moments of the coefficients.
There is also a useful companion result to Theorem 1.5, which is used both in the proof of that theorem, and in several of its applications:
[27, Theorem 17] Let be a constant, and let be a random matrix obeying Condition C0. Set . Then for every , and for sufficiently large depending on , and the constants in Definition 1.2, and for each , one has with high probability. In fact, one has
for some depending on (and independent of ).
Theorem 1.5 (and to a lesser extent, Theorem 1.6) can be used to extend the range of applicability for various results on eigenvalue statistics in the bulk for Hermitian or symmetric matrices, for instance in extending results for special ensembles such as GUE or GOE (or ensembles obeying some regularity or divisibility conditions) to more general classes of matrices. See , , for some examples of this type of extension. We also remark that a level repulsion estimate which has a similar spirit to Theorem 1.6 was established in [9, Theorem 3.5], although the latter result establishes repulsion of eigenvalues in a fixed small interval , rather than at a fixed index of the sequence of eigenvalues, and does not seem to be directly substitutable for Theorem 1.6 in the arguments of this paper.
The results of Theorem 1.5 and Theorem 1.6 only control eigenvalues in the bulk region for some fixed (independent of ). The reason for this restriction was technical, and originated from the use of the following two related results (which are variants of previous results of Erdős, Schlein, and Yau), whose proof relied on the assumption that one was in the bulk:
[27, Theorem 56] For any and any random Hermitian matrix whose upper-triangular entries are independent with mean zero and variance , and such that almost surely for all and some , and any interval in of width , the number of eigenvalues of in obeys the concentration estimate
with overwhelming probability, where is the semicircular distribution
In particular, with overwhelming probability.
[27, Proposition 58] Let be as in Theorem 1.7. Then for any with , if denotes a unit eigenvector corresponding to , then with overwhelming probability each coordinate of is .
In the bulk region , the semicircular function is bounded away from zero. Thus, Theorem 1.7 ensures that the eigenvalues of in the bulk tend to have a mean spacing of on the average. Applying the Cauchy interlacing law
where is an minor of , this implies that the bulk eigenvalues of are within of the corresponding eigenvalues of on the average. Using linear algebra to express the coordinates of the eigenvector in terms of and a minor (see Lemma 4.1 below), and using some concentration of measure results concerning the projection of a random vector to a subspace (see Lemma 2.1), we eventually obtain Proposition 1.8.
9. Universality up to the edge
The main results of this paper are that the above four theorems can be extended to the edge of the spectrum (thus effectively sending to zero). Let us now state these results more precisely. Firstly, we have the following extension of Theorem 1.7:
Consider a random Hermitian matrix whose upper-triangular entries are independent with mean zero and variance , and such that almost surely for all and some . Let be a quantity which can depend on , and let be an interval such that
We also make the mild assumption . Then the number of eigenvalues of in obeys the concentration estimate
The powers of , and here are probably not best possible, but this will not be relevant for our purposes. In our applications will be a power of , and will be a negative power of . (This allows the error term in the above estimate for to exceed the main term when one is very near the edge, but this will not impact our arguments.)
We prove this theorem in Section 3, using the same (standard) Stieltjes transform method that was used to prove Theorem 1.7 in (see also ), with a somewhat more careful analysis. We next use it to obtain the following extension of Proposition 1.8:
Let be a random matrix obeying condition C0. Then with overwhelming probability, every unit eigenvector of has coefficients at most , thus
where is the standard basis.
The deduction of Proposition 1.12 from Theorem 1.10 differs significantly from the analogous deduction of Proposition 1.8 in Theorem 1.7 in . The main difference is that in the current case is no longer bounded away from zero, which causes the average eigenvalue spacing between and to be considerably larger than . For instance, the gap between the second largest eigenvalue and the largest eigenvalue is typically of size .
The key new ingredient that help us to deal with this problem is the following observation: the Cauchy interlacing law (5), when applied to the eigenvalues of the edge, is strongly bias. In particular, the gap between and is significantly smaller than the gap between and . We can show that (with high probability), the first gap is of order while the second can be as large as (and similarly for the gap between and when ). This new ingredient will be sufficient to recover Proposition 1.12; see Section 4, where the above proposition is proved.
Using Theorem 1.10 and Proposition 1.12, one can continue the arguments from to establish the following extensions of Theorem 1.5 and Theorem 1.6:
Let be a random matrix obeying Condition C0. Set . Then for every , and for sufficiently large depending on , and the constants in Definition 1.2, and for each , one has with high probability, uniformly in .
The novelty here is that we have no assumption on the indices and . We present the proof of these Theorems in Sections 5, 6, following the arguments in closely.
15. Applications
As Theorems 1.13, 1.14 extend Theorems 1.5, 1.6, all the applications of the latter theorems in (concerning the bulk of the spectrum) can also be viewed as applications of these theorems. But because these results extend all the way to the edge, we can now obtain some results on the edge of the spectrum as well. For instance, we can prove
Let be a fixed integer and be a matrix obeying Condition C0, and suppose that the real and imaginary part of the atom variables have the same covariance matrix as the GUE ensemble (i.e. both components have variance , and have covariance ). Assume furthermore that all third moments of the atom variables vanish. Set . Then the joint distribution of the dimensional random vector
has a weak limit as , which coincides with that in the GUE case (in particular, the limit for is the GUE Tracy-Widom distribution, and for higher is controlled by the Airy kernel ). The result also holds for the smallest eigenvalues , with the offset replaced by .
If the atom variables have the same covariance matrix as the GOE ensemble (i.e. they are real with variance off the diagonal, and on the diagonal), instead of the GUE ensemble, then the same conclusion applies but with the GUE distribution replaced of course by the GOE distribution (see for the case).
This result was previously established by Soshnikov (see also , ) in the case when is a Wigner Hermitian matrix (i.e. the off-diagonal entries are iid, and the matrix matches GUE to second order at least) with symmetric distribution (which implies, but is stronger than, matching to third order). For some additional partial results in the non-symmetric case see , . The exponential decay condition in Soshnikov’s result has been lowered to a finite number of moments; see , . It is reasonable to conjecture that the exponential decay conditions in this current paper can similarly be lowered, but we will not pursue this issue here. It also seems plausible that the third moment matching conditions could be dropped, though this is barely beyond the reach of the current methodNote added in proof: the third moment condition has recently been dropped in , by combining the four moment theorem here with a new proof of universality for the distribution of the largest eigenvalue for gauss divisible matrices..
We just prove the claim for the largest eigenvalues and for GUE, as the claim for the smallest and/or GOE is similar.
only changes by when the matrix is replaced with GUE. But this follows from the final conclusion of Theorem 1.13, thanks to the extra factor . ∎
Notice that there is some room to spare in this argument, as the gain in (8) is far more than is needed for (6). Because of this, one can obtain similar universality results for suitably normalised eigenvalues with or for any (where the normalisation factor for is now , and the offset is replaced by , where ). We omit the details.
In analogy with , one should be able to drop the third moment condition in Theorem 1.16 if one can control the distribution of the largest (or smallest) eigenvalues from random matrices obtained from a suitable Ornstein-Uhlenbeck process, as in .
19. Notation
We consider as an asymptotic parameter tending to infinity. We use , , , or to denote the bound for all sufficiently large and for some constant . Notations such as mean that the hidden constant depend on another constant . or means that as ; the rate of decay here will be allowed to depend on other parameters. We write for .
Eigenvalues are always ordered in increasing order, thus for instance ) is the largest eigenvalue of a Hermitian matrix , and is the smallest.
20. Acknowledgements
The authors thank the anonymous referee for helpful comments and references, and Horng-Tzer Yau for additional references.
General tools
In this section we record some general tools (proven in ) which we will use repeatedly in the sequel. We begin with a very useful concentration of measure result that describes the projection of a random vector to a subspace.
The same conclusion holds (with replaced by another explicit constant) if one of the entries of is assumed to have variance instead of , for some absolute constant .
See [27, Lemma 40]. (The main tool in the proof is Talagrand’s concentration inequality.) It is clear from the triangle inequality that the modification of variance in a single entry does not significantly affect the conclusion except for constants. ∎
Next, we record a crude but useful upper bound on the number of eigenvalues in a short interval.
with overwhelming probability, where is the number of eigenvalues of in .
See [27, Proposition 62]. (The main tools in the proof are the Stieltjes transform method, Lemma 3.3 below, and Lemma 2.1.) Again, the generalisation to variances other than on the diagonal do not cause significant changes to the argument. ∎
Finally, we recall a Berry-Esséen type theorem:
for some . Let be independent complex random variables with mean zero, variance equal to , and obeying for some . For each , let be the complex random variable
(Upper tail bound on ) For , we have for some absolute constant .
(Lower tail bound on ) For any , one has .
The same claim holds if one of the is assumed to have variance instead of for some absolute constant .
See [27, Theorem 41]. Again, the modification of the variance on a single entry can be easily seen to have no substantial effect on the conclusion. ∎
Asymptotics for the ESD
In this section we prove Theorem 1.10, using the Stieltjes transform method (see for a general discussion of this method). We may assume throughout that is large, since the claim is vacuous for small.
It is known by the moment method (see e.g. or ) that with overwhelming probability, all eigenvalues of have magnitude at most . Because of this, we may restrict attention to the case when lies in interval $$ (say).
We recall the Stieltjes transform of a Hermitian matrix , defined for complex by the formula
We also introduce the semicircular counterpart
which by a standard contour integral computation can be given explicitly as
where we use the branch of the square root of with cut at $z$ at infinity.
It is well known that one can control the empirical spectral distribution via the Stieltjes transform. We will use the following formalization of this principle:
There is a positive constant such that the following holds for any Hermitian matrix . Let and . Suppose that one has the bound
with (uniformly) overwhelming probability for all with and . Then for any interval in with , one has
with overwhelming probability, where is the number of eigenvalues of in .
As a consequence of this lemma (with and , say), we see that Theorem 1.10 follows from
Consider a random Hermitian matrix whose upper-triangular entries are independent with mean zero and variance , with variance on the diagonal for some absolute constant , and such that almost surely for all and some . Set . Let (which can depend on ), and suppose that . Then (12) holds with (uniformly) overwhelming probability for all with and
The remainder of this section is devoted to proving Theorem 3.2. Fix as in Theorem 3.2, thus and , where
Our objective is to show (12) with (uniformly) overwhelming probability.
As in previous works (in particular, ), the key is to exploit the fact that when , is the unique solution to the equation
with ; this is immediate from (11).
We now seek a similar relation for . Note that by (10). We use the following standard matrix identity (cf. [27, Lemma 39], or [2, Chapter 11]):
is the matrix with the row and column removed, and is the row of with the element removed.
By Schur’s complement, is the diagonal entry of . Taking traces, one obtains the claim. ∎
For each , one has with overwhelming probability.
The entries of are independent of each other and of , and have mean zero and variance . By linearity of expectation we thus have, on conditioning on
is the Stieltjes transform of . From the Cauchy interlacing law (5) and (13), we have
and so it will remain to show the concentration estimate
Rewriting , it suffices to show that
with overwhelming probability, where .
where is the space spanned by the for . From Lemma 2.1 and the union bound, we conclude that with overwhelming probability
By the triangle inequality, this implies that
and hence by a further application of the triangle inequality
The plan is to use (17) and (18) to establish (16). Accordingly, we split the LHS of (16), into several subsums according to the distance . Lemma 2.2 provides a sharp estimate on the number of terms of each subsum which will allow us to obtain a good upper bound on the absolute value.
We turn to the details. From (13) we can choose two auxiliary parameters such that
Indeed, one could set and and use (13).
Fix such parameters, and consider the contribution to (16) of the indices for which
By Lemma 2.2 and (19), the interval of for which this occurs has cardinality (with overwhelming probability). On this interval, the quantity has magnitude . Applying (18) (and (19)), we see that the contribution of this case is thus
Next, we consider the contribution to (16) of those indices for which
for some integer , and then sum over . By Lemma 2.2 and (19), the set of for which this occurs is contained (with overwhelming probability) in at most two intervals of cardinality . On each of these intervals, the quantity has magnitude and fluctuates by . Applying (17), (18) (and noting that exceeds , by (19)) we see that the contribution of a single to (16) is at most
We now conclude the proof of Theorem 1.10. By hypothesis,
almost surely. Inserting these bounds into (15), we see that with overwhelming probability
By the triangle inequality (and square rooting the decay), we can assume that either the error term is , or that is . Suppose the former holds. Then by Taylor expansion
If we assume (say), we conclude that . Multiplying out by and rearranging, we obtain
(treating the case when separately).
To summarise, we have shown (with overwhelming probability) that in the region
that one either has , , or . It is not hard to see that the first two cases are disconnected from the third (for large enough) in this region, because is bounded away from zero, as is . Furthermore, the first and second possibilities are also disconnected from each other except when . Also, the second and third possibilities can only hold for since and both have positive real part. A continuity argument thus shows that the first possibility must hold throughout the region except when , in which case either the first or second possibility can hold; but in that region, the first and second possibility are equivalent, and (12) follows. The proof of Theorem 1.10 is now complete.
Delocalization of eigenvectors
Without loss of generalization, we can assume that the entries are continuously distributed. Having established Theorem 1.10, we now use this theorem to establish Proposition 1.12.
Let obey condition C0. Then by Markov’s inequality, one has with overwhelming probability (here and in the sequel we allow implied constants in the notation to depend on the constants in (1)). By conditioning the to this eventStrictly speaking, this distorts the mean and variance of by an exponentially small amount, but one can easily check that this does not significantly impact any of the arguments in this section., we may thus assume that
almost surely for some .
Fix ; by symmetry we may take . By the union bound and another application of symmetry, it suffices to show that
To compute we use the following identity from (see also [27, Lemma 38]):
where is a unit eigenvector corresponding to the eigenvalue .
By subtracting from we may assume . The eigenvector equation then gives
Since , we conclude
Since , the claim follows. ∎
Let be the bottom right minor of . As we are assuming that the coefficients of are continuously distributed, we see almost surely that none of the eigenvalues of are equal to . We may thus apply Lemma 4.1 and conclude that
where is the bottom left vector of (and thus has entries for ). It thus suffices to show that
It will be convenient to eliminate the exponent in the denominator, as follows. From Lemma 2.1, one has with overwhelming probability for each (and hence for all , by the union bound). It thus suffices to show that
with overwhelming probability. By the Cauchy-Schwarz inequality, it thus suffices to show that
with overwhelming probability for some . It is convenient to work with the normalized matrix , thus we need to show
with overwhelming probability for some , where has entries for .
There are two cases: the bulk case and the edge case; the former was already treated in , but the latter is new.
Suppose that . Then from the semicircular law (or Theorem 1.10) we see that with overwhelming probability for some absolute constant . Let be a large constant to be chosen later. A further application of Theorem 1.10 then shows that there is an interval of length centered at which contains eigenvalues of . If lie in , then by the Cauchy interlacing property (5), . One can thus lower bound the left-hand side of (21) (for suitable values of ) by
One can rewrite this as , where is the span of the for . The claim then follows from Lemma 2.1 (for large enough).
3. The edge case
We now turn to the more interesting edge case when . Using the semicircular law, we now see that
Next, we can exploit the following identity:
[27, Lemma 37] If is non-zero for every , then
By diagonalising (noting that this does not affect either side of (23)), we may assume that and for . One then easily verifies that the characteristic polynomial of is equal to
when is distinct from . Since is non-zero by hypothesis, we see that this polynomial does not vanish at any of the . Substituting for , we obtain (23). ∎
Again, the continuity of the entries of ensure that the hypothesis of Lemma 4.4 is obeyed almost surely. From (20), (22), (23) one has
with overwhelming probability, so to show (21), it will suffice by the triangle inequality to show that
(say) with overwhelming probability for some .
Let be a large constant to be chosen later. By Theorem 1.10, we see (if is large enough) that
with overwhelming probability for any interval of length , where . For any such interval, we see from Lemma 2.1 (and Cauchy interlacing (5)) that with overwhelming probability
Set . If (say), then
We now partition the real line into intervals of length , and sum (26) over all with . Bounding crudely by , we see that . Similarly, one has
if is large enough. Finally, Riemann integration of the principal value integral
The operator norm of is with overwhelming probability (see e.g. , ), so . Using the formula (11) for the Stieltjes transform, one obtains from residue calculus that
for , with the right-hand side replaced by for . In either event, we have
The intervals with will contribute at most eigenvalues, by (25) (and Cauchy interlacing (5)). The claim (24) now follows by setting and appropriately. The proof of Proposition 1.12 is now complete.
with overwhelming probability for all , and similarly one has
with overwhelming probability for all . On the other hand, according to the Tracy-Widom law, the gap between and (or between and ) can be expected to be as large as . Thus we see that there is a significant bias at the edge in the interlacing law (5), which can ultimately be traced to the imbalance of “forces” in the interlacing identity (23) at that edge.
Lower bound on eigenvalue gap
We now give the proof of Theorem 1.14. Most of the proof will follow closely the proof of Theorem 1.6 in , so we shall focus on the changes needed to that argument. As such, this section will assume substantial familiarity with the material from , and will cite from it repeatedly. Similarly for the next section.
For technical reasons relating to an induction argument, it turns out that one has to treat the extreme cases separately:
Theorem 1.14 is true when or .
By symmetry it suffices to do this for . By a limiting argument we may assume that the entries of are continuously distributed. From Lemma 4.4 one has (almost surely) that
Recall that with overwhelming probability; also, with overwhelming probability. As all the terms in the left-hand side have the same sign, we conclude that
From Theorem 2.3 and Proposition 1.12, we have (say) with high probability, and so
with high probability. The claim now follows from the Cauchy interlacing property (5). ∎
In fact, at the edge, one should be able to improve the lower bound on the eigenvalue gap substantially, from to , in accordance to the Tracy-Widom law, but we will not need to do so here.
Now we handle the general case of Theorem 1.14. Fix and . We write for , thus and our task is to show that
with high probability. By Proposition 5.1 we may assume . We may also assume to be large, as the claim is vacuous otherwise. As in previous sections, we may truncate so that all coefficients are of size (as before, the exponentially small corrections to the mean and variance of caused by this are easily controlled), and approximate so that the distribution is continuous rather than discrete.
For each , let be the top left minor of . As in [27, Section 3.4], we introduce the regularized gap
for all and , where is a large constant to be chosen later. It will suffice to show that for each , that
with high probability. By symmetry we may assume that .
We will need two key lemmas. First, we have the following deterministic lemma (a minor variant of [27, Lemma 47]), showing that a narrow gap can be propagated backwards in unless one of a small number of exceptional events happen:
Suppose that and is such that
for some (which can depend on ), and that
Then one of the following statements hold:
(Macroscopic spectral concentration) There exists with such that .
(Small inner products) There exists with such that
(Large eigenvalue) For some one has
(Large inner product) There exists such that
(Large inner product near ) There exists with such that
The proof of this proposition repeats the proof of [27, Lemma 47] in [27, Section 6] almost exactly. Only the following changes have to be made:
We have the upper bound , which together with (29) forces and thus as required.
The variable now lies in the range rather than .
has to be defined as rather than just (and similarly for ).
Next, we need the following result that asserts that the events (i)-(vii) are rare:
Suppose that and , and set for some sufficiently small fixed . Let be such that . Then:
The events (i), (iii), (iv), (v), (vi) in Lemma 5.3 all fail with high probability.
There is a constant such that all the coefficients of the eigenvectors for are of magnitude at most with overwhelming probability. Conditioning to be a matrix with this property, the events (ii) and (vii) occur with a conditional probability of at most .
Furthermore, there is a constant (depending on ) such that if and is conditioned as in (b), then (ii) and (vii) in fact occur with a conditional probability of at most .
The proof of this proposition repeats the proof of [27, Proposition 49] in [27, Section 7] almost exactly. Only the following changes have to be made:
All references to [27, Theorem 56] (i.e. Theorem 1.7) need to be replaced with Theorem 1.10.
The variable now lies in the range rather than .
Given Lemma 5.3 and Proposition 5.4, the proof of Theorem 1.14 exactly follows the proof of Theorem 1.6 in [27, Section 3.5], with the following minor changes:
In the definition of the event , the range needs to be expanded to .
In the definition of the event , the events that (29) fail for some have to be included; but these events occur with polynomially small probability, thanks to Proposition 5.1 and the union bound.
This concludes the proof of Theorem 1.14.
The four moment theorem
We now prove Theorem 1.13. As in [27, Section 3.3], the proof is based on two key propositions. The first proposition asserts that one can swap a single coefficient (or more precisely, two coefficients) of a (deterministic) matrix as long as obeys a certain “good configuration condition”:
There exists a positive constant such that the following holds. Let and , and assume sufficiently large depending on these parameters. Let . For a complex parameter , let be a (deterministic) family of Hermitian matrices of the form
where are unit vectors. We assume that for every and every whose real and imaginary parts are multiples of , we have
(Eigenvalue separation) For any with , we have
(Delocalization at ) If is the orthogonal projection to the eigenspace associated to , then
whenever is the orthogonal projection to the eigenspaces corresponding to eigenvalues with .
We say that are a good configuration for if the above properties hold. Assuming this good configuration, then we have
for all , and are random variables with almost surely, which match to order for some .
If obeys the improved derivative bounds
for and some sufficiently large absolute constant , then we can strengthen in (36) to .
The second proposition asserts that these good configurations occur very frequently:
The proof of this proposition repeats the proof of [27, Proposition 44] in [27, Section 5] almost exactly. Only the following changes have to be made:
All references to [27, Theorem 56] (i.e. Theorem 1.7) need to be replaced with Theorem 1.10.
All references to [27, Proposition 58] (i.e. Proposition 1.8) need to be replaced with Proposition 1.12.
The edge regions in which do not fall inside the bulk region no longer need to be treated separately, thus simplifying the last paragraph of the proof somewhat.
Given these two propositions, the proof of Theorem 1.13 repeats the proof of [27, Theorem 15] in [27, Section 3.3] almost exactly. Only the following changes have to be made:
All references to [27, Proposition 44] need to be replaced with Proposition 6.2.
The proof of Theorem 1.13 is now complete.