Random matrices: Universality of local eigenvalue statistics up to the edge

Terence Tao, Van Vu

Introduction

In our recent paper , a universality result (the Four Moment Theorem) was established for the eigenvalue spacings in the bulk of the spectrum of random Hermitian matrices. (See for an extended discussion of the universality phenomenon, and for further references on universality results in the context of Wigner Hermitian matrices.)

The main purpose of this paper is to extend this universality result to the edge of the spectrum as well.

To recall the Four Moment Theorem, we need some notation.

A random Hermitian matrix Mn=(ζij)1i,jnM_{n}=(\zeta_{ij})_{1\leq i,j\leq n} is said to obey condition C0 if

The ζij\zeta_{ij} are independent (but not necessarily identically distributed) for 1ijn1\leq i\leq j\leq n. For 1i<jn1\leq i<j\leq n, they have mean zero and variance 11; for i=ji=j, they have mean zero and variance cc for some fixed c>0c>0 independent of nn.

(Uniform exponential decay) There exist constants C,C>0C,C^{\prime}>0 such that

for all tCt\geq C^{\prime} and 1i,jn1\leq i,j\leq n.

Examples of random Hermitian matrices obeying Condition C0 include the GUE and GOE ensembles, or the random symmetric Bernoulli ensemble in which each of the ζij\zeta_{ij} are equal to ±1\pm 1 with equal probability 1/21/2. In GOE one has c=2c=2, but in the other two cases one has c=1c=1. The arguments in the previous paper were largely phrased for the case c=1c=1, but it is not difficult to see that the arguments extend without difficulty to other values of cc (the main point being that a modification of the variance of a single entry of a row vector does not significantly affect the Talagrand concentration inequality, [27, Lemma 43], or Lemma 2.1 below.).

Given an n×nn\times n Hermitian matrix AA, we denote its nn eigenvalues as

It will be convenient to introduce the following notation for frequent events depending on nn, in increasing order of likelihood:

EE holds asymptotically almost surely ifSee Section 1.19 for our conventions on asymptotic notation. P(E)=1o(1){\mathbf{P}}(E)=1-o(1).

EE holds with high probability if P(E)1O(nc){\mathbf{P}}(E)\geq 1-O(n^{-c}) for some constant c>0c>0.

EE holds with overwhelming probability if P(E)1OC(nC){\mathbf{P}}(E)\geq 1-O_{C}(n^{-C}) for every constant C>0C>0 (or equivalently, that P(E)1exp(ω(logn)){\mathbf{P}}(E)\geq 1-\exp(-\omega(\log n))).

EE holds almost surely if P(E)=1{\mathbf{P}}(E)=1.

We say that two complex random variables ζ\zeta and ζ\zeta^{\prime} match to order kk if

for all m,l0m,l\geq 0 such that m+lkm+l\leq k.

The first main result can now be stated as follows:

If ζij\zeta_{ij} and ζij\zeta^{\prime}_{ij} only match to order 33 rather than 44, then there is a positive constant CC independent of c0c_{0} such that the conclusion (3) still holds provided that one strengthens (2) to

Informally, this theorem asserts that the distribution of any bounded number of eigenvalues in the bulk of the spectrum of a random Hermitian matrix obeying condition C0 depends only on the first four moments of the coefficients.

There is also a useful companion result to Theorem 1.5, which is used both in the proof of that theorem, and in several of its applications:

[27, Theorem 17] Let 0<ε<10<{\varepsilon}<1 be a constant, and let MnM_{n} be a random matrix obeying Condition C0. Set An:=nMnA_{n}:=\sqrt{n}M_{n}. Then for every c0>0c_{0}>0, and for nn sufficiently large depending on ε{\varepsilon}, c0c_{0} and the constants C,CC,C^{\prime} in Definition 1.2, and for each εni(1ε)n{\varepsilon}n\leq i\leq(1-{\varepsilon})n, one has λi+1(An)λi(An)nc0\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n})\geq n^{-c_{0}} with high probability. In fact, one has

for some c1>0c_{1}>0 depending on c0c_{0} (and independent of ε{\varepsilon}).

Theorem 1.5 (and to a lesser extent, Theorem 1.6) can be used to extend the range of applicability for various results on eigenvalue statistics in the bulk for Hermitian or symmetric matrices, for instance in extending results for special ensembles such as GUE or GOE (or ensembles obeying some regularity or divisibility conditions) to more general classes of matrices. See , , for some examples of this type of extension. We also remark that a level repulsion estimate which has a similar spirit to Theorem 1.6 was established in [9, Theorem 3.5], although the latter result establishes repulsion of eigenvalues in a fixed small interval II, rather than at a fixed index ii of the sequence of eigenvalues, and does not seem to be directly substitutable for Theorem 1.6 in the arguments of this paper.

The results of Theorem 1.5 and Theorem 1.6 only control eigenvalues λi(An)\lambda_{i}(A_{n}) in the bulk region εni(1ε)n{\varepsilon}n\leq i\leq(1-{\varepsilon})n for some fixed ε>0{\varepsilon}>0 (independent of nn). The reason for this restriction was technical, and originated from the use of the following two related results (which are variants of previous results of Erdős, Schlein, and Yau), whose proof relied on the assumption that one was in the bulk:

[27, Theorem 56] For any ε,δ>0{\varepsilon},\delta>0 and any random Hermitian matrix Mn=(ζij)1i,jnM_{n}=(\zeta_{ij})_{1\leq i,j\leq n} whose upper-triangular entries are independent with mean zero and variance 11, and such that ζijK|\zeta_{ij}|\leq K almost surely for all i,ji,j and some 1Kn1/2ε1\leq K\leq n^{1/2-{\varepsilon}}, and any interval II in [2+ε,2ε][-2+{\varepsilon},2-{\varepsilon}] of width IK2log20nn|I|\geq\frac{K^{2}\log^{20}n}{n}, the number of eigenvalues NIN_{I} of Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n} in II obeys the concentration estimate

with overwhelming probability, where ρsc\rho_{sc} is the semicircular distribution

In particular, NI=Θε(nI)N_{I}=\Theta_{\varepsilon}(n|I|) with overwhelming probability.

[27, Proposition 58] Let ε,Mn,Wn,ζij,K{\varepsilon},M_{n},W_{n},\zeta_{ij},K be as in Theorem 1.7. Then for any 1in1\leq i\leq n with λi(Wn)[2+ε,2ε]\lambda_{i}(W_{n})\in[-2+{\varepsilon},2-{\varepsilon}], if ui(Wn)u_{i}(W_{n}) denotes a unit eigenvector corresponding to λi(Wn)\lambda_{i}(W_{n}), then with overwhelming probability each coordinate of ui(Mn)u_{i}(M_{n}) is Oε(K2log20nn1/2)O_{{\varepsilon}}(\frac{K^{2}\log^{20}n}{n^{1/2}}).

In the bulk region [2+ε,2ε][-2+{\varepsilon},2-{\varepsilon}], the semicircular function ρsc\rho_{sc} is bounded away from zero. Thus, Theorem 1.7 ensures that the eigenvalues of WnW_{n} in the bulk tend to have a mean spacing of Θε(1/n)\Theta_{\varepsilon}(1/n) on the average. Applying the Cauchy interlacing law

where Wn1W_{n-1} is an n1×n1n-1\times n-1 minor of WnW_{n}, this implies that the bulk eigenvalues of Wn1W_{n-1} are within Θε(1/n)\Theta_{\varepsilon}(1/n) of the corresponding eigenvalues of WnW_{n} on the average. Using linear algebra to express the coordinates of the eigenvector ui(Mn)u_{i}(M_{n}) in terms of WnW_{n} and a minor Wn1W_{n-1} (see Lemma 4.1 below), and using some concentration of measure results concerning the projection of a random vector to a subspace (see Lemma 2.1), we eventually obtain Proposition 1.8.

9. Universality up to the edge

The main results of this paper are that the above four theorems can be extended to the edge of the spectrum (thus effectively sending ε{\varepsilon} to zero). Let us now state these results more precisely. Firstly, we have the following extension of Theorem 1.7:

Consider a random Hermitian matrix Mn=(ζij)1i,jnM_{n}=(\zeta_{ij})_{1\leq i,j\leq n} whose upper-triangular entries are independent with mean zero and variance 11, and such that ζijK|\zeta_{ij}|\leq K almost surely for all i,ji,j and some K1K\geq 1. Let 0<δ<1/20<\delta<1/2 be a quantity which can depend on nn, and let II be an interval such that

We also make the mild assumption K=o(n1/2δ2)K=o(n^{1/2}\delta^{2}). Then the number of eigenvalues NIN_{I} of Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n} in II obeys the concentration estimate

The powers of KK, δ\delta and logn\log n here are probably not best possible, but this will not be relevant for our purposes. In our applications KK will be a power of logn\log n, and δ\delta will be a negative power of logn\log n. (This allows the error term O(δnI)O(\delta n|I|) in the above estimate for NIN_{I} to exceed the main term nIρsc(x) dxn\int_{I}\rho_{sc}(x)\ dx when one is very near the edge, but this will not impact our arguments.)

We prove this theorem in Section 3, using the same (standard) Stieltjes transform method that was used to prove Theorem 1.7 in (see also ), with a somewhat more careful analysis. We next use it to obtain the following extension of Proposition 1.8:

Let MnM_{n} be a random matrix obeying condition C0. Then with overwhelming probability, every unit eigenvector ui(Mn)u_{i}(M_{n}) of MnM_{n} has coefficients at most n1/2logO(1)nn^{-1/2}\log^{O(1)}n, thus

where e1,,ene_{1},\ldots,e_{n} is the standard basis.

The deduction of Proposition 1.12 from Theorem 1.10 differs significantly from the analogous deduction of Proposition 1.8 in Theorem 1.7 in . The main difference is that in the current case ρsc\rho_{sc} is no longer bounded away from zero, which causes the average eigenvalue spacing between λi(Wn)\lambda_{i}(W_{n}) and λi+1(Wn)\lambda_{i+1}(W_{n}) to be considerably larger than 1/n1/n. For instance, the gap between the second largest eigenvalue λn1(Wn)\lambda_{n-1}(W_{n}) and the largest eigenvalue λn(Wn)\lambda_{n}(W_{n}) is typically of size n2/3n^{-2/3}.

The key new ingredient that help us to deal with this problem is the following observation: the Cauchy interlacing law (5), when applied to the eigenvalues of the edge, is strongly bias. In particular, the gap between λi(Wn1)\lambda_{i}(W_{n-1}) and λi(Wn)\lambda_{i}(W_{n}) is significantly smaller than the gap between λi(Wn1)\lambda_{i}(W_{n-1}) and λi+1(Wn)\lambda_{i+1}(W_{n}). We can show that (with high probability), the first gap is of order n1+o(1)n^{-1+o(1)} while the second can be as large as n2/3n^{-2/3} (and similarly for the gap between λi+1(Wn)\lambda_{i+1}(W_{n}) and λi(Wn1)\lambda_{i}(W_{n-1}) when n/2inn/2\leq i\leq n). This new ingredient will be sufficient to recover Proposition 1.12; see Section 4, where the above proposition is proved.

Using Theorem 1.10 and Proposition 1.12, one can continue the arguments from to establish the following extensions of Theorem 1.5 and Theorem 1.6:

Let MnM_{n} be a random matrix obeying Condition C0. Set An:=nMnA_{n}:=\sqrt{n}M_{n}. Then for every c0>0c_{0}>0, and for nn sufficiently large depending on ε{\varepsilon}, c0c_{0} and the constants C,CC,C^{\prime} in Definition 1.2, and for each 1in1\leq i\leq n, one has λi+1(An)λi(An)nc0\lambda_{i+1}(A_{n})-\lambda_{i}(A_{n})\geq n^{-c_{0}} with high probability, uniformly in ii.

The novelty here is that we have no assumption on the indices iji_{j} and ii. We present the proof of these Theorems in Sections 5, 6, following the arguments in closely.

15. Applications

As Theorems 1.13, 1.14 extend Theorems 1.5, 1.6, all the applications of the latter theorems in (concerning the bulk of the spectrum) can also be viewed as applications of these theorems. But because these results extend all the way to the edge, we can now obtain some results on the edge of the spectrum as well. For instance, we can prove

Let kk be a fixed integer and MnM_{n} be a matrix obeying Condition C0, and suppose that the real and imaginary part of the atom variables have the same covariance matrix as the GUE ensemble (i.e. both components have variance 1/21/2, and have covariance ). Assume furthermore that all third moments of the atom variables vanish. Set Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}. Then the joint distribution of the kk dimensional random vector

has a weak limit as nn\rightarrow\infty, which coincides with that in the GUE case (in particular, the limit for k=1k=1 is the GUE Tracy-Widom distribution, and for higher kk is controlled by the Airy kernel ). The result also holds for the smallest eigenvalues λ1,,λk\lambda_{1},\dots,\lambda_{k}, with the offset 2-2 replaced by +2+2.

If the atom variables have the same covariance matrix as the GOE ensemble (i.e. they are real with variance 11 off the diagonal, and 22 on the diagonal), instead of the GUE ensemble, then the same conclusion applies but with the GUE distribution replaced of course by the GOE distribution (see for the k=1k=1 case).

This result was previously established by Soshnikov (see also , ) in the case when MnM_{n} is a Wigner Hermitian matrix (i.e. the off-diagonal entries are iid, and the matrix matches GUE to second order at least) with symmetric distribution (which implies, but is stronger than, matching to third order). For some additional partial results in the non-symmetric case see , . The exponential decay condition in Soshnikov’s result has been lowered to a finite number of moments; see , . It is reasonable to conjecture that the exponential decay conditions in this current paper can similarly be lowered, but we will not pursue this issue here. It also seems plausible that the third moment matching conditions could be dropped, though this is barely beyond the reach of the current methodNote added in proof: the third moment condition has recently been dropped in , by combining the four moment theorem here with a new proof of universality for the distribution of the largest eigenvalue for gauss divisible matrices..

We just prove the claim for the largest kk eigenvalues and for GUE, as the claim for the smallest kk and/or GOE is similar.

only changes by o(1)o(1) when the matrix MnM_{n} is replaced with GUE. But this follows from the final conclusion of Theorem 1.13, thanks to the extra factor n1/3n^{-1/3}. ∎

Notice that there is some room to spare in this argument, as the n1/3n^{-1/3} gain in (8) is far more than is needed for (6). Because of this, one can obtain similar universality results for suitably normalised eigenvalues λi(An)\lambda_{i}(A_{n}) with in1εi\leq n^{1-{\varepsilon}} or inn1εi\geq n-n^{1-{\varepsilon}} for any ε>0{\varepsilon}>0 (where the normalisation factor for λi(An)\lambda_{i}(A_{n}) is now n2/3min(i,ni)1/3n^{2/3}\min(i,n-i)^{1/3}, and the offset 2-2 is replaced by t-t, where 2tρsc(x) dx=in\int_{-2}^{t}\rho_{sc}(x)\ dx=\frac{i}{n}). We omit the details.

In analogy with , one should be able to drop the third moment condition in Theorem 1.16 if one can control the distribution of the largest (or smallest) eigenvalues from random matrices obtained from a suitable Ornstein-Uhlenbeck process, as in .

19. Notation

We consider nn as an asymptotic parameter tending to infinity. We use XYX\ll Y, YXY\gg X, Y=Ω(X)Y=\Omega(X), or X=O(Y)X=O(Y) to denote the bound XCYX\leq CY for all sufficiently large nn and for some constant CC. Notations such as XkY,X=Ok(Y)X\ll_{k}Y,X=O_{k}(Y) mean that the hidden constant CC depend on another constant kk. X=o(Y)X=o(Y) or Y=ω(X)Y=\omega(X) means that X/Y0X/Y\rightarrow 0 as nn\rightarrow\infty; the rate of decay here will be allowed to depend on other parameters. We write X=Θ(Y)X=\Theta(Y) for YXYY\ll X\ll Y.

Eigenvalues are always ordered in increasing order, thus for instance λn(An\lambda_{n}(A_{n}) is the largest eigenvalue of a Hermitian matrix AnA_{n}, and λ1(An)\lambda_{1}(A_{n}) is the smallest.

20. Acknowledgements

The authors thank the anonymous referee for helpful comments and references, and Horng-Tzer Yau for additional references.

General tools

In this section we record some general tools (proven in ) which we will use repeatedly in the sequel. We begin with a very useful concentration of measure result that describes the projection of a random vector to a subspace.

The same conclusion holds (with 1010 replaced by another explicit constant) if one of the entries ξj\xi_{j} of XX is assumed to have variance cc instead of 11, for some absolute constant c>0c>0.

See [27, Lemma 40]. (The main tool in the proof is Talagrand’s concentration inequality.) It is clear from the triangle inequality that the modification of variance in a single entry does not significantly affect the conclusion except for constants. ∎

Next, we record a crude but useful upper bound on the number of eigenvalues in a short interval.

with overwhelming probability, where NIN_{I} is the number of eigenvalues of WnW_{n} in II.

See [27, Proposition 62]. (The main tools in the proof are the Stieltjes transform method, Lemma 3.3 below, and Lemma 2.1.) Again, the generalisation to variances other than 11 on the diagonal do not cause significant changes to the argument. ∎

Finally, we recall a Berry-Esséen type theorem:

for some σ>0\sigma>0. Let ζ1,,ζn\zeta_{1},\ldots,\zeta_{n} be independent complex random variables with mean zero, variance Eζj2{\mathbf{E}}|\zeta_{j}|^{2} equal to 11, and obeying Eζi3C{\mathbf{E}}|\zeta_{i}|^{3}\leq C for some C1C\geq 1. For each 1iN1\leq i\leq N, let SiS_{i} be the complex random variable

(Upper tail bound on SiS_{i}) For t1t\geq 1, we have P(Sit)exp(ct2)+Cσ{\mathbf{P}}(|S_{i}|\geq t)\ll\exp(-ct^{2})+C\sigma for some absolute constant c>0c>0.

(Lower tail bound on S\vec{S}) For any tNt\leq\sqrt{N}, one has P(St)O(t/N)N/4+CN4t3σ{\mathbf{P}}(|\vec{S}|\leq t)\ll O(t/\sqrt{N})^{\lfloor N/4\rfloor}+CN^{4}t^{-3}\sigma.

The same claim holds if one of the ζi\zeta_{i} is assumed to have variance cc instead of 11 for some absolute constant c>0c>0.

See [27, Theorem 41]. Again, the modification of the variance on a single entry can be easily seen to have no substantial effect on the conclusion. ∎

Asymptotics for the ESD

In this section we prove Theorem 1.10, using the Stieltjes transform method (see for a general discussion of this method). We may assume throughout that nn is large, since the claim is vacuous for nn small.

It is known by the moment method (see e.g. or ) that with overwhelming probability, all eigenvalues of WnW_{n} have magnitude at most 2+o(1)2+o(1). Because of this, we may restrict attention to the case when II lies in interval $$ (say).

We recall the Stieltjes transform sn(z)s_{n}(z) of a Hermitian matrix WnW_{n}, defined for complex zz by the formula

We also introduce the semicircular counterpart

which by a standard contour integral computation can be given explicitly as

where we use the branch of the square root of z24z^{2}-4 with cut at $whichisasymptotictowhich is asymptotic toz$ at infinity.

It is well known that one can control the empirical spectral distribution NIN_{I} via the Stieltjes transform. We will use the following formalization of this principle:

There is a positive constant CC such that the following holds for any Hermitian matrix WnW_{n}. Let 1/10η1/n1/10\geq\eta\geq 1/n and L,ε,δ>0L,{\varepsilon},\delta>0. Suppose that one has the bound

with (uniformly) overwhelming probability for all zz with Re(z)L|{\operatorname{Re}}(z)|\leq L and Im(z)η{\operatorname{Im}}(z)\geq\eta. Then for any interval II in [L+ε,Lε][-L+{\varepsilon},L-{\varepsilon}] with Imax(2η,ηδlog1δ)|I|\geq\max(2\eta,\frac{\eta}{\delta}\log\frac{1}{\delta}), one has

with overwhelming probability, where NIN_{I} is the number of eigenvalues of WnW_{n} in II.

As a consequence of this lemma (with L=4L=4 and ε=1{\varepsilon}=1, say), we see that Theorem 1.10 follows from

Consider a random Hermitian matrix Mn=(ζij)1i,jnM_{n}=(\zeta_{ij})_{1\leq i,j\leq n} whose upper-triangular entries are independent with mean zero and variance 11, with variance cc on the diagonal for some absolute constant c>0c>0, and such that ζijK|\zeta_{ij}|\leq K almost surely for all i,ji,j and some K1K\geq 1. Set Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}. Let 0<δ<1/20<\delta<1/2 (which can depend on nn), and suppose that K=o(n1/2δ2)K=o(n^{1/2}\delta^{2}). Then (12) holds with (uniformly) overwhelming probability for all zz with Re(z)4|{\operatorname{Re}}(z)|\leq 4 and

The remainder of this section is devoted to proving Theorem 3.2. Fix zz as in Theorem 3.2, thus Re(z)4|{\operatorname{Re}}(z)|\leq 4 and Im(z)=η{\operatorname{Im}}(z)=\eta, where

Our objective is to show (12) with (uniformly) overwhelming probability.

As in previous works (in particular, ), the key is to exploit the fact that when Imz>0{\operatorname{Im}}z>0, s(z)s(z) is the unique solution to the equation

with Ims(z)>0{\operatorname{Im}}s(z)>0; this is immediate from (11).

We now seek a similar relation for sns_{n}. Note that Imsn(z)>0{\operatorname{Im}}s_{n}(z)>0 by (10). We use the following standard matrix identity (cf. [27, Lemma 39], or [2, Chapter 11]):

Wn,kW_{n,k} is the matrix WnW_{n} with the kthk^{th} row and column removed, and aka_{k} is the kthk^{th} row of WnW_{n} with the kthk^{th} element removed.

By Schur’s complement, 1ζkkzak(WkzI)1ak\frac{1}{\zeta_{kk}-z-a_{k}^{*}(W_{k}-zI)^{-1}a_{k}} is the kthk^{th} diagonal entry of (WzI)1(W-zI)^{-1}. Taking traces, one obtains the claim. ∎

For each 1kn1\leq k\leq n, one has Yk=sn(z)+o(δ2)Y_{k}=s_{n}(z)+o(\delta^{2}) with overwhelming probability.

The entries of aka_{k} are independent of each other and of Wn,kW_{n,k}, and have mean zero and variance 1n\frac{1}{n}. By linearity of expectation we thus have, on conditioning on Wn,kW_{n,k}

is the Stieltjes transform of Wn,kW_{n,k}. From the Cauchy interlacing law (5) and (13), we have

and so it will remain to show the concentration estimate

Rewriting YkY_{k}, it suffices to show that

with overwhelming probability, where Rj:=uj(Wn,k)ak21/nR_{j}:=|u_{j}(W_{n,k})^{*}a_{k}|^{2}-1/n.

where HH is the space spanned by the uj(Wn,k)u_{j}(W_{n,k})^{*} for iji+i_{-}\leq j\leq i_{+}. From Lemma 2.1 and the union bound, we conclude that with overwhelming probability

By the triangle inequality, this implies that

and hence by a further application of the triangle inequality

The plan is to use (17) and (18) to establish (16). Accordingly, we split the LHS of (16), into several subsums according to the distance λjx|\lambda_{j}-x|. Lemma 2.2 provides a sharp estimate on the number of terms of each subsum which will allow us to obtain a good upper bound on the absolute value.

We turn to the details. From (13) we can choose two auxiliary parameters 0<δ,α<10<\delta^{\prime},\alpha<1 such that

Indeed, one could set δ:=δ2log0.01n\delta^{\prime}:=\delta^{2}\log^{-0.01}n and α:=δ2log1.01n\alpha:=\delta^{2}\log^{-1.01}n and use (13).

Fix such parameters, and consider the contribution to (16) of the indices jj for which

By Lemma 2.2 and (19), the interval of jj for which this occurs has cardinality O(δηn)O(\delta^{\prime}\eta n) (with overwhelming probability). On this interval, the quantity 1λj(Wn,k)(x+1η)\frac{1}{\lambda_{j}(W_{n,k})-(x+\sqrt{-1}\eta)} has magnitude O(1η)O(\frac{1}{\eta}). Applying (18) (and (19)), we see that the contribution of this case is thus

Next, we consider the contribution to (16) of those indices jj for which

for some integer 0llogn/α0\leq l\ll\log n/\alpha, and then sum over ll. By Lemma 2.2 and (19), the set of jj for which this occurs is contained (with overwhelming probability) in at most two intervals of cardinality O((1+α)lαδηn)O((1+\alpha)^{l}\alpha\delta^{\prime}\eta n). On each of these intervals, the quantity 1λj(Wn,k)(x+1η)\frac{1}{\lambda_{j}(W_{n,k})-(x+\sqrt{-1}\eta)} has magnitude O(1(1+α)lδη)O\left(\frac{1}{(1+\alpha)^{l}\delta^{\prime}\eta}\right) and fluctuates by O(α(1+α)lδη)O\left(\frac{\alpha}{(1+\alpha)^{l}\delta^{\prime}\eta}\right). Applying (17), (18) (and noting that (1+α)lαδηn(1+\alpha)^{l}\alpha\delta^{\prime}\eta n exceeds K2log2nK^{2}\log^{2}n, by (19)) we see that the contribution of a single ll to (16) is at most

We now conclude the proof of Theorem 1.10. By hypothesis,

almost surely. Inserting these bounds into (15), we see that with overwhelming probability

By the triangle inequality (and square rooting the o()o() decay), we can assume that either the error term o(δ2)o(\delta^{2}) is o(δ2sn(z)+z)o(\delta^{2}|s_{n}(z)+z|), or that sn(z)+z|s_{n}(z)+z| is o(1)o(1). Suppose the former holds. Then by Taylor expansion

If we assume z10|z|\leq 10 (say), we conclude that sn(z)100|s_{n}(z)|\leq 100. Multiplying out by sn(z)+zs_{n}(z)+z and rearranging, we obtain

(treating the case when z24=o(δ2)z^{2}-4=o(\delta^{2}) separately).

To summarise, we have shown (with overwhelming probability) that in the region

that one either has sn(z)=s(z)+o(δ)s_{n}(z)=s(z)+o(\delta), sn(z)=zs(z)+o(1)=s(z)z24+o(1)s_{n}(z)=-z-s(z)+o(1)=s(z)-\sqrt{z^{2}-4}+o(1), or sn(z)+z=o(1)|s_{n}(z)+z|=o(1). It is not hard to see that the first two cases are disconnected from the third (for nn large enough) in this region, because s(z)=1s(z)+zs(z)=\frac{-1}{s(z)+z} is bounded away from zero, as is s(z)+z=1s(z)s(z)+z=\frac{-1}{s(z)}. Furthermore, the first and second possibilities are also disconnected from each other except when z24=o(δ2)z^{2}-4=o(\delta^{2}). Also, the second and third possibilities can only hold for Im(z)=o(1){\operatorname{Im}}(z)=o(1) since sn(z)s_{n}(z) and zz both have positive real part. A continuity argument thus shows that the first possibility must hold throughout the region except when z24=o(δ2)z^{2}-4=o(\delta^{2}), in which case either the first or second possibility can hold; but in that region, the first and second possibility are equivalent, and (12) follows. The proof of Theorem 1.10 is now complete.

Delocalization of eigenvectors

Without loss of generalization, we can assume that the entries are continuously distributed. Having established Theorem 1.10, we now use this theorem to establish Proposition 1.12.

Let MnM_{n} obey condition C0. Then by Markov’s inequality, one has ζijlogO(1)n|\zeta_{ij}|\ll\log^{O(1)}n with overwhelming probability (here and in the sequel we allow implied constants in the O()O() notation to depend on the constants C,CC,C^{\prime} in (1)). By conditioning the ζij\zeta_{ij} to this eventStrictly speaking, this distorts the mean and variance of ζij\zeta_{ij} by an exponentially small amount, but one can easily check that this does not significantly impact any of the arguments in this section., we may thus assume that

almost surely for some K=O(logO(1)n)K=O(\log^{O(1)}n).

Fix 1in1\leq i\leq n; by symmetry we may take in/2i\geq n/2. By the union bound and another application of symmetry, it suffices to show that

To compute ui(Mn)e1u_{i}(M_{n})^{*}e_{1} we use the following identity from (see also [27, Lemma 38]):

where uj(An1)u_{j}(A_{n-1}) is a unit eigenvector corresponding to the eigenvalue λj(An1)\lambda_{j}(A_{n-1}).

By subtracting λi(A)I\lambda_{i}(A)I from AA we may assume λi(A)=0\lambda_{i}(A)=0. The eigenvector equation then gives

Since v2+x2=1\|v^{\prime}\|^{2}+|x|^{2}=1, we conclude

Since An11X2=j=1n1(λj(An1))2uj(An1)X2\|A_{n-1}^{-1}X\|^{2}=\sum_{j=1}^{n-1}(\lambda_{j}(A_{n-1}))^{-2}|u_{j}(A_{n-1})^{*}X|^{2}, the claim follows. ∎

Let Mn1M_{n-1} be the bottom right n1×n1n-1\times n-1 minor of MnM_{n}. As we are assuming that the coefficients of MnM_{n} are continuously distributed, we see almost surely that none of the eigenvalues of Mn1M_{n-1} are equal to λi(Mn)\lambda_{i}(M_{n}). We may thus apply Lemma 4.1 and conclude that

where XX is the bottom left n1×1n-1\times 1 vector of MnM_{n} (and thus has entries ζj1\zeta_{j1} for 1<jn1<j\leq n). It thus suffices to show that

It will be convenient to eliminate the exponent 22 in the denominator, as follows. From Lemma 2.1, one has uj(Mn1)XlogO(1)n|u_{j}(M_{n-1})^{*}X|\ll\log^{O(1)}n with overwhelming probability for each jj (and hence for all jj, by the union bound). It thus suffices to show that

with overwhelming probability. By the Cauchy-Schwarz inequality, it thus suffices to show that

with overwhelming probability for some 1T,T+logO(1)n1\leq T_{-},T_{+}\ll\log^{O(1)}n. It is convenient to work with the normalized matrix Wn:=1nMnW_{n}:=\frac{1}{\sqrt{n}}M_{n}, thus we need to show

with overwhelming probability for some 1T,T+logO(1)n1\leq T_{-},T_{+}\ll\log^{O(1)}n, where Y:=1nXY:=\frac{1}{\sqrt{n}}X has entries 1nζj1\frac{1}{\sqrt{n}}\zeta_{j1} for 1<jn1<j\leq n.

There are two cases: the bulk case and the edge case; the former was already treated in , but the latter is new.

Suppose that n/2i<0.999nn/2\leq i<0.999n. Then from the semicircular law (or Theorem 1.10) we see that λi(Wn)[2+ε,2+ε]\lambda_{i}(W_{n})\in[-2+{\varepsilon},2+{\varepsilon}] with overwhelming probability for some absolute constant ε>0{\varepsilon}>0. Let AA be a large constant to be chosen later. A further application of Theorem 1.10 then shows that there is an interval II of length logAn/n\log^{A}n/n centered at λi(Wn)\lambda_{i}(W_{n}) which contains Θ(logAn)\Theta(\log^{A}n) eigenvalues of WnW_{n}. If λj(Wn),λj+1(Wn)\lambda_{j}(W_{n}),\lambda_{j+1}(W_{n}) lie in II, then by the Cauchy interlacing property (5), λj(Wn1)λi(Wn)logAn/n|\lambda_{j}(W_{n-1})-\lambda_{i}(W_{n})|\ll\log^{A}n/n. One can thus lower bound the left-hand side of (21) (for suitable values of TT) by

One can rewrite this as logAnπHX2\log^{-A}n\|\pi_{H}X\|^{2}, where HH is the span of the uj(Wn1)u_{j}(W_{n-1}) for λj(Wn),λj+1(Wn)I\lambda_{j}(W_{n}),\lambda_{j+1}(W_{n})\in I. The claim then follows from Lemma 2.1 (for AA large enough).

3. The edge case

We now turn to the more interesting edge case when 0.999nin0.999n\leq i\leq n. Using the semicircular law, we now see that

Next, we can exploit the following identity:

[27, Lemma 37] If uj(Wn1)Xu_{j}(W_{n-1})^{*}X is non-zero for every jj, then

By diagonalising Wn1W_{n-1} (noting that this does not affect either side of (23)), we may assume that Wn1=diag(λ1(Wn1),,λn1(Wn1))W_{n-1}=\operatorname{diag}(\lambda_{1}(W_{n-1}),\ldots,\lambda_{n-1}(W_{n-1})) and uj(Wn1)=eju_{j}(W_{n-1})=e_{j} for j=1,,n1j=1,\ldots,n-1. One then easily verifies that the characteristic polynomial det(WnλI)\det(W_{n}-\lambda I) of WnW_{n} is equal to

when λ\lambda is distinct from λ1(Wn1),,λn1(Wn1)\lambda_{1}(W_{n-1}),\ldots,\lambda_{n-1}(W_{n-1}). Since uj(Wn1)Xu_{j}(W_{n-1})^{*}X is non-zero by hypothesis, we see that this polynomial does not vanish at any of the λj(Wn1)\lambda_{j}(W_{n-1}). Substituting λi(Wn)\lambda_{i}(W_{n}) for λ\lambda, we obtain (23). ∎

Again, the continuity of the entries of MnM_{n} ensure that the hypothesis of Lemma 4.4 is obeyed almost surely. From (20), (22), (23) one has

with overwhelming probability, so to show (21), it will suffice by the triangle inequality to show that

(say) with overwhelming probability for some T=logO(1)nT=\log^{O(1)}n.

Let A>100A>100 be a large constant to be chosen later. By Theorem 1.10, we see (if AA is large enough) that

with overwhelming probability for any interval II of length I=logAn/n|I|=\log^{A}n/n, where αI:=1IIρsc(x) dx\alpha_{I}:=\frac{1}{|I|}\int_{I}\rho_{sc}(x)\ dx. For any such interval, we see from Lemma 2.1 (and Cauchy interlacing (5)) that with overwhelming probability

Set dI:=dist(λi(Wn),I)Id_{I}:=\frac{\operatorname{dist}(\lambda_{i}(W_{n}),I)}{|I|}. If dIlognd_{I}\geq\log n (say), then

We now partition the real line into intervals II of length logAn/n\log^{A}n/n, and sum (26) over all II with dIlognd_{I}\geq\log n. Bounding αI\alpha_{I} crudely by O(1)O(1), we see that IO(αIdI2)=O(1logn)=o(1)\sum_{I}O\left(\frac{\alpha_{I}}{d_{I}^{2}}\right)=O\left(\frac{1}{\log n}\right)=o(1). Similarly, one has

if AA is large enough. Finally, Riemann integration of the principal value integral

The operator norm of WnW_{n} is 2+o(1)2+o(1) with overwhelming probability (see e.g. , ), so λi(Wn)2+o(1)|\lambda_{i}(W_{n})|\leq 2+o(1). Using the formula (11) for the Stieltjes transform, one obtains from residue calculus that

for λi(Wn)2|\lambda_{i}(W_{n})|\leq 2, with the right-hand side replaced by λi(Wn)/2+λi(Wn)24/2-\lambda_{i}(W_{n})/2+\sqrt{\lambda_{i}(W_{n})^{2}-4}/2 for λi(Wn)>2|\lambda_{i}(W_{n})|>2. In either event, we have

The intervals II with dI<lognd_{I}<\log n will contribute at most logA+O(1)n\log^{A+O(1)}n eigenvalues, by (25) (and Cauchy interlacing (5)). The claim (24) now follows by setting TT_{-} and T+T_{+} appropriately. The proof of Proposition 1.12 is now complete.

with overwhelming probability for all n/2inn/2\leq i\leq n, and similarly one has

with overwhelming probability for all 1in/21\leq i\leq n/2. On the other hand, according to the Tracy-Widom law, the gap between λn(Wn)\lambda_{n}(W_{n}) and λn1(Wn)\lambda_{n-1}(W_{n}) (or between λ1(Wn)\lambda_{1}(W_{n}) and λ2(Wn)\lambda_{2}(W_{n})) can be expected to be as large as n2/3n^{-2/3}. Thus we see that there is a significant bias at the edge in the interlacing law (5), which can ultimately be traced to the imbalance of “forces” in the interlacing identity (23) at that edge.

Lower bound on eigenvalue gap

We now give the proof of Theorem 1.14. Most of the proof will follow closely the proof of Theorem 1.6 in , so we shall focus on the changes needed to that argument. As such, this section will assume substantial familiarity with the material from , and will cite from it repeatedly. Similarly for the next section.

For technical reasons relating to an induction argument, it turns out that one has to treat the extreme cases i=1,ni=1,n separately:

Theorem 1.14 is true when i=1i=1 or i=ni=n.

By symmetry it suffices to do this for i=ni=n. By a limiting argument we may assume that the entries ζij\zeta_{ij} of MnM_{n} are continuously distributed. From Lemma 4.4 one has (almost surely) that

Recall that λn(Wn)=2+o(1)\lambda_{n}(W_{n})=2+o(1) with overwhelming probability; also, 1nζnn=o(1)\frac{1}{\sqrt{n}}\zeta_{nn}=o(1) with overwhelming probability. As all the terms in the left-hand side have the same sign, we conclude that

From Theorem 2.3 and Proposition 1.12, we have un1(Wn1)Xnc0/10|u_{n-1}(W_{n-1})^{*}X|\geq n^{-c_{0}/10} (say) with high probability, and so

with high probability. The claim now follows from the Cauchy interlacing property (5). ∎

In fact, at the edge, one should be able to improve the lower bound on the eigenvalue gap substantially, from nc0n^{-c_{0}} to n1/3c0n^{1/3-c_{0}}, in accordance to the Tracy-Widom law, but we will not need to do so here.

Now we handle the general case of Theorem 1.14. Fix MnM_{n} and c0c_{0}. We write n0,i0n_{0},i_{0} for i,ni,n, thus 1i0n01\leq i_{0}\leq n_{0} and our task is to show that

with high probability. By Proposition 5.1 we may assume 1<i0<n01<i_{0}<n_{0}. We may also assume n0n_{0} to be large, as the claim is vacuous otherwise. As in previous sections, we may truncate so that all coefficients ζij\zeta_{ij} are of size O(logO(1)n0)O(\log^{O(1)}n_{0}) (as before, the exponentially small corrections to the mean and variance of ζij\zeta_{ij} caused by this are easily controlled), and approximate so that the distribution is continuous rather than discrete.

For each n0/2nn0n_{0}/2\leq n\leq n_{0}, let AnA_{n} be the top left n×nn\times n minor of An0A_{n_{0}}. As in [27, Section 3.4], we introduce the regularized gap

for all n0/2nn0n_{0}/2\leq n\leq n_{0} and 1il<in1\leq i-l<i\leq n, where C1C_{1} is a large constant to be chosen later. It will suffice to show that for each 1<i0<n01<i_{0}<n_{0}, that

with high probability. By symmetry we may assume that n0/2i0<n0n_{0}/2\leq i_{0}<n_{0}.

We will need two key lemmas. First, we have the following deterministic lemma (a minor variant of [27, Lemma 47]), showing that a narrow gap can be propagated backwards in nn unless one of a small number of exceptional events happen:

Suppose that i0n+1n0i_{0}\leq n+1\leq n_{0} and ln/10l\leq n/10 is such that

for some 0<δ10<\delta\leq 1 (which can depend on nn), and that

Then one of the following statements hold:

(Macroscopic spectral concentration) There exists 1i<i+n+11\leq i_{-}<i_{+}\leq n+1 with i+ilogC1/2ni_{+}-i_{-}\geq\log^{C_{1}/2}n such that λi+(An+1)λi(An+1)δexp(log0.95n)(i+i)|\lambda_{i_{+}}(A_{n+1})-\lambda_{i_{-}}(A_{n+1})|\leq\delta\exp(\log^{0.95}n)(i_{+}-i_{-}).

(Small inner products) There exists 1ii0l<i0i+n1\leq i_{-}\leq i_{0}-l<i_{0}\leq i_{+}\leq n with i+ilogC1/2ni_{+}-i_{-}\leq\log^{C_{1}/2}n such that

(Large eigenvalue) For some 1in+11\leq i\leq n+1 one has

(Large inner product) There exists 1in1\leq i\leq n such that

(Large inner product near i0i_{0}) There exists 1in1\leq i\leq n with ii0logC1n|i-i_{0}|\leq\log^{C_{1}}n such that

The proof of this proposition repeats the proof of [27, Lemma 47] in [27, Section 6] almost exactly. Only the following changes have to be made:

We have the upper bound λi+(An+1)λi(An+1)δ(logC1n)log0.9n0\lambda_{i_{+}}(A_{n+1})-\lambda_{i_{-}}(A_{n+1})\leq\delta(\log^{C_{1}}n)^{\log^{0.9}n_{0}}, which together with (29) forces i+n+1i_{+}\neq n+1 and thus i0ni_{0}\leq n as required.

The variable jj now lies in the range 1jn1\leq j\leq n rather than εn/10j(1ε/10)n{\varepsilon}n/10\leq j\leq(1-{\varepsilon}/10)n.

ii_{--} has to be defined as max(i2k1,1)\max(i_{-}-2^{k-1},1) rather than just i2k1i_{-}-2^{k-1} (and similarly for i++i_{++}).

Next, we need the following result that asserts that the events (i)-(vii) are rare:

Suppose that n0/2n<n0n_{0}/2\leq n<n_{0} and ln/10l\leq n/10, and set δ:=n0κ\delta:=n_{0}^{-\kappa} for some sufficiently small fixed κ>0\kappa>0. Let m0m\geq 0 be such that 2mδ1/22^{m}\leq\delta^{-1/2}. Then:

The events (i), (iii), (iv), (v), (vi) in Lemma 5.3 all fail with high probability.

There is a constant CC^{\prime} such that all the coefficients of the eigenvectors uj(An)u_{j}(A_{n}) for 1jn1\leq j\leq n are of magnitude at most n1/2logCnn^{-1/2}\log^{C^{\prime}}n with overwhelming probability. Conditioning AnA_{n} to be a matrix with this property, the events (ii) and (vii) occur with a conditional probability of at most 2κm+nκ2^{-\kappa m}+n^{-\kappa}.

Furthermore, there is a constant C2C_{2} (depending on C,κ,C1C^{\prime},\kappa,C_{1}) such that if lC2l\geq C_{2} and AnA_{n} is conditioned as in (b), then (ii) and (vii) in fact occur with a conditional probability of at most 2κmlog2C1n+nκ2^{-\kappa m}\log^{-2C_{1}}n+n^{-\kappa}.

The proof of this proposition repeats the proof of [27, Proposition 49] in [27, Section 7] almost exactly. Only the following changes have to be made:

All references to [27, Theorem 56] (i.e. Theorem 1.7) need to be replaced with Theorem 1.10.

The variable jj now lies in the range 1jn1\leq j\leq n rather than εn/2j(1ε/2)n{\varepsilon}n/2\leq j\leq(1-{\varepsilon}/2)n.

Given Lemma 5.3 and Proposition 5.4, the proof of Theorem 1.14 exactly follows the proof of Theorem 1.6 in [27, Section 3.5], with the following minor changes:

In the definition of the event EnE_{n}, the range εn/2j(1ε/2)n{\varepsilon}n/2\leq j\leq(1-{\varepsilon}/2)n needs to be expanded to 1jn1\leq j\leq n.

In the definition of the event E0E_{0}, the events that (29) fail for some n0log2C1n0nn0n_{0}-\log^{2C_{1}}n_{0}\leq n\leq n_{0} have to be included; but these events occur with polynomially small probability, thanks to Proposition 5.1 and the union bound.

This concludes the proof of Theorem 1.14.

The four moment theorem

We now prove Theorem 1.13. As in [27, Section 3.3], the proof is based on two key propositions. The first proposition asserts that one can swap a single coefficient (or more precisely, two coefficients) of a (deterministic) matrix AA as long as AA obeys a certain “good configuration condition”:

There exists a positive constant C1C_{1} such that the following holds. Let k1k\geq 1 and ε1>0{\varepsilon}_{1}>0, and assume nn sufficiently large depending on these parameters. Let 1i1<<ikn1\leq i_{1}<\ldots<i_{k}\leq n. For a complex parameter zz, let A(z)A(z) be a (deterministic) family of n×nn\times n Hermitian matrices of the form

where ep,eqe_{p},e_{q} are unit vectors. We assume that for every 1jk1\leq j\leq k and every zn1/2+ε1|z|\leq n^{1/2+{\varepsilon}_{1}} whose real and imaginary parts are multiples of nC1n^{-C_{1}}, we have

(Eigenvalue separation) For any 1in1\leq i\leq n with iijnε1|i-i_{j}|\geq n^{{\varepsilon}_{1}}, we have

(Delocalization at iji_{j}) If Pij(A(z))P_{i_{j}}(A(z)) is the orthogonal projection to the eigenspace associated to λij(A(z))\lambda_{i_{j}}(A(z)), then

whenever Pij,αP_{i_{j},\alpha} is the orthogonal projection to the eigenspaces corresponding to eigenvalues λi(A(z))\lambda_{i}(A(z)) with 2αiij<2α+12^{\alpha}\leq|i-i_{j}|<2^{\alpha+1}.

We say that A(0),ep,eqA(0),e_{p},e_{q} are a good configuration for i1,,iki_{1},\ldots,i_{k} if the above properties hold. Assuming this good configuration, then we have

for all 0j50\leq j\leq 5, and ζ,ζ\zeta,\zeta^{\prime} are random variables with ζ,ζn1/2+ε1|\zeta|,|\zeta^{\prime}|\leq n^{1/2+{\varepsilon}_{1}} almost surely, which match to order rr for some r=2,3,4r=2,3,4.

If GG obeys the improved derivative bounds

for 0j50\leq j\leq 5 and some sufficiently large absolute constant CC, then we can strengthen n(r+1)/2+O(ε1)n^{-(r+1)/2+O({\varepsilon}_{1})} in (36) to n(r+1)/2ε1n^{-(r+1)/2-{\varepsilon}_{1}}.

The second proposition asserts that these good configurations occur very frequently:

The proof of this proposition repeats the proof of [27, Proposition 44] in [27, Section 5] almost exactly. Only the following changes have to be made:

All references to [27, Theorem 56] (i.e. Theorem 1.7) need to be replaced with Theorem 1.10.

All references to [27, Proposition 58] (i.e. Proposition 1.8) need to be replaced with Proposition 1.12.

The edge regions in which λi(A(z))\lambda_{i}(A(z)) do not fall inside the bulk region [(2+ε)n,(2ε)n][(-2+{\varepsilon}^{\prime})n,(2-{\varepsilon}^{\prime})n] no longer need to be treated separately, thus simplifying the last paragraph of the proof somewhat.

Given these two propositions, the proof of Theorem 1.13 repeats the proof of [27, Theorem 15] in [27, Section 3.3] almost exactly. Only the following changes have to be made:

All references to [27, Proposition 44] need to be replaced with Proposition 6.2.

The proof of Theorem 1.13 is now complete.

References