The Local Semicircle Law for a General Class of Random Matrices

Laszlo Erdos, Antti Knowles, Horng-Tzer Yau, Jun Yin

Introduction

Since the pioneering work W of Wigner in the fifties, random matrices have played a fundamental role in modelling complex systems. The basic example is the Wigner matrix ensemble, consisting of N×NN\times N symmetric or Hermitian matrices H=(hij)H=(h_{ij}) whose matrix entries are identically distributed random variables that are independent up to the symmetry constraint H=HH=H^{*}. From a physical point of view, these matrices represent Hamilton operators of disordered mean-field quantum systems, where the quantum transition rate from state ii to state jj is given by the entry hijh_{ij}.

A central problem in the theory or random matrices is to establish the local universality of the spectrum. Wigner observed that the distribution of the distances between consecutive eigenvalues (the gap distribution) in complex physical systems follows a universal pattern. The Wigner-Dyson-Gaudin-Mehta conjecture, formalized in M , states that this gap distribution is universal in the sense that it depends only on the symmetry class of the matrix, but is otherwise independent of the details of the distribution of the matrix entries. This conjecture has recently been established for all symmetry classes in a series of works ESY4 ; EYYrigi ; EKYY2 ; an alternative approach was given in TV for the special Wigner Hermitian case. The general approach of ESY4 ; EYYrigi ; EKYY2 to prove universality consists of three steps: (i) establish a local semicircle law for the density of eigenvalues; (ii) prove universality of Wigner matrices with a small Gaussian component by analysing the convergence of Dyson Brownian motion to local equilibrium; (iii) remove the small Gaussian component by comparing Green functions of Wigner ensembles with a few matching moments. For an overview of recent results and this three-step strategy, see EYBull .

For definiteness, let us focus on the case of a one-dimensional band matrix HH. A fundamental conjecture, supported by nonrigorous supersymmetric arguments as well as numerics Fy , is that the local spectral statistics of HH are governed by random matrix statistics for large WW and by Poisson statistics for small WW. This transition is in the spirit of the Anderson metal-insulator transition Fy ; Spe , and is conjectured to be sharp around the critical value W=NW=\sqrt{N}. In other words, if WNW\gg\sqrt{N}, we expect the universality results of EYY ; EYY2 ; EYYrigi to hold. In addition to a transition in the local spectral statistics, an accompanying transition is conjectured to occur in the behaviour localization length of the eigenvectors of HH, whereby in the large-WW regime they are expected to be completely delocalized and in the small-WW regime exponentially localized. The localization length for band matrices was recently investigated in great detail in EKYY3 .

Although the Wigner-Dyson-Gaudin-Mehta conjecture was originally stated for Wigner matrices, the methods of ESY4 ; EYYrigi ; EKYY2 also apply to certain ensembles with independent but not identically distributed entries, which however retain the mean-field character of Wigner matrices. More precisely, they yield universality provided the variances

of the matrix entries are only required to be of comparable size (but not necessarily equal):

for some positive constants cc and CC. (Such matrices were called generalized Wigner matrices in EYYrigi .) This condition admits a departure from spatial homogeneity, but still imposes a mean-field behaviour and hence excludes genuinely inhomogeneous models such as band matrices.

In the three-step approach to universality outlined above, the first step is to establish the semicircle law on very short scales. In the scaling of HH where its spectrum is asymptotically given by the interval $,thetypicaldistancebetweenneighbouringeigenvaluesisoforder, the typical distance between neighbouring eigenvalues is of order1/N.Thenumberofeigenvaluesinanintervaloflength. The number of eigenvalues in an interval of length\etaistypicallyoforderis typically of orderN\eta.Thus,thesmallestpossiblescaleonwhichtheempiricaldensitymaybeclosetoadeterministicdensity(inourcasethesemicirclelaw)is. Thus, the smallest possible scale on which the empirical density may be close to a deterministic density (in our case the semicircle law) is\eta\gg 1/N.Ifwecharacterizetheempiricalspectraldensityaroundanenergy. If we characterize the empirical spectral density around an energyEonscaleon scale\etabyitsStieltjestransform,by its Stieltjes transform,m_{N}(z)=N^{-1}\operatorname{Tr}(H-z)^{-1}forforz=E+i\eta,thenthelocalsemicirclelawaroundtheenergy, then the local semicircle law around the energyEandinaspectralwindowofsizeand in a spectral window of size\eta$ is essentially equivalent to

on the variances (instead of (1.1)). Here MM is a new parameter that typically satisfies MNM\ll N. (From now on, the relation ABA\ll B for two NN-dependent quantities AA and BB means that ANεBA\leqslant N^{-\varepsilon}B for some positive ε>0\varepsilon>0.) The question of the validity of the local semicircle law under the assumption (1.3) was initiated in EYY , where (1.2) was proved with an error term of order (Mη)1/2(M\eta)^{-1/2} away from the spectral edges.

The purpose of this paper is twofold. First, we prove a local semicircle law (1.2), under the variance condition (1.3), with a stronger error bound of order (Mη)1(M\eta)^{-1}, including energies EE near the spectral edge. Away from the spectral edge (and from the origin E=0E=0 if the matrix does not have a band structure), the result holds for any η1/M\eta\gg 1/M. Near the edge there is a restriction on how small η\eta can be. This restriction depends explicitly on a norm of the resolvent of the matrix of variances, S=(sij)S=(s_{ij}); we give explicit bounds on this norm for various special cases of interest.

As a corollary, we derive bounds on the eigenvalue counting function and rigidity estimates on the locations of the eigenvalues for a general class of matrices. Combined with an analysis of Dyson Brownian motion and the Green function comparison method, this yields bulk universality of the local eigenvalue statistics in a certain range of parameters, which depends on the matrix SS. In particular, we extend bulk universality, proved for generalized Wigner matrices in EYY , to a large class of matrix ensembles where the upper and lower bounds on the variances (1.1) are relaxed.

The main motivation for the generalizations in this paper is the Anderson transition for band matrices outlined above. While not optimal, our results nevertheless imply that band matrices with a sufficiently broad band plus a negligible mean-field component exhibit bulk universality: their local spectral statistics are governed by random matrix statistics. For example, the local two-point correlation functions coincide if WN33/34W\gg N^{33/34}. Although eigenvector delocalization and random matrix statistics are conjectured to occur in tandem, delocalization was actually proved in EKYY3 under more general conditions than those under which we establish random matrix statistics. In fact, the delocalization results of EKYY3 hold for a mean-field component as small as (N/W2)2/3(N/W^{2})^{2/3}, and, provided that WN4/5W\gg N^{4/5}, the mean-field component may even vanish (resulting in a genuine band matrix).

The second purpose of this paper is to provide a coherent, pedagogical, and self-contained proof of the local semicircle law. In recent years, a series of papers ESY1 ; ESY2 ; EYY ; EYY2 ; EYYrigi ; EKYY1 with gradually weaker assumptions, was published on this topic. These papers often cited and relied on the previous ones. This made it difficult for the interested reader to follow all the details of the argument. The basic strategy of our proof (that is, using resolvents and large deviation bounds) was already used in ESY1 ; ESY2 ; EYY ; EYY2 ; EYYrigi ; EKYY1 . In this paper we not only streamline the argument for generalized Wigner matrices (satisfying (1.1)), but we also obtain sharper bounds for random matrices satisfying the much weaker condition (1.3). This allows us to establish universality results for a class of ensembles beyond generalized Wigner matrices.

Our proof is self-contained and simpler than those of EYY ; EYY2 ; EYYrigi ; EKYY1 . In particular, we give a proof of the Fluctuation Averaging Theorem, Theorems 4.6 and 4.7 below, which is considerably simpler than that of its predecessors in EYY2 ; EYYrigi ; EKYY1 . In addition, we consistently use fluctuation averaging at several key steps of the main argument, which allows us to shorten the proof and relax previous assumptions on the variances sijs_{ij}. The reader who is mainly interested in the pedagogical presentation should focus on the simplest choice of SS, sij=1/Ns_{ij}=1/N, which corresponds to the standard Wigner matrix (for which M=NM=N), and focus on Sections 2, 4, 5, and 6, as well as Appendix B.

We conclude this section with an outline of the paper. In Section 2 we define the model, introduce basic definitions, and state the local semicircle law in full generality (Theorem 2.3). Section 3 is devoted to some examples of random matrix models that satisfy our assumptions; for each example we give explicit bounds on the spectral domain on which the local semicircle law holds. Sections 4, 5, and 6 are devoted to the proof of the local semicircle law. Section 4 collects the basic tools that will be used throughout the proof. The purpose of Section 5 is mainly pedagogical; in it, we state and prove a weaker form of the local semicircle law, Theorem 5.1. The error bounds in Theorem 5.1 are identical to those of Theorem 2.3, but the spectral domain on which they hold is smaller. Provided one stays away from the spectral edge, Theorems 5.1 and 2.3 are equivalent; near the edge, Theorem 2.3 is stronger. The proof of Theorem 5.1 is very short and contains several key ideas from the proof of Theorem 2.3. The expert reader may therefore want to skip Section 5, but for the reader looking for a pedagogical presentation we recommend first focusing on Sections 4 and 5 (along with Appendix B). The full proof of our main result, Theorem 2.3, is given in Section 6. In Sections 7 and 8 we draw consequences from Theorem 2.3. In Section 7 we derive estimates on the density of states and the rigidity of the eigenvalue locations. In Section 8 we state and prove the universality of the local spectral statistics in the bulk, and give applications to some concrete matrix models. In Appendix A we derive explicit bounds on relevant norms of the resolvent of SS (denoted by the abstract control parameters Γ~\widetilde{\Gamma} and Γ\Gamma), which are used to define the domains of applicability of Theorems 2.3 and 5.1. Finally, Appendix B is devoted to the proof of the fluctuation averaging estimates, Theorems 4.6 and 4.7.

We use CC to denote a generic large positive constant, which may depend on some fixed parameters and whose value may change from one expression to the next. Similarly, we use cc to denote a generic small positive constant.

Definitions and the main result

for all ii and jj. We regard NN as the fundamental parameter of our model, and MM as a function of NN. We introduce the N×NN\times N symmetric matrix SSN=(sij)i,j=1NS\equiv S_{N}=(s_{ij})_{i,j=1}^{N}. We assume that SS is (doubly) stochastic:

for all ii. For simplicity, we assume that SS is irreducible, so that 1 is a simple eigenvalue. (The case of non-irreducible SS may be trivially dealt with by considering its irreducible components separately.) We shall always assume the bounds

It is sometimes convenient to use the normalized entries

for all NN, ii, and jj. We make this assumption to streamline notation in the statements of results such as Theorem 2.3 and the proofs. In fact, our results (and our proof) also cover the case where (2.6) holds for some finite large pp; see Remark 2.4.

without further comment, and always assume that η>0\eta>0. Wigner semicircle law ϱ\varrho and its Stieltjes transform mm are defined by

To avoid confusion, we remark that mm was denoted by mscm_{sc} in the papers ESY1 ; ESY2 ; ESY4 ; ESYY ; EYY ; EYY2 ; EYYrigi ; EKYY1 ; EKYY2 , in which mm had a different meaning from (2.7). It is well known that the Stieltjes transform mm is the unique solution of

satisfying Imm(z)>0\operatorname{Im}m(z)>0 for Imz>0\operatorname{Im}z>0. Thus we have

Some basic estimates on mm are collected in Lemma 4.3 below.

the norm of (1m(z)2S)1(1-m(z)^{2}S)^{-1} restricted to the subspace orthogonal to the constants. Clearly, Γ~(z)Γ(z)\widetilde{\Gamma}(z)\leqslant\Gamma(z). Basic estimates on Γ\Gamma and Γ~\widetilde{\Gamma} are collected in Proposition A.2 below. Many estimates in this paper depend critically on Γ\Gamma and Γ~\widetilde{\Gamma}. Indeed, these parameters quantify the stability of certain self-consistent equations that underlie our proof. However, Γ\Gamma and Γ~\widetilde{\Gamma} remain bounded (up to a factor logN\log N) provided E=RezE=\operatorname{Re}z is separated from the set {2,0,2}\{-2,0,2\}; for band matrices (see Example 3.2) it suffices that EE be separated from the spectral edges {2,2}\{-2,2\}; see Appendix A. At a first reading, we recommend that the reader neglect Γ\Gamma and Γ~\widetilde{\Gamma} (i.e. replace them with a constant). For band matrices, this amounts to focusing on the local semicircle law in the bulk of the spectrum.

We define the resolvent or Green function of HH through

and denote its entries by Gij(z)G_{ij}(z). The Stieltjes transform of the empirical spectral measure of HH is

The following definition introduces a notion of a high-probability bound that is suited for our purposes. It was introduced (in a slightly different form) in EKYfluc .

be two families of nonnegative random variables, where U(N)U^{(N)} is a possibly NN-dependent parameter set. We say that XX is stochastically dominated by YY, uniformly in uu, if for all (small) ε>0\varepsilon>0 and (large) D>0D>0 we have

for large enough NN0(ε,D)N\geqslant N_{0}(\varepsilon,D). Unless stated otherwise, throughout this paper the stochastic domination will always be uniform in all parameters apart from the parameter δ\delta in (2.4) and the sequence of constants μp\mu_{p} in (2.6); thus, N0(ε,D)N_{0}(\varepsilon,D) also depends on δ\delta and μp\mu_{p}. If XX is stochastically dominated by YY, uniformly in uu, we use the notation XYX\prec Y. Moreover, if for some complex family XX we have XY\lvert X\rvert\prec Y we also write X=O(Y)X=O_{\prec}(Y).

For example, using Chebyshev’s inequality and (2.6) one easily finds that

The relation \prec is a partial ordering, i.e. it is transitive and it satisfies the familiar arithmetic rules of order relations. For instance if X1Y1X_{1}\prec Y_{1} and X2Y2X_{2}\prec Y_{2} then X1+X2Y1+Y2X_{1}+X_{2}\prec Y_{1}+Y_{2} and X1X2Y1Y2X_{1}X_{2}\prec Y_{1}Y_{2}. More general statements in this spirit are given in Lemma 4.4 below.

a spectral domain. (Recall that MMNM\equiv M_{N} depends on NN.)

Note that η~E\widetilde{\eta}_{E} depends on γ\gamma, but we do not explicitly indicate this dependence since we regard γ\gamma as fixed. At a first reading we advise the reader to think of γ\gamma as being zero. Note also that the lower bound in (A.3) below implies that η~EM1\widetilde{\eta}_{E}\geqslant M^{-1}. We also define the distance to the spectral edge,

Finally, we introduce the fundamental control parameter

which will be used throughout this paper as a sharp, deterministic upper bound on the entries of GG. Note that the condition in the definition of η~E\widetilde{\eta}_{E} states that the first term of Π\Pi is bounded by MγΓ~2M^{-\gamma}\widetilde{\Gamma}^{-2} and the second term by MγΓ~3M^{-\gamma}\widetilde{\Gamma}^{-3}. We may now state our main result.

Fix γ(0,1/2)\gamma\in(0,1/2) and define the spectral domain

We remark that the main estimate for the Stieltjes transform mNm_{N} is (2.19). The other estimate (2.20) is mainly useful for controlling the norm of HH, which we do in Section 7. We also recall that uniformity for the spectral parameter zz means that the threshold N0(ε,D)N_{0}(\varepsilon,D) in the definition of \prec is independent of the choice of zz within the indicated spectral domain. As stated in Definition 2.1, this uniformity holds for all statements containing \prec, and is not explicitly mentioned in the following; all of our arguments are trivially uniform in zz and any matrix indices.

Theorem 2.3 has the following variant for matrix entries where the condition (2.6) is only imposed for some large but fixed pp. More precisely, for any ε>0\varepsilon>0 and D>0D>0 there exists a constant p(ε,D)p(\varepsilon,D) such that if (2.6) holds for p=p(ε,D)p=p(\varepsilon,D) then

Most of the previous works ESY1 ; ESY2 ; EYY ; EYY2 ; EYYrigi ; EKYY1 assumed a stronger, subexponential decay condition on ζij\zeta_{ij} instead of (2.6). Under the subexponential decay condition, certain probability estimates in the results were somewhat stronger and precise tolerance thresholds were sharper. Roughly, this corresponds to operating with a modified definition of \prec, where the factors NεN^{\varepsilon} are replaced by high powers of logN\log N and the polynomial probability bound NDN^{-D} is replaced with a subexponential one. The proofs of the current paper can be easily adjusted to such a setup, but we shall not pursue this further.

A local semicircle law for Wigner matrices on the optimal scale η1/N\eta\gtrsim 1/N was first obtained in ESY2 . The optimal error estimates in the bulk were proved in EYY2 , and extended to the edges in EYYrigi . These estimates underlie the derivation of rigidity estimates for individual eigenvalues, which in turn were used in EYYrigi to prove Dyson’s conjecture on the optimal local relaxation time for the Dyson Brownian motion.

Apart from the somewhat different assumption on the tails of the entries of HH (see Remark 2.5), Theorem 2.3, when restricted to generalized Wigner matrices, subsumes all previous local semicircle laws obtained in ESY1 ; ESY2 ; EYY2 ; EYYrigi . For band matrices, a local semicircle law was proved in EYY . (In fact, in EYY the band structure was not required; only the conditions (2.2), (2.3), and the subexponential decay condition for the matrix entries (instead of (2.6)) were used.) Theorem 2.3 improves this result in several ways. First, the error bounds in (2.18) and (2.19) are uniform in EE, even for EE near the spectral edge; the corresponding bounds in Theorem 2.1 of EYY diverged as κ1\kappa^{-1}. Second, the bound (2.19) on the Stieltjes transform is better than (2.16) in EYY by a factor (Mη)1/2(M\eta)^{-1/2}. This improvement is due to exploiting the fluctuation averaging mechanism of Theorem 4.6. Third, the domain of η\eta for which Theorem 2.3 applies is essentially ηκ7/2M1\eta\gg\kappa^{-7/2}M^{-1}, which is somewhat larger than the domain ηκ4M1\eta\gg\kappa^{-4}M^{-1} of EYY .

While Theorem 2.3 subsumes several previous local semicircle laws, two previous results are not covered. The local semicircle law for sparse matrices proved in EKYY1 does not follow from Theorem 2.3. However, the argument of this paper may be modified so as to include sparse matrices as well; we do not pursue this issue further. The local semicircle law for one-dimensional band matrices given in Theorem 2.2 of EKYY3 is, however, of a very different nature, and may not be recovered using the methods of the current paper. Under the conditions WN4/5W\gg N^{4/5} and ηN2/W3\eta\gg N^{2}/W^{3}, Theorem 2.2 of EKYY3 shows that (focusing for simplicity on the one-dimensional case)

in the bulk spectrum, which is stronger than the bound of order (Wη)1/2(W\eta)^{-1/2} in (2.18). The proof of (2.21) relies on a very general fluctuation averaging result from EKYfluc , which is considerably stronger than Theorems 4.6 and 4.7; see Remark 4.8 below. The key open problem for band matrices is to establish a local semicircle law on a scale η\eta below W1W^{-1}. The estimate (2.21) suggests that the resolvent entries should remain bounded throughout the range ηmax{N1,W2}\eta\gtrsim\max\{N^{-1},W^{-2}\}.

The local semicircle law, Theorem 2.3, has numerous consequences, several of which are formulated in Sections 7 and 8. Here we only sketch them. Theorem 7.5 states that the empirical counting function converges to the counting function of the semicircle law. The precision is of order M1M^{-1} provided that we have the lower bound sijc/Ns_{ij}\geqslant c/N for some constant c>0c>0. As a consequence, Theorem 7.6 states that the bulk eigenvalues are rigid on scales of order M1M^{-1}. Under the same condition, in Theorem 8.2 we prove the universality of the local two-point correlation functions in the bulk provided that MN33/34M\gg N^{33/34}; we obtain similar results for higher order correlation functions, assuming a stronger restriction on MM. These results generalize the earlier theorems from EYYrigi ; EKYY1 ; EKYY2 , which were valid for generalized Wigner matrices satisfying the condition (1.1), under which MM is comparable to NN. We obtain similar results if the condition sijc/Ns_{ij}\geqslant c/N in (1.1) is relaxed to sijN1ξs_{ij}\geqslant N^{-1-\xi} with some small ξ\xi. The exponent ξ\xi can be chosen near 1 for band matrices with a broad band WNW\asymp N. In particular, we prove universality for such band matrices with a rapidly vanishing mean-field component. These applications of the general Theorem 8.2 are listed in Corollary 8.3.

Examples

Let aaNa\equiv a_{N} and bbNb\equiv b_{N} be possibly NN-dependent positive quantities. We call HH an aa-full Wigner matrix if SS satisfies (2.3) and

Similarly, we call HH a bb-flat Wigner matrix if SS satisfies (2.3) and

(Note that in this case we have MN/bM\geqslant N/b.)

If aa and bb are independent of NN we call an aa-full Wigner matrix simply full and a bb-flat Wigner matrix simply flat. In particular, generalized Wigner matrices, satisfying (1.1), are full and flat Wigner matrices.

for some fixed δ>0\delta^{\prime}>0. Define the dd-dimensional discrete torus

Then HH is a dd-dimensional band matrix with band width WW and profile function ff if

where ZLZ_{L} is a normalization chosen so that (2.3) holds.

The example of Definition 3.3 is a mixture of the previous two. We are especially interested in the case ν1\nu\ll 1, when most of the variance comes from the band matrix, i.e. the profile of SS is very close to a sharp band.

We conclude with some explicit bounds for these examples. The behaviour of Γ\Gamma and Γ~\widetilde{\Gamma} near the spectral edge is governed by the parameter

For general HH and any constant c>0c>0, there is a constant C>0C>0 such that

provided dist(E,{2,0,2})c\operatorname{dist}(E,\{-2,0,2\})\geqslant c.

where CC depends on the constant aa in Definition 3.1 but cc does not.

For a band matrix with a mean-field component, as in Definition 3.3, we have

The case ν=0\nu=0 corresponds to a band matrix from Definition 3.2.

Tools

Let XX(H)X\equiv X(H) be a random variable. For i{1,,N}i\in\{1,\dots,N\} define the operations PiP_{i} and QiQ_{i} through

We introduce the random zz-dependent control parameters

We remark that the letter Λ\Lambda had a different meaning in several earlier papers, such as EYYrigi . The following lemma collects basic bounds on mm.

There is a constant c>0c>0 such that for EE\in and η(0,10]\eta\in(0,10] we have

The proof is an elementary exercise using (2.9). ∎

In particular, recalling that 1S1-1\leqslant S\leqslant 1 and using the upper bound mC\lvert m\rvert\leqslant C from (4.2), we find that there is a constant c>0c>0 such that

The following lemma collects basic algebraic properties of stochastic domination \prec. Roughly, it states that \prec satisfies the usual arithmetic properties of order relations. We shall use it tacitly throughout the following.

Suppose that X(u,v)Y(u,v)X(u,v)\prec Y(u,v) uniformly in uUu\in U and vVv\in V. If VNC\lvert V\rvert\leqslant N^{C} for some constant CC then

Suppose that X1(u)Y1(u)X_{1}(u)\prec Y_{1}(u) uniformly in uu and X2(u)Y2(u)X_{2}(u)\prec Y_{2}(u) uniformly in uu. Then X1(u)X2(u)Y1(u)Y2(u)X_{1}(u)X_{2}(u)\prec Y_{1}(u)Y_{2}(u) uniformly in uu.

If XY+NεXX\prec Y+N^{-\varepsilon}X for some ε>0\varepsilon>0 then XYX\prec Y.

The claims (i) and (ii) follow from a simple union bound. The claim (iii) is an immediate consequence of the definition of \prec. ∎

The following resolvent identities form the backbone of all of our calculations. The idea behind them is that a resolvent matrix element GijG_{ij} depends strongly on the ii-th and jj-th columns of HH, but weakly on all other columns. The first identity determines how to make a resolvent matrix element GijG_{ij} independent of an additional index ki,jk\neq i,j. The second identity expresses the dependence of a resolvent matrix element GijG_{ij} on the matrix elements in the ii-th or in the jj-th column of HH.

This is an exercise in linear algebra. The first identity (4.6) was proved in Lemma 4.2 of EYY and the second is an immediate consequence of the first. The identity (4.7) is proved in Lemma 6.10 of EKYY2 . ∎

Our final tool consists of the following results on fluctuation averaging. They exploit cancellations in sums of fluctuating quantities involving resolvent matrix entries. A very general result was obtained in EKYfluc ; in this paper we state a special case sufficient for our purposes here, and give a relatively simple proof in Appendix B. We consider weighted averages of diagonal resolvent matrix entries GkkG_{kk}. They are weakly dependent, but the correlation between GkkG_{kk} and GmmG_{mm} for mkm\neq k is not sufficiently small to apply the general theory of sums of weakly dependent random variables; instead, we need to exploit the precise form of the dependence using the resolvent structure.

It turns out that the key quantity that controls the magnitude of the fluctuations is Λ\Lambda. However, being a random variable, Λ\Lambda itself is unsuitable as an upper bound. For technical reasons (our proof relies on a high-moment estimate combined with Chebyshev’s inequality), it is essential that Λ\Lambda be estimated by a deterministic control parameter, which we call Ψ\Psi. The error terms are then estimated in terms of powers of Ψ\Psi. We shall always assume that Ψ\Psi satisfies

Typical example weights are tik=sikt_{ik}=s_{ik} and tik=N1t_{ik}=N^{-1}. Note that in both of these cases TT commutes with SS. We introduce the average of a vector (ai)i=1N(a_{i})_{i=1}^{N} through

where we defined vi\vbox..=Giimv_{i}\mathrel{\vbox{\hbox{.}\hbox{.}}}=G_{ii}-m. The estimates (4.11), (4.12), and (4.14) are uniform in the index ii.

In fact, the first bound of (4.11) can be improved as follows.

The first instance of the fluctuation averaging mechanism appeared in EYY2 for the Wigner case, where [Z]=N1kZk[Z]=N^{-1}\sum_{k}Z_{k} was proved to be bounded by Λo2\Lambda_{o}^{2}. Since Qk[Gkk]1Q_{k}[G_{kk}]^{-1} is essentially ZkZ_{k} (see (5.6) below), this corresponds to the first bound in (4.11). A different proof (with a better bound on the constants) was given in EYYrigi . A conceptually streamlined version of the original proof was extended to sparse matrices EKYY1 and to sample covariance matrices PY1 . Finally, an extensive analysis in EKYfluc treated the fluctuation averaging of general polynomials of resolvent entries and identified the order of cancellations depending on the algebraic structure of the polynomial. Moreover, in EKYfluc an additional cancellation effect was found for the quantity QiGij2Q_{i}|G_{ij}|^{2}. These improvements played a key role in obtaining the diffusion profile for the resolvent of band matrices and the estimate (2.21) in EKYY3 .

All proofs of the fluctuation averaging theorems rely on computing expectations of high moments of the averages, and carefully estimating the resulting terms. In EKYfluc , a diagrammatic representation was developed for bookkeeping such terms, but this is necessary only for the case of general polynomials. For the special cases given in Theorem 4.6, the proof is relatively simple and it is presented in Appendix B. Compared with EYY2 ; EYYrigi ; EKYY1 , the algebra of the decoupling of the randomness is greatly simplified in the current paper. Moreover, unlike their counterparts from EYY2 ; EYYrigi ; EKYY1 , the fluctuation averaging results of Theorems 4.6 and 4.7 do not require conditioning on the complement of some “bad” low-probability event, because such events are automatically accounted for by the definition of \prec ; this leads to further simplifications in the proofs of Theorems 4.6 and 4.7.

A simpler proof using ΓΓ\Gamma instead of Γ~~Γ\widetilde{\Gamma}

In this section we prove the following weaker version of Theorem 2.3. In analogy to (2.14), we introduce the lower boundary

Fix γ(0,1/2)\gamma\in(0,1/2) and define the spectral domain

Note that the only difference between Theorems 2.3 and 5.1 is that Γ~\widetilde{\Gamma} was replaced with the larger quantity Γ\Gamma in the definition of the threshold ηE\eta_{E} and the spectral domain, so that

Hence Theorem 5.1 is indeed weaker than Theorem 2.3, since it holds on a smaller spectral domain. As outlined after (2.11) and discussed in detail in Appendix A, Theorems 5.1 and 2.3 are equivalent provided EE is separated from the set {2,0,2}\{-2,0,2\} (for band matrices they are equivalent provided EE is separated from the spectral edges ±2\pm 2).

The rest of this section is devoted to the proof of Theorem 5.1. We give the full proof of Theorem 5.1 for pedagogical reasons, since it is simpler than that of Theorem 2.3 but already contains several of its key ideas. Theorem 2.3 will be proved in Section 6. One big difference between the two proofs is that in Theorem 5.1 the main control parameter is Λ\Lambda, while in Theorem 2.3 we have to keep track of two control parameters, Λ\Lambda and the smaller Θ\Theta.

The key tool behind the proof is a self-consistent equation for the diagonal entries of GG. The starting point is Schur’s complement formula, which we write as

The partial expectation with respect to the index ii (see Definition 4.2) of the last term on the right-hand side reads

where in the first step we used (2.1) and in the second (4.6). Introducing the notation

and recalling (2.3), we therefore get from (5.6) that

where we introduced the fluctuating error term

Using (2.8), we therefore get the self-consistent equation

Notice that this is an equation for the family (vi)i=1N(v_{i})_{i=1}^{N}, with random error terms Υi\Upsilon_{i}.

Self-consistent equations play a crucial role in analysing resolvents of random matrices. The simplest one is the scalar (or first level) self-consistent equation for mN(z)m_{N}(z), the Stieltjes transform of the empirical density (2.12). By averaging the inverse of (5.7) and neglecting the error terms, one obtains that mNm_{N} approximately satisfies the equation m=(m+z)1m=-(m+z)^{-1}, which is the defining relation for the Stieltjes transform of the semicircle law (2.8).

The vector (or second level) self-consistent equation, as given in (5.9), allows one to control not only fluctuations of mNmm_{N}-m but also those of GiimG_{ii}-m. The equation (5.9) first appeared in EYY , where a systematic study of resolvent entries of random matrices was initiated.

For completeness, we mention that a matrix (or third level) self-consistent equation for local averages of Gij2|G_{ij}|^{2}, was introduced in EKYY3 . This equation constitutes the backbone of the study of the diffusion profile of the resolvent entries of random band matrices.

We begin with the first statement. We shall often use the fact that, by the lower bound of (4.2) and the assumption ϕΛMc\phi\Lambda\prec M^{-c}, we have

First we estimate ZiZ_{i}, which we split as

We estimate each term using the large deviation estimates from Theorem C.1, by conditioning on G(i)G^{(i)} and using the fact that the family (hik)k=1N(h_{ik})_{k=1}^{N} is independent of G(i)G^{(i)}. By (C.2), the first term of (5.13) is stochastically dominated by \phi\bigl{(}{\sum_{k}^{(i)}s_{ik}^{2}\bigl{\lvert}G_{kk}^{(i)}\bigr{\rvert}^{2}}\bigr{)}^{1/2}\prec M^{-1/2}, where we used the estimate (2.2) and \phi\bigl{\lvert}G_{kk}^{(i)}\bigr{\rvert}\prec 1, as follows from (4.6), (5.12), and the assumption ϕΛMc\phi\Lambda\prec M^{-c}. For the second term of (5.13) we apply (C.4) with akl=sik1/2Gkl(i)sli1/2a_{kl}=s_{ik}^{1/2}G_{kl}^{(i)}s_{li}^{1/2} and Xk=ζikX_{k}=\zeta_{ik} (see (2.5)). We find

where the second step follows by spectral decomposition of G(i)G^{(i)}, and in the last step we used (4.6) and (5.12). Thus we get

where we absorbed the bound M1/2M^{-1/2} on the first term of (5.13) into the right-hand side of (5.15), using Immη\operatorname{Im}m\geqslant\eta as follows from (4.4).

Next, we estimate Λo\Lambda_{o}. We can iterate (4.7) once to get, for iji\neq j,

The term hijh_{ij} is trivially O(M1/2)O_{\prec}(M^{-1/2}). In order to estimate the other term, we invoke (C.3) with akl=sik1/2Gkl(ij)slj1/2a_{kl}=s_{ik}^{1/2}G_{kl}^{(ij)}s_{lj}^{1/2}, Xk=ζikX_{k}=\zeta_{ik}, and Yl=ζljY_{l}=\zeta_{lj}. As in (5.14), we find

where we again absorbed the term hijM1/2h_{ij}\prec M^{-1/2} into the right-hand side.

In order to estimate AiA_{i} and hiih_{ii} in the definition of Υi\Upsilon_{i}, we use (5.12) to estimate

where the second step follows from Immη\operatorname{Im}m\geqslant\eta (recall (4.4)). This completes the proof of (5.10).

The proof of (5.11) is almost identical to that of (5.10). The quantities \bigl{\lvert}G^{(i)}_{kk}\bigr{\rvert} and \bigl{\lvert}G^{(ij)}_{kk}\bigr{\rvert} are estimated by the trivial deterministic bound η1\eta^{-1}. We omit the details. ∎

3. A rough bound on ΛΛ\Lambda

The next step in the proof of Theorem 5.1 is to establish the following rough bound on Λ\Lambda.

Then by definition we have ϕΛMγ/4Γ1CMγ/4\phi\Lambda\leqslant M^{-\gamma/4}\Gamma^{-1}\leqslant CM^{-\gamma/4}, where in the last step we used (4.5). Hence we may invoke (5.10) to estimate Λo\Lambda_{o} and Υi\Upsilon_{i}. In order to estimate Λd\Lambda_{d}, we expand the right-hand side of (5.9) in viv_{i} to get

where we used (4.2) and that viCMγ/4\lvert v_{i}\rvert\leqslant CM^{-\gamma/4} on the event {ϕ=1}\{\phi=1\}. Using (5.10) we therefore have

Recalling (4.5) and (5.10), we therefore get

Next, by definition of ϕ\phi we may estimate

Plugging this into (5.18) yields ϕΛMγ/2Γ1\phi\Lambda\prec M^{-\gamma/2}\Gamma^{-1}, which is the claim. ∎

In order to start the continuity argument underlying the proof of Proposition 5.3, we need the following bound on Λ\Lambda for large η\eta.

Moreover, we use (4.6) and (5.16) to estimate

where the last step follows using (C.3), exactly as the estimate of the right-hand side of (5.16) in the proof of Lemma 5.2. We conclude that ΥiM1/2\lvert\Upsilon_{i}\rvert\prec M^{-1/2}.

Using m12\lvert m^{-1}\rvert\geqslant 2 and vk1\lvert v_{k}\rvert\leqslant 1 as follows from (5.19), we find

Using m1/2\lvert m\rvert\leqslant 1/2 we therefore conclude that

from which the claim follows together with the estimate on Λo\Lambda_{o} from (5.20). ∎

We may now conclude the proof of Proposition 5.3 by a continuity argument in η=Imz\eta=\operatorname{Im}z. The gist of the continuity argument is depicted in Figure 5.1.

for NN0N\geqslant N_{0}, where N0N0(γ,D)N_{0}\equiv N_{0}(\gamma,D) does not depend on zz.

(Here the bounds (5.23) and (5.24) each hold surely, i.e. for every realization of Λ(z)\Lambda(z).)

4. Iteration step and conclusion of the proof of Theorem 5.1

In the following a key role will be played by deterministic control parameters Ψ\Psi satisfying

We now improve the estimate ΛΨ\Lambda\prec\Psi iteratively. The iteration step is the content of the following proposition.

Let Ψ\Psi be a control parameter satisfying (5.25) and fix ε(0,γ/3)\varepsilon\in(0,\gamma/3). Then

For the proof of Proposition 5.6 we need the following averaging result, which is a simple corollary of Theorem 4.6.

Suppose that ΛΨ\Lambda\prec\Psi for some deterministic control parameter Ψ\Psi satisfying (4.8). Then [Υ]=O(Ψ2)[\Upsilon]=O_{\prec}(\Psi^{2}) (recall the definition of the average [][\cdot] from (4.10)).

The claim easily follows from Schur’s complement formula (5.6) written in the form

We may therefore estimate [Υ][\Upsilon] using the trivial bound AiΨ2\lvert A_{i}\rvert\prec\Psi^{2} as well as the fluctuation averaging bound from the first estimate of (4.11) with tik=1/Nt_{ik}=1/N. ∎

Suppose that ΛΨ\Lambda\prec\Psi for some deterministic control parameter Ψ\Psi satisfying (5.25). We invoke Lemma 5.2 with ϕ=1\phi=1 (recall the bound (4.5)) to get

Next, we estimate Λd\Lambda_{d}. Define the zz-dependent indicator function

By (5.25), (4.5), and the assumption ΛΨ\Lambda\prec\Psi, we have 1ψ01-\psi\prec 0. On the event {ψ=1}\{\psi=1\}, we expand the right-hand side of (5.9) to get the bound

Using the fluctuation averaging estimate (4.12) as well as (5.27), we find

where we again used the lower bound from (4.5). Using 1ψ01-\psi\prec 0 we conclude

Using Young’s inequality and the assumption ΨMγ/3Γ1\Psi\leqslant M^{-\gamma/3}\Gamma^{-1} we conclude the proof. ∎

We may therefore iterate (5.26). This yields a bound on Λ\Lambda that is essentially the fixed point of the map ΨF(Ψ)\Psi\mapsto F(\Psi), which is Π\Pi (up to the factor MεM^{\varepsilon}). More precisely, the iteration is started with Ψ0\vbox..=Mγ/3Γ1\Psi_{0}\mathrel{\vbox{\hbox{.}\hbox{.}}}=M^{-\gamma/3}\Gamma^{-1}; the initial hypothesis ΛΨ0\Lambda\prec\Psi_{0} is provided by the rough bound from Proposition 5.3. For k1k\geqslant 1 we set Ψk+1\vbox..=F(Ψk)\Psi_{k+1}\mathrel{\vbox{\hbox{.}\hbox{.}}}=F(\Psi_{k}). Hence from (5.26) we conclude that ΛΨk\Lambda\prec\Psi_{k} for all kk. Choosing k\vbox..=ε1k\mathrel{\vbox{\hbox{.}\hbox{.}}}=\lceil{\varepsilon^{-1}}\rceil yields

Since ε\varepsilon was arbitrary, we have proved that

What remains is to prove (5.4), i.e. to estimate Θ\Theta. We expand (5.9) on {ψ=1}\{\psi=1\} to get

By (5.31) and (5.27) with Ψ=Π\Psi=\Pi, we have Λ+ΥiΠ\Lambda+\lvert\Upsilon_{i}\rvert\prec\Pi. Moreover, by Lemma 5.7 we have [Υ]Π2\lvert[\Upsilon]\rvert\prec\Pi^{2}. Thus we get

Since 1ψ01-\psi\prec 0, we conclude that [v]=m2[v]+O(Π2)[v]=m^{2}[v]+O_{\prec}(\Pi^{2}). Therefore

Proof of Theorem 2.3

In Section 6.2, we control Λ\Lambda in terms of Θ\Theta, which allows us to obtain a self-consistent equation involving only Θ\Theta. In this step we use the Fluctuation Averaging Theorem to obtain a quadratic estimate which, very roughly, states that ΛΘ+Λ2\Lambda\lesssim\Theta+\Lambda^{2} (see (6.20) below for the precise statement). This implies ΛΘ\Lambda\lesssim\Theta in the regime Λ1\Lambda\ll 1.

Finally, in Section 6.3, we solve the quadratic iteration for Θ\Theta. Since the corresponding quadratic equation has a dichotomy and for large η=Imz\eta=\operatorname{Im}z we know that Θ\Theta is small by direct expansion, a continuity argument similar to the proof of Proposition 5.3 will complete the proof.

In this section we prove the following apriori bounds on both control parameters, Λ\Lambda and Θ\Theta.

Before embarking on the proof of Proposition 6.1, we state some preparatory lemmas. First, we derive the key equation for [v]=N1ivi[v]=N^{-1}\sum_{i}v_{i}, the average of viv_{i}.

Define the zz-dependent indicator function

For the whole proof we work on the event {ϕ=1}\{\phi=1\}, i.e. every quantity is multiplied by ϕ\phi. We consistently drop these factors ϕ\phi from our notation in order to avoid cluttered expressions. In particular, ΛCMγ/4\Lambda\leqslant CM^{-\gamma/4} throughout the proof.

We begin by estimating Λo\Lambda_{o} and Λd\Lambda_{d} in terms of Θ\Theta. Recalling (4.5), we find that ϕ\phi satisfies the hypotheses of Lemma 5.2, from which we get

In order to estimate Λd\Lambda_{d}, we expand the self-consistent equation (5.9) (on the event {ϕ=1}\{\phi=1\}) to get

here we used the bound (6.4) on Υi|\Upsilon_{i}|. Next, we subtract the average N1iN^{-1}\sum_{i} from each side to get

Combining with the bound Λor(Λ)\Lambda_{o}\prec r(\Lambda) from (6.4), we therefore get

By definition of ϕ\phi we have Γ~Λ2Mγ/4Λ\widetilde{\Gamma}\Lambda^{2}\leqslant M^{-\gamma/4}\Lambda, so that by Lemma 4.4 (iii) the second term on the right-hand side of (6.7) may be absorbed into the left-hand side:

If (6.9) is proved, clearly (6.3) follows from (6.8). In order to prove (6.9), we use (6.8) and the Cauchy-Schwarz inequality to get

for any ε>0\varepsilon>0. We conclude that

Since ε>0\varepsilon>0 was arbitrary, (6.9) follows.

Next, we estimate Θ\Theta. We expand (5.9) to second order:

In order to take the average and get a closed equation for [v][v], we write, using (6.6),

Plugging this back into (6.10) and taking the average over ii gives

Estimating [Υ][\Upsilon] by maxΥir(Λ)\max\lvert\Upsilon_{i}\rvert\prec r(\Lambda) (recall (6.4)) yields

In order to conclude the proof of (6.2), we observe that, by the estimates ΘΛCMγ/4\Theta\leqslant\Lambda\leqslant CM^{-\gamma/4}, Γ~2r(Λ)1\widetilde{\Gamma}^{2}r(\Lambda)\leqslant 1, and ΛMγ/4Γ~1\Lambda\leqslant M^{-\gamma/4}\widetilde{\Gamma}^{-1}, we have

Let ε(0,γ/12)\varepsilon\in(0,\gamma/12). Define the zz-dependent indicator function

Therefore, on the event {ϕχ=1}\{\phi\chi=1\}, in (6.2) we may absorb the second term on the left-hand side and the second term on the right-hand side into the first term on the left-hand side:

Solving this quadratic relation for [v][v], we get

We now have all of the ingredients to complete the proof of Proposition 6.1.

The proof is a continuity argument similar to the proof of Proposition 5.3. In a first step, we prove that

The claim now follows since we may choose ε(0,γ/12)\varepsilon\in(0,\gamma/12) to be arbitrarily small. This concludes the proof of Proposition 6.1. ∎

2. An improved bound on ΛΛ\Lambda in terms of ΘΘ\Theta

In (6.3) we already estimated Λ\Lambda in terms of Θ\Theta; the goal of this section is to improve this bound by removing the factor Γ~\widetilde{\Gamma} from that estimate. We do this using the Fluctuation Averaging Theorem, but we stress that the removal of a factor Γ~\widetilde{\Gamma} is not the main rationale for using the fluctuation averaging mechanism. Its fundamental use will take place in Lemma 6.6 below. A technical consequence of invoking fluctuation averaging is that we have to use deterministic control parameters instead of random ones. Thus, we introduce a deterministic control parameter Φ\Phi that captures the size of the random control parameter Θ\Theta through the relation ΘΦ\Theta\prec\Phi. Throughout the following we shall make use of the control parameter

which differs from q(Φ)q(\Phi) only by a factor Γ~\widetilde{\Gamma} in the second term.

We remark that, by Proposition 6.1, the choice Ψ=Mγ/4Γ~1\Psi=M^{-\gamma/4}\widetilde{\Gamma}^{-1} and Φ=(Mη)1/3Mγ/4Γ~1\Phi=(M\eta)^{-1/3}\leqslant M^{-\gamma/4}\widetilde{\Gamma}^{-1} satisfies the assumptions of Lemma 6.5.

Choosing ϕ=1\phi=1 in Lemma 5.2 and recalling (4.5), we get

In order to estimate Λd\Lambda_{d}, as in (5.32), we expand (5.9) to get

As in the proof of (5.32) and (6.5), the expansion of (5.9) is only possible on the event {ΛMδ}\{\Lambda\leqslant M^{-\delta}\} for some δ>0\delta>0. By ΛΨ\Lambda\prec\Psi and (6.15), the indicator function of this event is 1+O(0)1+O_{\prec}(0); the contribution O(0)O_{\prec}(0) of the complementary event can be absorbed in the error term O(Ψ2)O_{\prec}(\Psi^{2}).

Subtracting the average N1iN^{-1}\sum_{i} from both sides of (6.18) and estimating m2m^{2} by a constant (see (4.2)) yields

where in the last step we used the fluctuation averaging estimate (4.14) and Υir(Ψ)\lvert\Upsilon_{i}\rvert\prec r(\Psi) from (6.17). Together with [v]=ΘΦ\lvert[v]\rvert=\Theta\prec\Phi , this gives the estimate ΛdΓΨ2+Φ+r(Ψ)\Lambda_{d}\prec\Gamma\Psi^{2}+\Phi+r(\Psi). Combining it with the bound (6.17), we conclude that

Now fix ε(0,γ/4)\varepsilon\in(0,\gamma/4). Using the assumption Γ~ΨCMγ/4Mε\widetilde{\Gamma}\Psi\leqslant CM^{-\gamma/4}\leqslant M^{-\varepsilon}, we conclude: if Ψ\Psi and Φ\Phi satisfy (6.15) then

This implies the claimed bound (6.16) on Λ\Lambda. Calling the right-hand side of (6.22) Ψ\Psi, we find

Hence the claimed bound (6.16) on Λo\Lambda_{o} and ZiZ_{i} follows from (6.17). ∎

3. Iteration for ΘΘ\Theta and conclusion of the proof of Theorem 2.3

Next, we prove the following version of (5.9), which is the key tool for estimating Θ\Theta.

Notice that this bound is stronger than the previous formula (6.2), as the power of p(Φ)p(\Phi) is two instead of one. The improvement is due to using fluctuation averaging in [Υ][\Upsilon]. Otherwise the proof is very similar to that of (6.2).

since ΘΛMγ/4Γ~1\Theta\leqslant\Lambda\prec M^{-\gamma/4}\widetilde{\Gamma}^{-1}. From Lemma 6.5 we get Λo+Zip(Φ)\Lambda_{o}+\lvert Z_{i}\rvert\prec p(\Phi) and ΛΨ\Lambda\prec\Psi, where

Now we expand the right-hand side of (5.9) exactly as in (6.10) to get

Using Theorem 4.7 and the bound Λop(Φ)\Lambda_{o}\prec p(\Phi) from Lemma 6.5, we may prove, exactly as in Lemma 5.7, that [Υ]p(Φ)2\lvert[\Upsilon]\rvert\prec p(\Phi)^{2}. Taking the average over ii in (6.27) therefore yields

Using the estimates (6.19) and (6.23), we write the quadratic term on the left-hand side as

where we also used Γ~Ψ2Mγ/4\widetilde{\Gamma}\Psi\leqslant 2M^{-\gamma/4}, as observed after (6.26). From (6.28) we therefore get

The bound on Θ\Theta will follow by iterating the following estimate.

If ΦM3ε(Mη)1\Phi\geqslant M^{3\varepsilon}(M\eta)^{-1} then

If E2\lvert E\rvert\geqslant 2, M3εM(κ+η)ΦMεκ+η\frac{M^{3\varepsilon}}{M(\kappa+\eta)}\leqslant\Phi\leqslant M^{\varepsilon}\sqrt{\kappa+\eta}, and Mηκ+ηM2εM\eta\sqrt{\kappa+\eta}\geqslant M^{2\varepsilon}, then

such constant exists by (4.2) and (4.3). Define the indicator function

Hence on the event {ψ=1}\{\psi=1\} we may absorb the quadratic term on the left-hand side of (6.24) into the linear term, to get the bound

which replaces (6.32) and (6.33). The rest of the argument is unchanged. ∎

Choosing k=ε1k=\lceil{\varepsilon^{-1}}\rceil yields ΘM3ε(Mη)1\Theta\prec M^{3\varepsilon}(M\eta)^{-1}. Since ε\varepsilon can be made as small as desired, we therefore obtain Θ(Mη)1\Theta\prec(M\eta)^{-1}. This is (2.19).

In the regime E2\lvert E\rvert\geqslant 2, the same argument with the better iteration bound (6.30) yields (2.20). The iteration can be started with Φ0=M3ε(Mη)1\Phi_{0}=M^{3\varepsilon}(M\eta)^{-1} from (2.19).

Finally, the bound ΛΠ\Lambda\prec\Pi in (2.18) follows from (2.19) and Lemma 6.5. This concludes the proof of Theorem 2.3.

Density of states and eigenvalue locations

In this section we apply the local semicircle law to obtain information on the density of states and on the location of eigenvalues. The techniques used here have been developed in a series of papers ESY2 ; ESYY ; EYYrigi ; EKYY1 .

The first result is to translate the local semicircle law, Theorem 2.3, into a statement on the counting function of the eigenvalues. Let λ1λ2λN\lambda_{1}\leqslant\lambda_{2}\leqslant\cdots\leqslant\lambda_{N} denote the ordered eigenvalues of HH, and recall the semicircle density ϱ\varrho defined in (2.7). We define the distribution functions

For given E1<E2E_{1}<E_{2} in $$ we abbreviate

Then, for 10E1<E210-10\leqslant E_{1}<E_{2}\leqslant 10, we have

The accuracy of the estimate (7.4) depends on Γ~\widetilde{\Gamma} (see (A.3) for explicit bounds on Γ~\widetilde{\Gamma}), since Γ~\widetilde{\Gamma} determines η~E\widetilde{\eta}_{E}, the smallest scale on which the local semicircle law (Theorem 2.3) holds around the energy EE. In the regime away from the spectral edges E=±2E=\pm 2 and away from E=0E=0, the parameter Γ~\widetilde{\Gamma} is essentially bounded (see the example (i) from Section 3); in this case η~EM1\widetilde{\eta}_{E}\asymp M^{-1} (up to an irrelevant logarithmic factor). For EE near , the parameter Γ~\widetilde{\Gamma} blows up as E2E^{-2}, so that η~EE12M1\widetilde{\eta}_{E}\sim E^{-12}M^{-1}; however, if SS has a positive gap δ\delta_{-} at the bottom of its spectrum, Γ~\widetilde{\Gamma} remains bounded in the vicinity of E=0E=0 (see (A.3)). See Definition A.1 in Appendix A for the definition of the spectral gaps δ±\delta_{\pm}.

where AA is an L×(NL)L\times(N-L) rectangular matrix with independent centred entries. The eigenvalues of HH are the square roots (with both signs) of the eigenvalues of the random covariance matrices AAAA^{*} and AAA^{*}A, whose spectral density is asymptotically given by the Marchenko-Pastur law MP . The instability near E=0E=0 arises from the fact that HH has a macroscopically large kernel unless L/N1/2L/N\to 1/2. In the latter case the support of the Marchenko-Pastur law extends to zero and in fact the density diverges as E1/2E^{-1/2}. We remark that a local version of the Marchenko-Pastur law was given in ESYY for the case when the limit of L/NL/N differs from 0, 1/2 and \infty; the “hard edge” case, L/N1/2L/N\to 1/2, in which the density near the lower spectral edge is singular, was treated in CMS .

This example shows that the vanishing of δ\delta_{-} may lead to a very different behaviour of the spectral statistics. Although our technique is also applicable to random covariance matrices, for simplicity in this section we assume that δc\delta_{-}\geqslant c for some positive constant cc. By Proposition A.3, this holds for random band matrices, for full Wigner matrices (see Definition 3.1), and for their combinations; these examples are our main interest in this paper.

Under the condition δc\delta_{-}\geqslant c, the upper bound of (A.3) yields

where θ\theta was defined in (3.2) and δ+\delta_{+} is the upper gap of the spectrum of SS given in Definition A.1. Notice that θ\theta vanishes near the spectral edge E=±2E=\pm 2 as η0\eta\to 0. For the purpose of estimating Γ~\widetilde{\Gamma}, this deterioration is mitigated if the upper gap δ+\delta_{+} is non-vanishing. While full Wigner matrices satisfy δ+c\delta_{+}\geqslant c, the lower bound on δ+\delta_{+} for band matrices is weaker; see Proposition A.3 for a precise statement.

We first give an estimate on η~x\widetilde{\eta}_{x} using the explicit bound (7.5). While not fully optimal, this estimate is sufficient for our purposes and in particular reproduces the correct behaviour when δ+c\delta_{+}\geqslant c.

Suppose that δc\delta_{-}\geqslant c (so that (7.5) holds). Then we have for any x2\lvert x\rvert\leqslant 2

In the regime 2x102\leqslant\lvert x\rvert\leqslant 10 we have the improved bound

For any x2\lvert x\rvert\leqslant 2 define ηx\eta_{x}^{\prime} as the solution of the equation

This solution is unique since the left-hand side is decreasing in η\eta. An elementary but tedious analysis of (7.8) yields

(The calculation is based on the observation that if η(a+ηα)b\eta(a+\eta^{\alpha})\leqslant b for some a,b>0a,b>0 and α0\alpha\geqslant 0, then η2b(bα1+α+a)1\eta\leqslant 2b(b^{\frac{\alpha}{1+\alpha}}+a)^{-1}.) From (7.5), Imm(x+iη)Cκx+η\operatorname{Im}m(x+i\eta)\leqslant C\sqrt{\kappa_{x}+\eta} (see (4.4)) and the simple bound θ(x+iη)c(κx+η2/3)\theta(x+i\eta)\geqslant c(\kappa_{x}+\eta^{2/3}), we get for ηηx\eta\geqslant\eta^{\prime}_{x}

The proof of (7.7) is similar, but we use θ=κ+η\theta=\sqrt{\kappa+\eta} and the stronger bound Immη/κ+η\operatorname{Im}m\leqslant\eta/\sqrt{\kappa+\eta} available in the regime x2\lvert x\rvert\geqslant 2. For 2x102\leqslant\lvert x\rvert\leqslant 10, define ηx\eta_{x}^{\prime} to be the solution of the equation

As for (7.9), a tedious calculation yields

Next, we obtain an estimate on the extreme eigenvalues.

Suppose that δc\delta_{-}\geqslant c (so that (7.5) holds) and that N3/4MNN^{3/4}\leqslant M\leqslant N. Then we have

where we introduced the control parameter

In particular, if δ+c\delta_{+}\geqslant c then

Note that (7.13) yields the optimal error bound O(N2/3)O_{\prec}(N^{-2/3}) in the case of a full and flat Wigner matrix (see Definition 3.1). Under stronger assumptions on the law of the entries of HH, Theorem 7.3 can be improved as follows.

Suppose that the matrix elements hijh_{ij} have a uniform subexponential decay, i.e. that there exist positive constants CC and ϑ\vartheta such that

If in addition the law of each matrix entry is symmetric (i.e. hijh_{ij} and hij-h_{ij} have the same law), then (7.11) holds with

We remark that (7.15) can obtained via a relatively standard moment method argument combined with refined combinatorics. Obtaining the bound (7.16) is fairly involved; it makes use of the Chebyshev polynomial representation first used by Feldheim and Sodin FSo ; So1 in this context for a special distribution of hijh_{ij}, and extended in EK2 to general symmetric entries.

We shall prove a lower bound on the smallest eigenvalue λ1\lambda_{1} of HH; the largest eigenvalue λN\lambda_{N} may be estimated similarly from above. Fix a small γ>0\gamma>0 and set

We distinguish two regimes depending on the location of λ1\lambda_{1}, i.e. we decompose

In the first regime we further decompose the probability space by estimating

Clearly, ηkκk\eta_{k}\leqslant\kappa_{k} since MNM\leqslant N. On the support of ϕ1,k\phi_{1,k} we have λ1EkC/Nηk\lvert\lambda_{1}-E_{k}\rvert\leqslant C/N\leqslant\eta_{k}, so that we get the lower bound

for some positive constant cc. On the other hand, by (4.4), we have

for some positive constant cc^{\prime}. Here in the second step we used that ηk/κkMγ(Nηk)1\eta_{k}/\sqrt{\kappa_{k}}\leqslant M^{-\gamma}(N\eta_{k})^{-1}.

Comparing this bound with (7.18) we conclude that ϕ1,k0\phi_{1,k}\prec 0 (i.e. the event {ϕ1,k=1}\{\phi_{1,k}=1\} has very small probability). Summing over kk yields ϕ10\phi_{1}\prec 0. Note that in this proof the stronger bound (2.20) outside of the spectrum was essential; the general bound of order (Mηk)1(M\eta_{k})^{-1} from (2.19) is not smaller than the right-hand side of (7.18).

The preceding proof of ϕ10\phi_{1}\prec 0 assumed the existence of a spectral gap δ+c\delta_{+}\geqslant c. The above argument easily carries over to the case without a gap of constant size, in which case we choose

All that remains to complete the proof of (7.11) and (7.13) is the estimate ϕ20\phi_{2}\prec 0. Clearly

In part (2) of Lemma 7.2 in EYY it was shown, using the moment method, that the right-hand side is bounded by CNcloglogNCN^{-c\log\log N} provided the matrix entries hijh_{ij} have subexponential decay, i.e.

for some constants α,β\alpha,\beta (recall the notation (2.5)). In this paper we only assume polynomial decay, (2.6). However, the subexponential decay assumption of EYY was only used in the first truncation step, Equations (7.28)–(7.29) in EYY , where a new set of independent random variables h^ij\widehat{h}_{ij} was constructed with the properties that

for n=(logN)(loglogN)n=(\log N)(\log\log N). Under the condition (2.6) the same truncation can be performed, but the estimates in (7.20) will be somewhat weaker; instead of the exponent n=(logN)(loglogN)n=(\log N)(\log\log N) we get n=DlogNn=D\log N for any fixed D>0D>0. The conclusion of the same proof is that, assuming only (2.6), we have

for any positive number DD and for any NN0(D)N\geqslant N_{0}(D). This guarantees that ϕ2H0\phi_{2}\lVert H\rVert\prec 0. Together with the estimate ϕ1H3ϕ10\phi_{1}\lVert H\rVert\leqslant 3\phi_{1}\prec 0 established above, this completes the proof of Theorem 7.3. ∎

The estimate of H\lVert H\rVert with X=M1/6X=M^{-1/6} follows from the proof of part (2) of Lemma 7.2 in EYY , by choosing k=M1/6εk=M^{-1/6-\varepsilon} with any small ε>0\varepsilon>0 in (7.32) of EYY . This argument can be improved to X=M1/4X=M^{-1/4} by the remark after (7.18) in EYY . Finally, the bound with X=M2/3X=M^{-2/3} under the symmetry condition on the entries of HH is proved in Theorem 3.4 of EK2 . ∎

Next, we establish an estimate on the normalized counting function nN{\mathfrak{n}}_{N} defined in (7.1). As above, the exponents are not expected to be optimal, but the estimate is in general sharp if δ+c\delta_{+}\geqslant c.

Suppose that δc\delta_{-}\geqslant c (so that (7.5) holds). Then

where we introduced the control parameter

First we prove the bound (7.22) for any fixed EE\in. Define the dyadic energies Ek\vbox..=22k(δ++M1/5)E_{k}\mathrel{\vbox{\hbox{.}\hbox{.}}}=-2-2^{k}(\delta_{+}+M^{-1/5}). By (7.6) we have for all k0k\geqslant 0

A similar bound holds for Ek\vbox..=2+2k(δ++M1/5)E_{k}^{\prime}\mathrel{\vbox{\hbox{.}\hbox{.}}}=-2+2^{k}(\delta_{+}+M^{-1/5}). For any EE\in, we express nN(E)n(E){\mathfrak{n}}_{N}(E)-n(E) as a telescopic sum and use (7.4) to get

Here we used that n(10)=0n(-10)=0 and nN(10)nN(3)0{\mathfrak{n}}_{N}(-10)\leqslant{\mathfrak{n}}_{N}(-3)\prec 0 by (7.21). In fact, (7) easily extends to any E<10E<-10. By an analogous dyadic analysis near the upper spectral edge, we also get (7.21) for any E0E\geqslant 0. Since this holds for any γ>0\gamma>0, we thus proved

To prove the statement uniformly in EE, we define the classical location of the α\alpha-th eigenvalue γα\gamma_{\alpha} through

Applying (7.25) for the NN energies E=γ1,,γNE=\gamma_{1},\dots,\gamma_{N}, we get

uniformly in α=1,,N\alpha=1,\dots,N. Since nN(E){\mathfrak{n}}_{N}(E) and n(E)n(E) are nondecreasing and Y1/NY\geqslant 1/N, we find

uniformly in α=2,3,\alpha=2,3,\dots. Below γ1\gamma_{1} we use (7.27) to get

Finally, for any EγNE\geqslant\gamma_{N}, we have nN(E)n(E)=nN(E)10{\mathfrak{n}}_{N}(E)-n(E)={\mathfrak{n}}_{N}(E)-1\leqslant 0 deterministically. Thus we have proved

Next, we derive rigidity bounds on the locations of the eigenvalues. Recall the definition of γα\gamma_{\alpha} from (7.26).

Suppose that δc\delta_{-}\geqslant c (so that (7.5) holds) and that (7.11) and (7.22) hold with some positive control parameters X,YCX,Y\leqslant C. Define α^\vbox..=min{α,N+1α}\widehat{\alpha}\mathrel{\vbox{\hbox{.}\hbox{.}}}=\min\{{\alpha,N+1-\alpha}\} and let ε>0\varepsilon>0 be arbitrary. Then

To simplify notation, we assume that αN/2\alpha\leqslant N/2 so that α^=α\widehat{\alpha}=\alpha; the other eigenvalues are handled analogously. Without loss of generality we assume that λN/21\lambda_{N/2}\leqslant 1. Indeed, the condition λN/21\lambda_{N/2}\leqslant 1 is equivalent to n(1)1/2{\mathfrak{n}}(1)\geqslant 1/2, which holds with very high probability by Theorem 7.5 and the fact that nsc(1)>1/2n_{sc}(1)>1/2.

where in the last step we used Theorem 7.5. By definition of n(x)n(x) we have for 2x1-2\leqslant x\leqslant 1 that

Suppose first that αα0\vbox..=MεNY\alpha\geqslant\alpha_{0}\mathrel{\vbox{\hbox{.}\hbox{.}}}=M^{\varepsilon}NY. Then n(γα)MεYn(\gamma_{\alpha})\geqslant M^{\varepsilon}Y, so that the relation (7.30) implies

which yields n(γα)n(λα)n(\gamma_{\alpha})\asymp n(\lambda_{\alpha}). By (7.31), we we therefore get that n(γα)n(λα)n^{\prime}(\gamma_{\alpha})\asymp n^{\prime}(\lambda_{\alpha}) as well. Since nn^{\prime} is nondecreasing, we get n(x)n(γα)n(λα)n^{\prime}(x)\asymp n^{\prime}(\gamma_{\alpha})\asymp n^{\prime}(\lambda_{\alpha}) for any xx between γα\gamma_{\alpha} and λα\lambda_{\alpha}. Therefore, by the mean value theorem, we have

where in the last step we used (7.30) and (7.32). This proves (7.28) for αMεNY\alpha\geqslant M^{\varepsilon}NY.

For the remaining indices, α<α0\alpha<\alpha_{0}, we get from (7.30) the upper bound

where in the second step we used (7.28) and in the last step (7.32). In order to obtain a lower bound, we use Theorem 7.3 to get

Similar bounds hold for γα\gamma_{\alpha} as well:

Finally, we state a trivial corollary of Theorem 7.6.

Suppose that δc\delta_{-}\geqslant c and that (7.11) and (7.22) hold with some positive control parameters X,YCX,Y\leqslant C. Then

In this section we prove Lemma 7.1. Define the empirical eigenvalue distribution

Following ERSY , we use the Helffer-Sjöstrand functional calculus Davies ; HS . Introduce \mathcal{E}\;\mathrel{\vbox{\hbox{.}\hbox{.}}}=\;\max\bigl{\{}{E_{2}-E_{1},\widetilde{\eta}}\bigr{\}}\,. Let χ\chi be a smooth cutoff function equal to 11 on [E,E][-\mathcal{E},\mathcal{E}] and vanishing on [2E,2E]c[-2\mathcal{E},2\mathcal{E}]^{c}, such that χ(y)CE1\lvert\chi^{\prime}(y)\rvert\leqslant C\mathcal{E}^{-1}. Let ff be a characteristic function of the interval [E1,E2][E_{1},E_{2}] smoothed on the scale η~\widetilde{\eta}: f(x)=1f(x)=1 on [E1+η~,E2η~][E_{1}+\widetilde{\eta},E_{2}-\widetilde{\eta}], f(x)=0f(x)=0 on [E1,E2]c[E_{1},E_{2}]^{c}, f(x)Cη~1\lvert f^{\prime}(x)\rvert\leqslant C\widetilde{\eta}^{-1}, and f(x)Cη~2\lvert f^{\prime\prime}(x)\rvert\leqslant C\widetilde{\eta}^{-2}. Note that the supports of ff^{\prime} and ff^{\prime\prime} have measure O(η~)O(\widetilde{\eta}).

Then we have the estimate (see Equation (B.13) in ERSY )

Since χ\chi^{\prime} vanishes away from [E,2E][\mathcal{E},2\mathcal{E}] and ff vanishes away from [E1,E2][E_{1},E_{2}], we may apply (7.2) to get

uniformly for x[E1,E2]x\in[E_{1},E_{2}] and yη~y\geqslant\widetilde{\eta}. Thus the first term on the right-hand side of (7.33) is bounded by

for yη~y\leqslant\widetilde{\eta} and x[E1,E2]x\in[E_{1},E_{2}]. Using mΔ=mNmm^{\Delta}=m_{N}-m and recalling (7.36), we therefore get

for yη~y\leqslant\widetilde{\eta} and x[E1,E2]x\in[E_{1},E_{2}]. The second term of (7.33) is therefore bounded by

In order to estimate the third term on the right-hand side of (7.33), we integrate by parts, first in xx and then in yy, to obtain the bound

The second term of (7.39) is similar to the first term on the right-hand side of (7.33), and is easily seen to be bounded by 1/M1/M as in (7.35).

In order to bound the first and third terms of (7.39), we estimate, for any yη~y\leqslant\widetilde{\eta},

where we also used that η~M1\widetilde{\eta}\geqslant M^{-1}. Using (7.41) for y=η~y=\widetilde{\eta}, we may now estimate the first term of (7.39) by η~\widetilde{\eta}.

What remains is the third term of (7.39), which can be estimated, using (7.34), by

Similarly, since ϱ\varrho has a bounded density, we find

Together with (7.42) and recalling η~M1\widetilde{\eta}\geqslant M^{-1}, we therefore get (7.4). This concludes the proof of Lemma 7.1.

Bulk universality

Universality is understood in the sense of the weak limit, as NN\to\infty for fixed E<2|E|<2, of f~N(n)(E,x)\widetilde{f}_{N}^{(n)}(E,{\bf x}) in the variables x{\bf x}.

The general approach developed in ESY4 ; ESYY ; EYY to prove the universality of the local eigenvalue statistics in the bulk spectrum of a general Wigner-type matrix consists of three steps.

A rigidity estimate on the locations of the eigenvalues, in the sense of a quadratic mean.

The spectral universality for matrices with a small Gaussian component, via local ergodicity of the Dyson Brownian motion (DBM).

A perturbation argument that removes the small Gaussian component by comparing Green functions.

In this paper we do not give the details of steps (ii) and (iii), since they have been concisely presented elsewhere, e.g. in EYBull . Here we only summarize the results and the key arguments of steps (ii) and (iii) for the general class of matrices we consider. In this section we assume that HH is either real symmetric or complex Hermitian. The former case means that the entries of HH are real. The latter means, loosely, that its off-diagonal entries have a nontrivial imaginary part. More precisely, in the complex Hermitian case we shall replace the lower bound on the variances sijs_{ij} from Definition 3.1 with the following, stronger, condition.

We call the Hermitian matrix HH a complex aa-full Wigner matrix if for each i,ji,j the 2×22\times 2 covariance matrix

as a symmetric matrix. Note that this condition implies that HH is aa-full, but the converse is not true.

We consider a stochastic flow of Wigner-type matrices generated by the Ornstein-Uhlenbeck equation

with some given initial matrix H0H_{0}. Here BB is an N×NN\times N matrix-valued standard Brownian motion with the same symmetry type as HH. The resulting dynamics on the level of the eigenvalues is Dyson Brownian motion (DBM). It is well known that HtH_{t} has the same distribution as the matrix

where UU is an independent standard Gaussian Wigner matrix of the same symmetry class as HH. In particular, HtH_{t} converges to UU as tt\to\infty. The eigenvalue distribution converges to the Gaussian equilibrium measure, whose density is explicitly given by

here β=1\beta=1 for the real symmetric case (GOE) and β=2\beta=2 for the complex Hermitian case (GUE).

The matrix S(t)S^{(t)} of variances of HtH_{t} is given by

where S(0)S^{(0)} is the matrix of variances of H0H_{0}. It is easy to see that the gaps δ±(t)\delta_{\pm}(t) of S(t)S^{(t)} satisfy δ±(t)δ±(0)\delta_{\pm}(t)\geqslant\delta_{\pm}(0); therefore the corresponding parameters (2.11) satisfy Γ~t(z)Γ~0(z)\widetilde{\Gamma}_{t}(z)\leqslant\widetilde{\Gamma}_{0}(z). Since all estimates behind our main theorems in Sections 2 and 7 improve if δ±\delta_{\pm} increase, it is immediate that all results in these sections hold for HtH_{t} provided they hold for H0H_{0}.

The key quantity to be controlled when establishing bulk universality is the mean quadratic distance of the eigenvalues from their classical locations,

for any ε>0\varepsilon>0 and NN0(ε)N\geqslant N_{0}(\varepsilon). Here we used that the estimate from Corollary 7.7 is uniform in tt, by the remark in the previous paragraph.

We modify the original DBM by adding a local relaxation term of the form 12τi(λiγi)2\frac{1}{2\tau}\sum_{i}(\lambda_{i}-\gamma_{i})^{2} to the original Hamiltonian H{\mathcal{H}}, which has the effect of artificially speeding up the relaxation of the dynamics. Here τ1\tau\ll 1 is a small parameter, the relaxation time of the modified dynamics. We choose τ\vbox..=N1+4εQ\tau\mathrel{\vbox{\hbox{.}\hbox{.}}}=N^{1+4\varepsilon}Q for some ε>0\varepsilon>0. As Theorem 4.1 of ESYY (see also Theorem 2.2 of EYBull ) shows, the local statistics of the eigenvalue gaps of HtH_{t} and GUE/GOE coincide if tNετ=N1+4εQt\geqslant N^{\varepsilon}\tau=N^{1+4\varepsilon}Q, i.e. if

The local statistics are averaged over N1εN^{1-\varepsilon} consecutive eigenvalues or, alternatively, in the energy parameter EE over an interval of length NεN^{-\varepsilon}.

To complete the programme (i)–(iii), we need to compare the local statistics of the original ensemble HH and HtH_{t}, i.e. perform step (iii). We first recall the Green function comparison theorem from EYY for the case MNM\asymp N (generalized Wigner). The result states, roughly, that expectations of Green functions with spectral parameter zz satisfying ImzN1ε\operatorname{Im}z\geqslant N^{-1-\varepsilon} are determined by the first four moments of the single-entry distributions. Therefore the local eigenvalue statistics on a very small scale, η=N1ε\eta=N^{-1-\varepsilon}, of two Wigner ensembles are indistinguishable if the first four moments of their matrix entries match. More precisely, for the local nn-point correlation functions (8.1) to match, one needs to compare expectations of nn-th order monomials of the form

where the energies EkE_{k} are chosen in the bulk spectrum with EkEk=O(1/N)E_{k}-E_{k^{\prime}}=O(1/N). (Recall that mN(z)=1NTrG(z)m_{N}(z)=\frac{1}{N}\operatorname{Tr}G(z).)

The Green function comparison method therefore has two main ingredients. First, a high probability apriori estimate is needed on the resolvent entries at any spectral parameter zz with imaginary part η\eta slightly below 1/N1/N:

for any small ε>0\varepsilon>0. Clearly, the same estimate also holds for mN(E+iη)m_{N}(E+i\eta). The bound (8.7) is typically obtained from the local semicircle law for the resolvent entries, (2.18). Although the local semicircle law is effective only for Imz1/N\operatorname{Im}z\gg 1/N, it still gives an almost optimal bound for a somewhat smaller η\eta by using the trivial estimate

with the choice of η=N1+ε\eta^{\prime}=N^{-1+\varepsilon}. The proof of (8.8) follows from a simple dyadic decomposition; see the proof of Theorem 2.3 in Section 8 of EYY for details.

The second ingredient is the construction of an initial ensemble H0H_{0} whose time evolution HtH_{t} for some t1t\leqslant 1 satisfying (8.5) is close to HH; here closeness is measured by the matching of moments of the matrix entries between the ensembles HH and HtH_{t}. We shall choose H0H_{0}, with variance matrix S(0)S^{(0)}, so that the second moments of HH and HtH_{t} match,

and the third and fourth moments are close. We remark that the matching of higher moments was introduced in the work of TV , while the idea of approximating a general matrix ensemble by an appropriate Gussian one appeared earlier in EPRSY . They have to be so close that even after multiplication with at most five resolvent entries and summing up for all i,ji,j indices, their difference is still small. (Five resolvent entries appear in the fourth order of the resolvent expansion of GG.) Thus, given (8.7), we require that

to ensure that the expectations of the nn-fold product in (8.6) are close. This formulation holds for the real symmetric case; in the complex Hermitian case all moments of order s=3,4s=3,4 involving the real and imaginary parts of hijh_{ij} have to be approximated. To simplify notation, we work with the real symmetric case in the sequel.

The matching can be done in two steps. In the first we construct a matrix of variances S(0)S^{(0)} such that (8.9) holds. This first step is possible if, given SS associated with HH, (8.9) can be satisfied for a doubly stochastic S(0)S^{(0)}, i.e. if HH is an aa-full Wigner matrix and

with some large constant CC. For the complex Hermitian case, the condition (8.11) is the same but HH has to be complex aa-full Wigner matrix (see Definition 8.1).

In the second step of moment matching, we use Lemma 3.4 of EYY2 to construct an ensemble H0H_{0} with variances S(0)S^{(0)}, such that the entries of HH and HtH_{t} satisfy

Suppose that HH is bb-flat, i.e. that sijb/Ns_{ij}\leqslant b/N. Then this condition holds provided

The argument so far assumed that MNM\asymp N (HH is a generalized Wigner matrix), in which case Gij(E+iη)G_{ij}(E+i\eta^{\prime}) remains essentially bounded down to the scale η1/N\eta^{\prime}\approx 1/N. If MNM\ll N, then (2.18) provides control only down to scale η1/M\eta^{\prime}\gg 1/M and (8.8) gives only the weaker bound

for any η1/M\eta\leqslant 1/M, which replaces (8.7). Using this weaker bound, the condition (8.12) is replaced with

are close, where the smeared out observable OηO_{\eta} on scale η\eta is defined through

To conclude the result for observables with OO instead of OηO_{\eta} in (8.15), we need to estimate, for both ensembles, the difference

Due to the smoothness of OO, we can decompose OOη=Q1+Q2O-O_{\eta}=Q_{1}+Q_{2}, where

with an arbitratry parameter KN/MK\gg N/M. Here the constants depend on OO. The contribution from Q1Q_{1} to (8.16) can thus be estimated by

where we used that the expected number of eigenvalues in the interval [EK/N,E+K/N][E-K/N,E+K/N] is O(K)O_{\prec}(K), since (8.13) guarantees that the density is bounded on scales larger than 1/M1/M. The contribution from Q2Q_{2} to (8.16) is estimated by

In the last step we used (8.13) to estimate

Optimizing the choice of KK and η\eta, (8.14) becomes

Summarizing the conditions (8.5), (8.11), and (8.19), we require that

in order to have bulk universality. We have therefore proved the following result.

Suppose that HH is N/MN/M-flat and aa-full (in the real symmetric case) or complex aa-full (in the complex Hermitian case). Suppose moreover that (7.11) and (7.22) hold with some positive control parameters X,YCX,Y\leqslant C. Fix an arbitrary positive parameter ε>0\varepsilon>0. Then the local nn-point correlation functions of HH, averaged over the energy parameter in an interval of size NεN^{-\varepsilon} around E<2|E|<2 (see (8.2)), coincide with those of GOE or GUE provided that

In particular, if N3/4MNN^{3/4}\leqslant M\leqslant N then (7.11) and (7.22) hold with XX and YY defined in (7.12) and (7.23).

We conclude with a few examples illustrating Theorem 8.2.

Fix an integer n2n\geqslant 2. There exists a positive number p(n)cn3p(n)\geqslant cn^{-3} with the following property. Suppose that HH satisfies any of the following conditions for some sufficiently small ξ>0\xi>0.

cN1ξsijCN1+p(n)ξcN^{-1-\xi}\leqslant s_{ij}\leqslant CN^{-1+p(n)-\xi}.

cN98+ξsijCN1cN^{-\frac{9}{8}+\xi}\leqslant s_{ij}\leqslant CN^{-1}.

HH is a one-dimensional band matrix with band width WW with a mean-field component of size ν\nu (see Definition 3.3) such that WN1p(n)+ξW\geqslant N^{1-p(n)+\xi} and νN15+ξW16\nu\geqslant N^{15+\xi}W^{-16}.

Then there exists an ε>0\varepsilon>0 (depending on ξ\xi and nn) such that the local nn-point correlation functions of HH, averaged over the energy parameter in an interval of size NεN^{-\varepsilon} around E<2|E|<2, coincide with those of GOE or GUE (depending on the symmetry class of HH).

We remark that the conditions for the upper bound on sijs_{ij} in parts (i) and (iii) are similar. But the band structure in (iii) allows one to choose a much smaller mean-field component than in (i).

In Case (i), we have a=cNξa=cN^{-\xi} and b=N/Mb=N/M in Definition 3.1; hence δ±cNξ\delta_{\pm}\geqslant cN^{-\xi} by Proposition A.3. Therefore Y=M1N7ξ/2Y=M^{-1}N^{-7\xi/2} and X=N2M8/3X=N^{2}M^{-8/3} from (7.12) and (7.23), so that (8.20) reads

By Theorem 8.2 bulk universality therefore holds provided that MN1p(n)+ξM\geqslant N^{1-p(n)+\xi} with any sufficiently small positive ξ>0\xi>0 (and ε\varepsilon chosen appropriately, depending on ξ\xi and nn). The function p(n)p(n) can be easily computed.

We remark that if we additionally assume that hijh_{ij} has a symmetric law with subexponential decay (7.14), then by Theorem 7.4 we can use the improved control parameter X=M2/3X=M^{-2/3}. This yields a better threshold p(n)p(n). For example, for n=2n=2 we obtain p(n)=134p(n)=\frac{1}{34}.

In Case (ii) we take M=NM=N, i.e. b=cb=c and δ+a=N1/8+ξ\delta_{+}\geqslant a=N^{-1/8+\xi}. Then with the choice (7.12) and (7.23) we have YCN1δ+7/2Y\leqslant CN^{-1}\delta_{+}^{-7/2}, XCN2/3+CN2(δ++N1/7)12X\leqslant CN^{-2/3}+CN^{-2}(\delta_{+}+N^{-1/7})^{-12}, so that (8.20) reads

which holds since δ+aN1/8\delta_{+}\geqslant a\geqslant N^{-1/8}.

Finally, in Case (iii) we have WMW\asymp M, b=N/Mb=N/M, a=νa=\nu, δ+cν+c(M/N)2\delta_{+}\geqslant c\nu+c(M/N)^{2}, and δc\delta_{-}\geqslant c. Since MN22/23M\geqslant N^{22/23} we have δ+cM1/5\delta_{+}\geqslant cM^{-1/5}, Thus, with the choice (7.12) and (7.23), we have

with some positive p(n)p(n), which concludes the proof. ∎

Appendix A Behaviour of ΓΓ\Gamma and Γ~~Γ\widetilde{\Gamma}

In this section we give basic bounds on the parameters Γ\Gamma and Γ~\widetilde{\Gamma}. As it turns out, their behaviour is intimately linked with the spectrum of SS, more precisely with its spectral gaps. Recall that the spectrum of SS lies in $,with, with1$ being a simple eigenvalue.

In the presence of a gap δ\delta_{-} we may improve the upper bound to

For Γ~\widetilde{\Gamma} we have the bounds

for some universal constant c>0c>0, where in the last step we used Lemma 4.3.

The estimate (A.2) follows similarly. Due to the gap δ\delta_{-} in the spectrum of SS, we may replace the estimate (A.4) with

The lower bound of (A.3) was proved in (4.5). The upper bound is proved similarly to (A.2), except that (A.6) is replaced with

The following proposition gives the behaviour of the spectral gaps δ±\delta_{\pm} for the example matrices from Section 3.

If HH is an aa-full Wigner matrix then δa\delta_{-}\geqslant a and δ+a\delta_{+}\geqslant a.

If HH is a band matrix there is a positive constant cc, depending on the dimension dd and the profile function ff, such that δc\delta_{-}\geqslant c and δ+c(W/L)2\delta_{+}\geqslant c(W/L)^{2}.

If H=1νHB+νHWH=\sqrt{1-\nu}H_{B}+\sqrt{\nu}H_{W}, where HBH_{B} is a band matrix, HWH_{W} is an aa-full Wigner matrix independent of HBH_{B}, and ν\nu\in (see Definition 3.3), then there is a constant cc depending only on the dimension dd and the profile function ff of HBH_{B}, such that δc\delta_{-}\geqslant c and δ+c(W/L)2+νa\delta_{+}\geqslant c(W/L)^{2}+\nu a.

For the case where HH is an aa-full Wigner matrix, the claim easily follows by splitting

By assumption, the first term is (1a)(1-a) times a doubly stochastic matrix. Hence its spectrum lies in [1+a,1a][-1+a,1-a]. The claims on δ±\delta_{\pm} now follow easily.

The claims about band matrices were proved in Lemma A.1 of EYY and Equation (5.16) of EKYY3 , respectively. Finally, (iii) easily follows from (i) and (ii). ∎

Appendix B Proof of Theorems 4.6 and 4.7

Theorems 4.6 and 4.7 are essentially simple special cases of the much more involved, and general, fluctuation averaging estimate from EKYfluc . Nevertheless, here we give the details of the proofs because (a) they do not strictly follow from the formulation of the result in EKYfluc , and (b) their proof is much easier than that of EKYfluc , so that the reader only interested in the applications of fluctuation averaging to the local semicircle law need not read the lengthy proof of EKYfluc . We start with a simple lemma which summarizes the key properties of \prec when combined with expectation.

uniformly in ii. If X=X(u)X=X(u) and Ψ=Ψ(u)\Psi=\Psi(u) depend on some parameter uu and the above assumptions are uniform in uu, then so are the conclusions.

It is enough to consider the case n=1n=1; the case of larger nn follows immediately from the case n=1n=1, using the basic properties of \prec from Lemma 4.4.

For the first claim, pick ε>0\varepsilon>0. Then

for arbitrary D>0D>0. The first claim therefore follows by choosing DD large enough.

The second claim follows from Chebyshev’s inequality, using a high-moment estimate combined with Jensen’s inequality for partial expectation. We omit the details, which are similar to those of the first claim. ∎

We shall apply Lemma B.1 to resolvent entries of GG. In order to verify its assumptions, we record the following bounds.

for any ε>0\varepsilon>0 and NN0(n,ε)N\geqslant N_{0}(n,\varepsilon).

where in the last step we used that \bigl{\lvert}G_{xy}^{(k)}\bigr{\rvert}\prec\Psi_{o} as follows from (B.3), and the bound 1/Gkk11/\lvert G_{kk}\rvert\prec 1 (recall that ΛΨMc\Lambda\prec\Psi\leqslant M^{-c}). This concludes the proof of (B.5).

Abbreviate Xk\vbox..=Qk(Gkk)1X_{k}\mathrel{\vbox{\hbox{.}\hbox{.}}}=Q_{k}(G_{kk})^{-1}. We shall estimate ktikXk\sum_{k}t_{ik}X_{k} in probability by estimating its pp-th moment by Ψo2p\Psi_{o}^{2p}, from which the claim will easily follow using Chebyshev’s inequality. Before embarking on the estimate for arbitrary pp, we illustrate its idea by estimating the variance

Using Lemma B.1 and the bounds (4.9) on tikt_{ik}, we find that the first term on the right-hand side of (B.6) is O(M1Ψo2)=O(Ψo4)O_{\prec}(M^{-1}\Psi_{o}^{2})=O_{\prec}(\Psi_{o}^{4}), where we used the estimate (4.8). Let us therefore focus on the second term of (B.6). Using the fact that klk\neq l, we apply (4.6) to XkX_{k} and XlX_{l} to get

After this pedagogical interlude we move on to the full proof. Fix some even integer pp and write

The idea behind this splitting is to use (4.6) on one entry of AA; the first term on the right-hand side of (4.6) gives rise to w0(A)w_{0}(A) and the second to w1(A)w_{1}(A). The precise definition of the algorithm applied to AAA\in\mathcal{A} is as follows.

If all factors of AA are maximally expanded or d(A)p+1d(A)\geqslant p+1 then stop the expansion of AA. In other words, the algorithm cannot be applied to AA in the future.

and so on, at each iteration performing the steps (1) and (2) on each new monomial independently of the others. Note that the lower indices are binary sequences that describe the recursive application of the operations w0w_{0} and w1w_{1}. In this manner we generate a binary tree whose vertices are given by finite binary strings σ\sigma. The associated monomials satisfy Aσir\vbox..=wi(Aσr)A_{\sigma i}^{r}\mathrel{\vbox{\hbox{.}\hbox{.}}}=w_{i}(A_{\sigma}^{r}) for i=0,1i=0,1, where σi\sigma i denotes the binary string obtained by appending ii to the right end of σ\sigma. See Figure B.1 for an illustration of the tree.

We stop the recursion of a tree vertex whenever the associated monomial satisfies the stopping rule of step (1). In other words, the set of leaves of the tree is the set of binary strings σ\sigma such that either all factors of AσrA^{r}_{\sigma} are maximally expanded or d(Aσr)p+1d(A^{r}_{\sigma})\geqslant p+1. We claim that the resulting binary tree is finite, i.e. that the algorithm always reaches step (1) after a finite number of iterations. Indeed, by the stopping rule in (1), we have d(Aσr)p+1d(A^{r}_{\sigma})\leqslant p+1 for any vertex σ\sigma of the tree. Since each application of w1w_{1} increases d()d(\cdot) by at least one, and in the first step (i.e. when applied to ArA^{r}) by two, we conclude that the number of ones in any σ\sigma is at most pp. Since each application of w1w_{1} increases the number of resolvent entries by at most four, and the application of w0w_{0} does not change this number, we find that the number of resolvent entries in AσrA^{r}_{\sigma} is bounded by 4p+14p+1. Hence the maximal number of upper indices in AσrA^{r}_{\sigma} for any tree vertex σ\sigma is (4p+1)p(4p+1)p. Since each application of w0w_{0} increases the total number of upper indices by one, we find that σ\sigma contains at most (4p+1)p(4p+1)p zeros. We conclude that the maximal length of the string σ\sigma (i.e. the depth of the tree) is at most (4p+1)p+p=4p2+2p(4p+1)p+p=4p^{2}+2p. A string σ\sigma encoding a tree vertex contains at most pp ones. Denoting by kk the number of ones in a string encoding a leaf of the tree, we find that the number of leaves is bounded by k=0p(4p2+2pk)(Cp2)p\sum_{k=0}^{p}\binom{4p^{2}+2p}{k}\leqslant(Cp^{2})^{p}. Therefore, denoting by Lr\mathcal{L}_{r} the set of leaves of the binary tree generated from ArA^{r}, we have Lr(Cp2)p\lvert\mathcal{L}_{r}\rvert\leqslant(Cp^{2})^{p}.

By definition of the tree and w0w_{0} and w1w_{1}, we have the decomposition

Moreover, each monomial AσrA_{\sigma}^{r} for σLr\sigma\in\mathcal{L}_{r} either consists entirely of maximally expanded resolvent entries or satisfies d(Aσr)=p+1d(A_{\sigma}^{r})=p+1. (This is an immediate consequence of the stopping rule in (1)).

Next, we observe that for any string σ\sigma we have

where b(σ)b(\sigma) is the number ones in the string σ\sigma. Indeed, if b(σ)=0b(\sigma)=0 then this follows from (B.5); if b(σ)1b(\sigma)\geqslant 1 this follows from the last statement in (B.9) and (B.3).

Using (B.8) and (B.12) we have the representation

We now claim that any nonzero term on the right-hand side of (B.14) satisfies

Before embarking on the proof, we explain its idea. By (B.13), the naive size of the left-hand side of (B.15) is Ψop\Psi_{o}^{p}. The key observation is that each lone label sLs\in L yields one extra factor Ψo\Psi_{o} to the estimate. This is because the expectation in (B.14) would vanish if all other factors \bigl{(}{Q_{k_{r}}A_{\sigma_{r}}^{r}}\bigr{)}, rsr\neq s, were independent of ksk_{s}. The expansion of the binary tree makes this dependence explicit by exhibiting ksk_{s} as a lower index. But this requires performing an operation w1w_{1} with the choice u=ksu=k_{s} in (B.10) or (B.11). However, w1w_{1} increases the number of off-diagonal element by at least one. In other words, every index associated with a lone label must have a “partner” index in a different resolvent entry which arose by application of w1w_{1}. Such a partner index may only be obtained through the creation of at least one off-diagonal resolvent entry. The actual proof below shows that this effect applies cumulatively for all lone labels.

In order to prove (B.15), we consider two cases. Consider first the case where for some r=1,,pr=1,\dots,p the monomial AσrrA_{\sigma_{r}}^{r} on the left-hand side of (B.15) is not maximally expanded. Then d(Aσrr)=p+1d(A_{\sigma_{r}}^{r})=p+1, so that (B.3) yields AσrrΨop+1A_{\sigma_{r}}^{r}\prec\Psi_{o}^{p+1}. Therefore the observation that AσssΨoA_{\sigma_{s}}^{s}\prec\Psi_{o} for all srs\neq r, together with (B.2) implies that the left-hand side of (B.15) is O_{\prec}\bigl{(}{\Psi_{o}^{2p}}\bigr{)}. Since Lp\lvert L\rvert\leqslant p, (B.15) follows.

Consider now the case where AσrrA_{\sigma_{r}}^{r} on the left-hand side of (B.15) is maximally expanded for all r=1,,pr=1,\dots,p. The key observation is the following claim about the left-hand side of (B.15) with a nonzero expectation.

For each sLs\in L there exists r=τ(s){1,,p}{s}r=\tau(s)\in\{1,\dots,p\}\setminus\{s\} such that the monomial AσrrA_{\sigma_{r}}^{r} contains a resolvent entry with lower index ksk_{s}.

In other words, after expansion, the lone label ss has a “partner” label r=τ(s)r=\tau(s), such that the index ksk_{s} appears also in the expansion of ArA^{r} (note that there may be several such partner labels rr). To prove ()(*), suppose by contradiction that there exists an sLs\in L such that for all r{1,,p}{s}r\in\{1,\dots,p\}\setminus\{s\} the lower index ksk_{s} does not appear in the monomial AσrrA_{\sigma_{r}}^{r}. To simplify notation, we assume that s=1s=1. Then, for all r=2,,pr=2,\dots,p, since AσrrA_{\sigma_{r}}^{r} is maximally expanded, we find that AσrrA_{\sigma_{r}}^{r} is independent of k1k_{1} (see Definition 4.2). Therefore we have

Summing over the binary trees in (B.14) and using Lemma B.1, we get from (B.15)

We now return to the sum (B.8). We perform the summation by first fixing PPpP\in\mathfrak{P}_{p}, with associated lone labels L=L(P)L=L(P). We find

in the first step we used (4.9) and the fact that the summation is performed over P\lvert P\rvert free indices, the remaining pPp-\lvert P\rvert being estimated by M1M^{-1}; in the second step we used that each block of PP that is not contained in LL consists of at least two labels, so that pP(pL)/2p-\lvert P\rvert\geqslant(p-\lvert L\rvert)/2. From (B.8) and (B.17) we get

where in the last step we used the lower bound from (4.8) and estimated the summation over Pp\mathfrak{P}_{p} with a constant CpC_{p} (which is bounded by (Cp2)p(Cp^{2})^{p}). Summarizing, we have proved that

We conclude the proof of Theorem 4.7 with a simple application of Chebyshev’s inequality. Fix ε>0\varepsilon>0 and D>0D>0. Using (B.18) and Chebyshev’s inequality we find

for large enough NN0(ε,p)N\geqslant N_{0}(\varepsilon,p). Choosing pε1(1+D)p\geqslant\varepsilon^{-1}(1+D) concludes the proof of Theorem 4.7. ∎

The identity (4.6) is the only identity about the entries of GG that is needed in the proof of Theorem 4.7. In particular, (4.7) is never used, and the actual entries of HH never appear in the argument.

The first estimate of (4.11) follows from Theorem 4.7 and the simple bound ΛoΛΨ\Lambda_{o}\leqslant\Lambda\prec\Psi. The second estimate of (4.11) may be proved by following the proof of Theorem 4.7 verbatim; the only modification is the bound

which replaces (B.5). Here we again use the same upper bound Ψo=Ψ\Psi_{o}=\Psi for Λ\Lambda and Λo\Lambda_{o}.

In order to prove (4.12), we write Schur’s complement formula (5.6) using (2.8) as

Since hiiM1/2Ψ\lvert h_{ii}\rvert\prec M^{-1/2}\leqslant\Psi and 1/Gii1/mΨ\lvert 1/G_{ii}-1/m\rvert\prec\Psi, we find that the term in parentheses is stochastically dominated by Ψ\Psi. Therefore we get, inverting (B.19) and expanding the right-hand side, that

Taking the partial expectation PiP_{i} yields

where in the second step we used (4.6), (2.2), and (B.3). Therefore we get, using (4.11) and QiGii=Qi(Giim)=QiviQ_{i}G_{ii}=Q_{i}(G_{ii}-m)=Q_{i}v_{i},

and apply the above argument to each term separately. This yields

We conclude this section with an alternative proof of Theorem 4.7. While the underlying argument remains similar, the following proof makes use of an additional decomposition of the space of random variables, which avoids the use of the stopping rule from Step (1) in the above proof of Theorem 4.7. This decomposition may be regarded as an abstract reformulation of the stopping rule.

As before, we set Xk\vbox..=Qk(Gkk)1X_{k}\mathrel{\vbox{\hbox{.}\hbox{.}}}=Q_{k}(G_{kk})^{-1}. For simplicity of presentation, we set tik=N1t_{ik}=N^{-1}. The decomposition is defined using the operations PiP_{i} and QiQ_{i}, introduced in Definition 4.2. It is immediate that PiP_{i} and QiQ_{i} are projections, that Pi+Qi=1P_{i}+Q_{i}=1, and that all of these projections commute with each other. For a set A{1,,N}A\subset\{1,\dots,N\} we use the notations PA\vbox..=iAPiP_{A}\mathrel{\vbox{\hbox{.}\hbox{.}}}=\prod_{i\in A}P_{i} and QA\vbox..=iAQiQ_{A}\mathrel{\vbox{\hbox{.}\hbox{.}}}=\prod_{i\in A}Q_{i}.

Let pp be even and introduce the shorthand X~ks\vbox..=Xks\widetilde{X}_{k_{s}}\mathrel{\vbox{\hbox{.}\hbox{.}}}=X_{k_{s}} for sp/2s\leqslant p/2 and X~ks\vbox..=X ⁣ks\widetilde{X}_{k_{s}}\mathrel{\vbox{\hbox{.}\hbox{.}}}=\overline{X}\!\,_{k_{s}} for s>p/2s>p/2. Then we get

Next, by definition of X~ks\widetilde{X}_{k_{s}}, we have that X~ks=QksX~ks\widetilde{X}_{k_{s}}=Q_{k_{s}}\widetilde{X}_{k_{s}}, which implies that PAscX~ks=0P_{A^{c}_{s}}\widetilde{X}_{k_{s}}=0 if ksAsk_{s}\notin A_{s}. Hence may restrict the summation to AsA_{s} satisfying

for all ss. Moreover, we claim that the right-hand side of (B.21) vanishes unless

for all ss. Indeed, suppose that ksqsAqck_{s}\in\bigcap_{q\neq s}A_{q}^{c} for some ss, say s=1s=1. In this case, for each s=2,,ps=2,\dots,p, the factor PAscQAsX~ksP_{A_{s}^{c}}Q_{A_{s}}\widetilde{X}_{k_{s}} is independent of k1k_{1} (see Definition 4.2). Thus we get

We conclude that the summation on the right-hand side of (B.21) is restricted to indices satisfying (B.22) and (B.23). Under these two conditions we have

since each index ksk_{s} must belong to at least two different sets AqA_{q}: to AsA_{s} (by (B.22)) as well as to some AqA_{q} with qsq\neq s (by (B.23)).

(Note that if we were doing the case Xk=QkGkkX_{k}=Q_{k}G_{kk} instead of Xk=Qk(Gkk)1X_{k}=Q_{k}(G_{kk})^{-1}, then (B.25) would have to be weakened to QAXkΨA\lvert Q_{A}X_{k}\rvert\prec\Psi^{\lvert A\rvert}, in accordance with (4.11). Indeed, in that case and for A={k}A=\{k\}, we only have the bound QkGkkΨ\lvert Q_{k}G_{kk}\rvert\prec\Psi and not QkGkkΨo\lvert Q_{k}G_{kk}\rvert\prec\Psi_{o}.)

Before proving (B.25), we show it may be used to complete the proof. Using (B.21), (B.25), and Lemma B.1, we find

where in the first step we estimated the summation over the sets A1,,ApA_{1},\dots,A_{p} by a combinatorial factor CpC_{p} depending on pp, in the forth step we used the elementary inequality anbm(a+b)n+ma^{n}b^{m}\leqslant(a+b)^{n+m} for positive a,ba,b, and in the last step we used (4.8) and the bound MNM\leqslant N. Thus we have proved (B.18), from which the claim follows exactly as in the first proof of Theorem 4.7.

What remains is the proof of (B.25). The case A=1\lvert A\rvert=1 (corresponding to A={k}A=\{k\}) follows from (B.5), exactly as in the first proof of Theorem 4.7. To simplify notation, for the case A2\lvert A\rvert\geqslant 2 we assume that k=1k=1 and A={1,2,,t}A=\{1,2,\dots,t\} with t2t\geqslant 2. It suffices to prove that

where the first term vanishes since G11(2)G_{11}^{(2)} is independent of 22 (see Definition 4.2). We now consider

and apply (4.6) with k=3k=3 to each resolvent entry on the right-hand side, and multiply everything out. The result is a sum of fractions of entries of GG, whereby all entries in the numerator are diagonal and all entries in the denominator are diagonal. The leading order term vanishes,

so that the surviving terms have at least three (off-diagonal) resolvent entries in the numerator. We may now continue in this manner; at each step the number of (off-diagonal) resolvent entries in the numerator increases by at least one.

More formally, we obtain a sequence A2,A3,,AtA_{2},A_{3},\dots,A_{t}, where A2\vbox..=Q2G12G21G11G11(2)G22A_{2}\mathrel{\vbox{\hbox{.}\hbox{.}}}=Q_{2}\frac{G_{12}G_{21}}{G_{11}G_{11}^{(2)}G_{22}} and AiA_{i} is obtained by applying (4.6) with k=ik=i to each entry of QiAi1Q_{i}A_{i-1}, and keeping only the nonvanishing terms. The following properties are easy to check by induction.

AiA_{i} consists of the projection Q2QiQ_{2}\cdots Q_{i} applied to a sum of fractions such that all entries in the numerator are diagonal and all entries in the denominator are diagonal.

The number of (off-diagonal) entries in the numerator of each term of AiA_{i} is at least ii.

By Lemma B.1 combined with (ii) and (iii) we conclude that AiΨoi\lvert A_{i}\rvert\prec\Psi_{o}^{i}. From (i) we therefore get

This is (B.26). Hence the proof is complete. ∎

Appendix C Large deviation bounds

We consider random variables XX satisfying

If the coefficients aij(N)a_{ij}^{(N)} and bi(N)b_{i}^{(N)} depend on an additional parameter uu, then all of these estimates are uniform in uu (see Definition 2.1), i.e. the threshold N0=N0(ε,D)N_{0}=N_{0}(\varepsilon,D) in the definition of \prec depends only on the family μp\mu_{p} from (C.1) and δ\delta from (2.4); in particular, N0N_{0} does not depend on uu.

The estimates (C.2), (C.3), and (C.4) follow from Lemmas B.2, B.3, and B.4 of EKYY3 , combined with Chebyshev’s inequality. ∎

References