Isotropic Local Laws for Sample Covariance and Generalized Wigner Matrices

Alex Bloemendal, Laszlo Erdos, Antti Knowles, Horng-Tzer Yau, Jun Yin

Introduction

In recent years there has been substantial progress in understanding the local versions of the semicircle and the Marchenko-Pastur laws (see EKYY4 ; ECDM for an overview and detailed references). This research was originally motivated by the Wigner-Dyson-Mehta universality conjecture for the local spectral statistics of random matrices. The celebrated sine kernel universality and related results for other symmetry classes concern higher-order correlation functions, and not just the eigenvalue density. Moreover, they pertain to scales of order 1/N1/N, smaller than the scales on which local laws hold. Nevertheless, local laws (with precise error bounds) are essential ingredients for proving universality. In particular, one of their consequences, the precise localization of the eigenvalues (called rigidity bounds), has played a fundamental role in the relaxation flow analysis of the Dyson Brownian Motion, which has led to the proof of the Wigner-Dyson-Mehta universality conjecture for all symmetry classes ESY4 ; ESYY .

The basic approach behind the proofs of local laws is the analysis of a self-consistent equation for the Stieltjes transform, a scalar equation which controls the trace of the resolvent (and hence the empirical eigenvalue density). A vector self-consistent equation for the diagonal resolvent matrix entries, [(Hz)1]ii[(H-z)^{-1}]_{ii}, was introduced in EYY1 . Later, a matrix self-consistent equation was derived in EKYY3 . Such self-consistent equations provide entrywise control of the resolvent and not only its trace. This latter fact has proved a key ingredient in the Green function comparison method (introduced in EYY1 and extended to the spectral edge in EYY3 ), which allows the comparison of local statistics via moment matching even below the scale of eigenvalue spacing.

The isotropic local laws established in this paper serve as a key input in establishing detailed results about the eigenvalues and eigenvectors of deformed matrix models. These include:

A complete picture of the distribution of outlier eigenvalues/eigenvectors, as well as the non-outlier eigenvalues/eigenvectors near the spectral edge.

An investigation of the BBP transition using that, thanks to the optimality of the high-probability bounds in the local laws, the results of (a) extend even to the case when some eigenvalues of AA are very close to the critical value.

This programme for the eigenvalues of deformed Wigner matrices was carried out in KY2 ; KY3 . In KYY , we carry it out for the eigenvectors of covariance matrices.

In this paper we prove the isotropic Marchenko-Pastur law for sample covariance matrices as well as the isotropic semicircle law for generalized Wigner matrices. Our proofs are based on a novel method, which is considerably more robust than that of KY2 . Both proofs (the one from KY2 and the one presented here) crucially rely on the entrywise local law as input, but follow completely different approaches to obtain the isotropic law from the entrywise one. The basic idea of the proof in KY2 is to use the Green function comparison method to compare the resolvent of a given Wigner matrix to the resolvent of a Gaussian random matrix, for which the isotropic law is a trivial corollary of the entrywise one (by basis transformation). Owing to various moment matching conditions imposed by the Green function comparison, the result of KY2 required the variances of all matrix entries to coincide and, for results in the bulk spectrum, the third moments to vanish. In contrast, our current approach does not rely on Green function comparison. Instead, it consists of a precise analysis of the cancellation of fluctuations in Green functions. We use a graphical expansion method inspired by techniques recently developed in EKY2 to control fluctuations in Green functions of random band matrices.

Our first main result is the isotropic local Marchenko-Pastur law for sample covariance matrices H=XXH=X^{*}X, where XX is an M×NM\times N matrix. We allow the dimensions of XX to differ wildly: we only assume that logNlogM\log N\asymp\log M. In particular, the aspect ratio ϕ=M/N\phi=M/N – a key parameter in the Marchenko-Pastur law – may scale as a power of NN. Our entrywise law (required as input for the proof of the isotropic law) is a generalization of the one given in PY . In addition to generalizing the proof of PY , we simplify and streamline it, so as to obtain a short and self-contained proof.

Our second main result is the isotropic local semicircle law for generalized Wigner matrices. This extends the isotropic law of KY2 from Wigner matrices to generalized Wigner matrices, in which the variances of the matrix entries need not coincide. It also dispenses with the third moment assumption of KY2 mentioned previously. In fact, our proof applies to even more general matrix models, provided that an entrywise law has been established. As an application of the isotropic laws, we also prove a basis-independent version of eigenvector delocalization for both sample covariance and generalized Wigner matrices.

We conclude with an outline of the paper. In Section 2 we define our models and state our results, first for sample covariance matrices (Section 2.1) and then for generalized Wigner matrices (Section 2.2). The rest of the paper is devoted to the proofs. Since they are very similar for sample covariance matrices and generalized Wigner matrices, we only give the details for sample covariance matrices. Thus, Sections 3–6 are devoted to the proof of the isotropic Marchenko-Pastur law for sample covariance matrices; in Section 7, we describe how to modify the arguments to prove the isotropic semicircle law for generalized Wigner matrices. Section 3 collects some basic identities and estimates that we shall use throughout the proofs. In Section 4 we prove the entrywise local Marchenko-Pastur law, generalizing the results of PY . The main argument and the bulk of the proof, i.e. the proof of the isotropic law, is given in Section 5. For a sketch of the argument we refer to Section 5.3. Finally, in Section 6 we draw some simple consequences from the isotropic law: optimal control outside of the spectrum and isotropic delocalization bounds.

We use CC to denote a generic large positive constant, which may depend on some fixed parameters and whose value may change from one expression to the next. Similarly, we use cc to denote a generic small positive constant. For two positive quantities ANA_{N} and BNB_{N} depending on NN we use the notation ANBNA_{N}\asymp B_{N} to mean C1ANBNCANC^{-1}A_{N}\leqslant B_{N}\leqslant CA_{N} for some positive constant CC.

Results

Let XX be an M×NM\times N matrix whose entries XiμX_{i\mu} are independent complex-valued random variables satisfying

We shall study the N×NN\times N matrix XXX^{*}X; hence we regard NN as the fundamental large parameter, and write MMNM\equiv M_{N}. Our results also apply to the matrix XXXX^{*} provided one replaces NMN\leftrightarrow M. See Remark 2.11 below for more details.

We always assume that MM and NN satisfy the bounds

for some positive constant CC. We define the ratio

which may depend on NN. Here, and throughout the following, in order to unclutter notation we omit the argument NN in quantities, such as XX and ϕ\phi, that depend on it.

It is well known that the empirical distribution of the eigenvalues of the N×NN\times N matrix XXX^{*}X has the same asymptotics as the Marchenko-Pastur lawMP

to be the edges of the limiting spectrum. Note that (2.4) is normalized so that its integral is equal to one. The Stieltjes transform of the Marchenko-Pastur law (2.4) is

where the square root is chosen so that mϕm_{\phi} is holomorphic in the upper half-plane and satisfies mϕ(z)0m_{\phi}(z)\to 0 as zz\to\infty. The function mϕ=mϕ(z)m_{\phi}=m_{\phi}(z) is also characterized as the unique solution of the equation

satisfying Imm(z)>0\operatorname{Im}m(z)>0 for Imz>0\operatorname{Im}z>0. The formulas (2.4)–(2.7) were originally derived for the case when ϕ=M/N\phi=M/N is independent of NN (or, more precisely, when ϕ\phi has a limit in (0,)(0,\infty) as NN\to\infty). Our results allow ϕ\phi to depend on NN under the constraint (2.2), so that mϕm_{\phi} and ϱϕ\varrho_{\phi} may also depend on NN through ϕ\phi.

Throughout the following we use a spectral parameter

with η>0\eta>0, as the argument of Stieltjes transforms and resolvents. Define the resolvent

The following notion of a high-probability bound was introduced in EKY2 , and has been subsequently used in a number of works on random matrix theory. It provides a simple way of systematizing and making precise statements of the form “ξ\xi is bounded with high probability by ζ\zeta up to small powers of NN”.

be two families of nonnegative random variables, where U(N)U^{(N)} is a possibly NN-dependent parameter set. We say that ξ\xi is stochastically dominated by ζ\zeta, uniformly in uu, if for all (small) ε>0\varepsilon>0 and (large) D>0D>0 we have

for large enough NN0(ε,D)N\geqslant N_{0}(\varepsilon,D). Throughout this paper the stochastic domination will always be uniform in all parameters (such as matrix indices and the spectral parameter zz) that are not explicitly fixed. Note that N0(ε,D)N_{0}(\varepsilon,D) may depend on the constants from (2.2) and (2.3) as well as any constants fixed in the assumptions of our main results. If ξ\xi is stochastically dominated by ζ\zeta, uniformly in uu, we use the notation ξζ\xi\prec\zeta. Moreover, if for some complex family ξ\xi we have ξζ\lvert\xi\rvert\prec\zeta we also write ξ=O(ζ)\xi=O_{\prec}(\zeta).

Because of (2.2), all (or some) factors of NN in Definition (2.1) could be replaced with MM without changing the definition of stochastic domination.

Sometimes we shall need the following notion of high probability.

which is the number of nontrivial (i.e. nonzero) eigenvalues of XXX^{*}X; the remaining NKN-K eigenvalues of XXX^{*}X are zero. (Note that the KK nontrivial eigenvalues of XXX^{*}X coincide with those of XXXX^{*}.) Fix a (small) ω(0,1)\omega\in(0,1) and define the domain

Throughout the following we regard ω\omega as fixed once and for all, and do not track the dependence of constants on ω\omega.

Suppose that (2.1), (2.2), and (2.3) hold. Then

Beyond the support of the limiting spectrum, one has stronger control all the way down to the real axis. For fixed (small) ω(0,1)\omega\in(0,1) define the region

of spectral parameters separated from the asymptotic spectrum by K2/3+ωK^{-2/3+\omega}, which may have an arbitrarily small positive imaginary part η\eta.

Suppose that (2.1), (2.2), and (2.3) hold. Then

for all ε>0\varepsilon>0, D>0D>0, and NN0(ε,D)N\geqslant N_{0}(\varepsilon,D).

The right-hand side of (2.15) is stable under the limit η0\eta\to 0, and may therefore be extended to η=0\eta=0. Recalling the previous remark, we conclude that (2.15) also holds for η=0\eta=0.

Suppose that (2.1), (2.2), and (2.3) hold. Then for any ε>0\varepsilon>0 we have the bound

The following result is on the rigidity of the nontrivial eigenvalues of XXX^{*}X, which coincide with the nontrivial eigenvalues of XXXX^{*}. Let γ1γ2γK\gamma_{1}\geqslant\gamma_{2}\geqslant\cdots\geqslant\gamma_{K} be the classical eigenvalue locations according to ϱϕ\varrho_{\phi} (see (2.4)), defined through

Fix a (small) ω(0,1)\omega\in(0,1) and suppose that (2.1), (2.2), and (2.3) hold. Then

uniformly for all α{1,,[(1ω)K]}\alpha\in\{1,\dots,[(1-\omega)K]\}. If in addition ϕ1c\lvert\phi-1\rvert\geqslant c for some constant c>0c>0 then

uniformly for all α{[K/2],,K}\alpha\in\{[K/2],\dots,K\}.

2. Generalized Wigner matrix

its Stieltjes transform; here we chose the square root so that mm is holomorphic in the upper half-plane and satisfies m(z)0m(z)\to 0 as zz\to\infty. Note that m=m(z)m=m(z) is also characterized as the unique solution of

that satisfies Imm>0\operatorname{Im}m>0 for η>0\eta>0. Let

Fix a (small) ω(0,1)\omega\in(0,1) and define

Suppose that (2.22) and (2.23) hold. Then

Theorem 2.12 is the isotropic generalization of the following result, proved in EKYY4 . A similar result first appeared in EYY3 .

Suppose that (2.22) and (2.23) hold. Then

The proof of Theorem 2.14 is the same as that of Theorem 2.12. Below we give the proof for Theorem 2.12, which can be trivially adapted to yield Theorem 2.14.

Combining Theorem 2.14 with the isotropic local semicircle law from EKYY4 , we may for instance obtain an isotropic local semicircle law for matrices where the lower bound of (2.22) is relaxed, so that some matrix entries may vanish.

Beyond the support of the limiting spectrum $,thestatementofTheorem2.12maybeimprovedtoaboundthatisstableallthewaydowntotherealaxis.Forfixed(small), the statement of Theorem 2.12 may be improved to a bound that is stable all the way down to the real axis. For fixed (small)\omega\in(0,1)$ define the region

of spectral parameters separated from the asymptotic spectrum by N2/3+ωN^{-2/3+\omega}, which may have an arbitrarily small positive imaginary part η\eta.

Suppose that (2.22) and (2.23) hold. Then

Suppose that (2.22) and (2.23) hold. Then

Finally, in analogy to Theorem 2.10, we record the following rigidity result, which is a trivial consequence of (EKYY4, , Theorem 7.6) with X=CN2/3X=CN^{-2/3} and Y=CN1Y=CN^{-1}; see also (EYY3, , Theorem 2.2). Write λ1λ2λN\lambda_{1}\geqslant\lambda_{2}\geqslant\cdots\geqslant\lambda_{N} for the eigenvalues of HH, and let γ1γ2γN\gamma_{1}\geqslant\gamma_{2}\geqslant\cdots\geqslant\gamma_{N} be their classical locations according to ϱ\varrho, defined through

Preliminaries

The rest of this paper is devoted to the proofs of our main results. They are similar for sample covariance matrices and generalized Wigner matrices, and in Sections 3–6 we give the argument for sample covariance matrices (hence proving the results of Section 2.1). How to modify these arguments to generalized Wigner matrices (and hence prove the results of Section 2.2) is explained in Section 7. We choose to present our method in the context of sample covariance matrices mainly for two reasons. First, we take this opportunity to give a version of the entrywise local law (Section 4) – required as input for the proof of the isotropic law – which is more general and has a simpler proof than the local law previously established in PY . Second, the proof of the isotropic law in the case of sample covariance matrices is conceptually slightly clearer due to a natural splitting of summation indices into two categories (which we distinguish by the use of Latin and Greek letters); this splitting is an essential structure behind our proof in Section 5, and is also used in the case of generalized Wigner matrices, in which case it is however purely artificial.

We now move on to the proofs. In order to unclutter notation, we shall often omit the argument zz from quantities that depend on it. Thus, we for instance often write GG instead of G(z)G(z). We put the arguments zz back when needed, typically if we are working with several different spectral parameters zz.

We begin by recording some basic large deviations estimates. We consider complex-valued random variables ξ\xi satisfying

If the coefficients aij(N)a_{ij}^{(N)} and bi(N)b_{i}^{(N)} depend on an additional parameter uu, then all of these estimates are uniform in uu (see Definition 2.1), i.e. the threshold N0=N0(ε,D)N_{0}=N_{0}(\varepsilon,D) in the definition of \prec depends only on the family CpC_{p} from (3.1); in particular, N0N_{0} does not depend on uu.

These estimates are an immediate consequence of Lemmas B.2, B.3, and B.4 in EKYY3 . See also Theorem C.1 in EKYY4 . ∎

The following lemma collects basic algebraic properties of stochastic domination \prec. We shall use them tacitly throughout the following.

Suppose that ξ(u,v)ζ(u,v)\xi(u,v)\prec\zeta(u,v) uniformly in uUu\in U and vVv\in V. If VNC\lvert V\rvert\leqslant N^{C} for some constant CC then

Suppose that ξ1(u)ζ1(u)\xi_{1}(u)\prec\zeta_{1}(u) uniformly in uu and ξ2(u)ζ2(u)\xi_{2}(u)\prec\zeta_{2}(u) uniformly in uu. Then

The claims (i) and (ii) follow from a simple union bound. For (iii), pick ε>0\varepsilon>0 and assume to simplify notation that ξ\xi and Ψ\Psi do not depend on uu. Then

for arbitrary D>0D>0. The claim (iii) therefore follows by choosing D3CD\geqslant 3C. ∎

Next, we give some basic facts about the Stieltjes transform mϕm_{\phi} of the Marchenko-Pastur law defined in (2.6). They have an especially simple form in the case ϕ1\phi\geqslant 1; the complementary case ϕ<1\phi<1 can be easily handled using (2.20). Recall the definition (2.9) of κ\kappa. We record the following elementary properties of mϕm_{\phi}, which may be proved e.g. by starting from the explicit form (2.6).

The basic object is the M×NM\times N matrix XX. We use the indices i,j=1,,Mi,j=1,\dots,M to denote the rows of XX and μ,ν=1,,N\mu,\nu=1,\dots,N to denote its columns. Unrestricted summations over Latin indices always run over the set {1,2,,M}\{1,2,\ldots,M\} while Greek indices run over {1,2,,N}\{1,2,\ldots,N\}.

In addition to the resolvent RR from (2.8), we shall need another resolvent, GG:

Although our main results only pertain to RR, the resolvent GG will play a crucial role in the proofs, in which we consider both XXX^{*}X and XXXX^{*} in tandem. In the following formulas the spectral parameter zz plays no explicit role, and we therefore omit it from the notation, as explained at the beginning of this section.

More abstractly, we may introduce two sets of indices, IpopulationI_{\rm population} and IsampleI_{\rm sample}, such that IpopulationI_{\rm population} indexes the entries of GG and IsampleI_{\rm sample} the entries of RR. Elements of IpopulationI_{\rm population} are always denoted by Latin letters and elements of IsampleI_{\rm sample} by Greek letters. These two sets are to be viewed as distinct. For convenience of notation, we shall always use the customary representations Ipopulation:={1,,M}I_{\rm population}\mathrel{\mathop{:}}=\{1,\dots,M\} and Isample:={1,,N}I_{\rm sample}\mathrel{\mathop{:}}=\{1,\dots,N\}, bearing in mind that neither should be viewed as a subset of the other. This terminology originates from the statistical application of sample covariance matrices. The idea is that we are observing the statistics of a population of size MM by making NN independent measurements (“samples”) of the population. Each observation is a column of XX. Hence the population index ii labels the rows of XX and the sample index μ\mu the columns of XX.

Moreover, for i,jTi,j\notin T we define the resolvents entries

When T={a}T=\{a\}, we abbreviate ({a})(\{a\}) by (a)(a) in the above definitions; similarly, we write (ab)(ab) instead of ({a,b})(\{a,b\}).

We shall use the following expansion formulas for GG. They are elementary consequences of Schur’s complement formula; see e.g. Lemma 4.2 of EYY1 and Lemma 6.10 of EKYY2 for proofs of similar identities.

For i,j,kTi,j,k\notin T and i,jki,j\neq k we have

Moreover, for i,jTi,j\notin T and iji\neq j we have

Definition 3.5 and Lemma 3.6 have the following analogues for removing columns and identities for RR.

Moreover, for μ,νT\mu,\nu\notin T we define the resolvents entries

When T={μ}T=\{\mu\}, we abbreviate [{μ}][\{\mu\}] by [μ][\mu] in the above definitions; similarly, we write [μν][\mu\nu] instead of [{μ,ν}][\{\mu,\nu\}].

We use the following expansion formulas for RR, which are analogoues to those of Lemma 3.6.

For μ,ν,ρT\mu,\nu,\rho\notin T and μ,νρ\mu,\nu\neq\rho we have

Moreover, for μ,νT\mu,\nu\notin T and μν\mu\neq\nu we have

Here we draw attention to a fine notational distinction. Parentheses ()(\cdot) in X(T)X^{(T)} indicate removal of rows (indexed by Latin letters), while square brackets [][\cdot] in X[T]X^{[T]} indicate removal of columns (indexed by Greek letters). In particular, this notation makes it unambiguous whether TT is required to be a subset of {1,,M}\{1,\dots,M\} or of {1,,N}\{1,\dots,N\}.

The following lemma is an immediate consequence of the fact that for ϕ1\phi\geqslant 1 the spectrum of XXXX^{*} is equal to the spectrum of XXX^{*}X plus MNM-N zero eigenvalues. (A similar result holds for for ϕ1\phi\leqslant 1, and if XX is replaced with X[T]X^{[T]} or X(U)X^{(U)}.)

Let T{1,,N}T\subset\{1,\dots,N\} and U{1,,M}U\subset\{1,\dots,M\}. Then we have

in agreement with the relation (2.20) and the heuristics M1TrGmϕ1M^{-1}\operatorname{Tr}G\sim m_{\phi^{-1}} and N1TrRmϕN^{-1}\operatorname{Tr}R\sim m_{\phi}.

The following lemma is an easy consequence of the well-known interlacing property of the eigenvalues of XXXX^{*} and X(i)(X(i))X^{(i)}(X^{(i)})^{*}, as well as the eigenvalues of XXX^{*}X and (X[μ])X[μ](X^{[\mu]})^{*}X^{[\mu]}.

For any T{1,,N}T\subset\{1,\dots,N\} and U{1,,M}U\subset\{1,\dots,M\}, there exists a constant CC, depending only on T\lvert T\rvert and U\lvert U\rvert, such that

Finally, we record the fundamental identity

which follows easily by spectral decomposition of G[T]G^{[T]}.

2. Reduction to the case ϕ⩾1italic-ϕ1\phi\geqslant 1

We shall prove Theorems 2.4 and 2.5 by restricting ourselves to the case ϕ1\phi\geqslant 1 but considering both XXX^{*}X and XXXX^{*} simultaneously. In this short section we give the details of this reduction. Define the control parameter

Suppose that (2.1), (2.2), (2.3), and ϕ1\phi\geqslant 1 hold. Then

Suppose that (2.1), (2.2), (2.3), and ϕ1\phi\geqslant 1 hold. Then

Suppose that (2.1), (2.2), (2.3), and ϕ1\phi\geqslant 1 hold. For any ε>0\varepsilon>0 we have the bounds

Theorems 2.4, 2.5, and 2.8 are easy consequences of Theorems 3.11, 3.12, and 3.13 respectively, combined with the observation that

What remains therefore is to prove Theorems 2.10, 3.11, 3.12, and 3.13. We shall prove Theorem 2.10 in Section 4.3, Theorem 3.11 in Section 5, and Theorems 3.12 and 3.13 in Section 6.

The entrywise local Marchenko-Pastur law

Suppose that (2.1), (2.2), (2.3), and ϕ1\phi\geqslant 1 hold. Then

Theorem 4.1 differs from Theorem 3.1 in PY in the following two ways.

The restriction 1ϕC1\leqslant\phi\leqslant C in PY is relaxed to 1ϕNC1\leqslant\phi\leqslant N^{C} (and hence, as explained in Section 3.2, to NCϕNCN^{-C}\leqslant\phi\leqslant N^{C}).

The uniform subexponential decay assumption of PY is relaxed to (2.3). On the other hand, thanks to the stronger subexponential decay assumption the statement of Theorem 3.1 of PY is slightly stronger than Theorem 4.1: in Theorem 3.1 of PY , the error bounds NεN^{\varepsilon} in the definition of \prec are replaced with (logN)CloglogN(\log N)^{C\log\log N}.

The difference (ii) given above is technical and amounts to using Lemma 3.1, which is tailored for random variables satisfying (2.3), for the large deviation estimates. We remark that all of the arguments of the current paper may be translated to the setup of PY , explained in (ii) above, by modifying the definition of \prec. The essence of the proofs remains unchanged; the only nontrivial difference is that in Section 5 we have to control moments whose power depends weakly on NN; this entails keeping track of some basic combinatorial bounds. We do not pursue this modification any further.

The difference (i) is more substantial, and requires to keep track of the ϕ\phi-dependence of all appropriately rescaled quantities throughout the proof. In addition, we take this opportunity to simplify and streamline the argument from PY . This provides a short and self-contained proof of Theorem 4.1, up to a fluctuation averaging result, Lemma 4.9 below, which was proved in the current simple and general form in EKYY4 .

We begin with the proof of (4.1) and (3.18). For the following it will be convenient to use the rescaled spectral parameters

Using z~\widetilde{z} and z^\widehat{z} we may write the the defining equation (2.7) of mϕm_{\phi} as

We define the zz-dependent random control parameters

where we defined the Stieltjes transform of the empirical density of XXX^{*}X,

The goal of this subsection is to prove the following weaker variant of Theorem 4.1.

The rest of this subsection is devoted to the proof of Proposition 4.2. We begin by introducing the basic zz-dependent event

The proof is a simple induction argument using (3.10) and the bound mϕc\lvert m_{\phi}\rvert\geqslant c from (3.5). We omit the details. ∎

As in the works EYY3 ; PY , the main idea of the proof is to derive a self-consistent equation for mR=1NμRμμm_{R}=\frac{1}{N}\sum_{\mu}R_{\mu\mu} using the resolvent identity (3.11). To that end, we introduce the conditional expectation

i.e. the partial expectation in the randomness of the μ\mu-th column of XX. We define

where in the last step we used (2.1) and (4.3). Using (3.11) with T=T=\emptyset, Lemma 3.9, and (4.3), we find

The following lemma contains the key estimates needed to control the error terms ZμZ_{\mu} and Λo\Lambda_{o}. The errors are controlled using of the (random) control parameter

whose analogue in the context of Wigner matrices first appeared in EYY3 .

The proof is very similar to that of Theorems 6.8 and 6.9 of PY . We consequently only give the details for the estimate of Λo\Lambda_{o}; the estimate of ZμZ_{\mu} is similar.

For μν\mu\neq\nu we use (3.12) with T=T=\emptyset to expand RμνR_{\mu\nu}. Conditioning on X[μν]X^{[\mu\nu]} and invoking (3.3) yields

On the event Ξ\Xi, we estimate the right-hand side using

where the first step follows from (3.14), the second from Lemma 3.9, the third from Imz1=ηz2Cη/ϕ\operatorname{Im}z^{-1}=-\eta\lvert z\rvert^{-2}\geqslant-C\eta/\phi and (4.7), and the fourth from the fact that Immϕcη\operatorname{Im}m_{\phi}\geqslant c\eta by (3.6).

Recalling (3.5), we have therefore proved that

which, together with the analogous bound for ZμZ_{\mu}, concludes the proof of (4.13).

In order to prove the estimate ΛoΨΘ\Lambda_{o}\prec\Psi_{\Theta} from (4.14) for η(logN)1\eta\geqslant(\log N)^{-1}, we proceed similarly. From (4.15) and the trivial deterministic bound \bigl{\lvert}R_{\mu\mu}R_{\nu\nu}^{[\mu]}\bigr{\rvert}\leqslant\eta^{-2}\leqslant(\log N)^{2} we get

where the estimate is similar to (4.16), except that in the last step we use Lemma 3.10 to estimate TrRμνTrR\operatorname{Tr}R^{\mu\nu}-\operatorname{Tr}R. Since η(logN)1\eta\geqslant(\log N)^{-1}, we easily find that RμνΨΘ\lvert R_{\mu\nu}\rvert\prec\Psi_{\Theta}. This concludes the proof. ∎

Note that, by (4.4), the function mϕm_{\phi} satisfies D(mϕ)=0\mathcal{D}(m_{\phi})=0.

Next, we derive a stability result for D1\mathcal{D}^{-1}. Roughly, we prove that if D(u)\mathcal{D}(u) is small then uu is close to mϕm_{\phi}. Note that this result is entirely deterministic. It relies on a discrete continuity argument, whose essence is the existence of a sufficiently large gap between the two solutions of D()=0\mathcal{D}(\cdot)=0. Once this gap is established, then, together with the fact that uu is close to mϕm_{\phi} for large η\eta, we may conclude that uu is close to mϕm_{\phi} for smaller η\eta as well. We use a discrete version of a continuity argument (as opposed to a continuous one used e.g. in PY ), which allows us to bypass several technical issues when applying it to estimating the random quantity mRmϕ\lvert m_{R}-m_{\phi}\rvert. For more details of this application, see the explanation following (4.34).

Thus, if Imz1\operatorname{Im}z\geqslant 1 then L(z)={z}L(z)=\{z\} and if Imz1\operatorname{Im}z\leqslant 1 then L(z)L(z) is a one-dimensional lattice with spacing N5N^{-5} plus the point zz. Clearly, we have the bound

for some constant CC independent of zz and NN.

Let uu be as in Lemma 4.5, and abbreviate R:=D(u)\mathcal{R}\mathrel{\mathop{:}}=\mathcal{D}(u). Hence, by assumption on uu, we have Rδ\lvert\mathcal{R}\rvert\leqslant\delta. We introduce u1u1Ru_{1}\equiv u_{1}^{\mathcal{R}} and u2u2Ru_{2}\equiv u_{2}^{\mathcal{R}} by setting u1:=uu_{1}\mathrel{\mathop{:}}=u and defining u2u_{2} as the other solution of the quadratic equation D(u)=R\mathcal{D}(u)=\mathcal{R}. Note that each uiu_{i} is continuous. Explicitly, for R1/2\lvert\mathcal{R}\rvert\leqslant 1/2 we get

for some constant C02C_{0}\geqslant 2. What remains is to show that (4.21) holds for i=1i=1. We shall prove this using a continuity argument.

holds or not. If (4.24) does not hold, then we have κ+η4C0C1δ\kappa+\eta\leqslant 4C_{0}C_{1}\delta, so that (4.21), Rδ\lvert\mathcal{R}\rvert\leqslant\delta, and the upper bound of (4.22) imply

What remains is the case where (4.24) holds. We use a continuity argument along the set L(z)L(z), which we parametrize as L(z)={z0,,zL}L(z)=\{z_{0},\dots,z_{L}\}, where Imz0=1\operatorname{Im}z_{0}=1, zL=zz_{L}=z, and Imzl+1<Imzl\operatorname{Im}z_{l+1}<\operatorname{Im}z_{l}. Note that zl+1zlN5\lvert z_{l+1}-z_{l}\rvert\leqslant N^{-5}. By assumption, Rδ\lvert\mathcal{R}\rvert\leqslant\delta at each zlL(z)z_{l}\in L(z), so that (4.21) and (4.22) yield

at each zlL(z)z_{l}\in L(z). (Here the quantities κκ(zl)\kappa\equiv\kappa(z_{l}), ηη(zl)\eta\equiv\eta(z_{l}), and δδ(zl)\delta\equiv\delta(z_{l}) are understood as functions of the spectral parameters zlz_{l}.) Moreover, since (4.24) holds at z=zLz=z_{L}, by the monotonicity assumption on δ\delta we find that (4.24) holds for all zlL(z)z_{l}\in L(z). We now prove that

for all l=1,,Ll=1,\dots,L by induction on ll. For l=0l=0 the bound (4.26) is simply (4.23) proved above. Suppose therefore that (4.26) holds for some ll. Since u1u_{1} and mϕm_{\phi} are Lipschitz continuous with Lipschitz constant NN, we get

where in the second step we used the induction assumption, in the third step the Lipschitz continuity of δ\delta and the bound ηN1\eta\geqslant N^{-1}, and in the last step the bounds δN2\delta\geqslant N^{-2} and κ+η+δC\kappa+\eta+\delta\leqslant C. Next, recalling (4.24), it is easy to deduce (4.26) with ll replaced by l+1l+1, using the bounds (4.25) and (4.27). This concludes the proof. ∎

We may now combine the probabilistic estimates from Lemma 4.4 with the stability of D1\mathcal{D}^{-1} from Lemma 4.5 to get the following result for η1\eta\geqslant 1, which will be used as the starting estimate in the bootstrapping in η\eta.

Next, we plug the estimates from Lemma 4.4 into (4.11) in order to obtain estimates on mRm_{R}. The summation in mR=1NμRμμm_{R}=\frac{1}{N}\sum_{\mu}R_{\mu\mu} will give rise to an error term of the form

For the proof of Proposition 4.2, it will be enough to estimate [Z]maxμZμ\lvert[Z]\rvert\leqslant\max_{\mu}\lvert Z_{\mu}\rvert, but for the eventual proof of Theorem 4.1, we shall need to exploit cancellation in the averaging in [Z][Z]. Bearing this in mind, we state our estimates in terms of [Z][Z] to avoid repeating the following argument in Section 4.2.

where we absorbed the error term N1N^{-1} on the right-hand side of (4.11) into ΨΘ2\Psi_{\Theta}^{2} using (3.6). Thus, using (4.13) we get

Hence (4.30) follows. Next, expanding Rμμ=mR+(RμμmR)R_{\mu\mu}=m_{R}+(R_{\mu\mu}-m_{R}) yields

After taking the average []=1Nμ[\cdot]=\frac{1}{N}\sum_{\mu}\cdot\, the second term on the right-hand side vanishes. Taking the average of (4.32) therefore yields, using (4.30) and (4.3),

Our goal is to prove that with high probability there is a gap in the range of vv, i.e.

The basic idea behind the proof of (4.34) is to use the deterministic result from Lemma 4.5 to propagate smallness of the random variable Λ(z)\Lambda(z) from large values of η\eta to smaller values of η\eta. Since we are dealing with random variables, one has to keep track of probabilities of exceptional events. To that end, we only work on a discrete set of values of η\eta, which allows us to control the exceptional probabilities by a simple union bound. We remark that the first instance of such a stochastic continuity argument combined with stability of a self-consistent equation was given in ESY2 in the context of Wigner matrices. Over the years it has been improved through several papers in the context of Wigner matrices EKYY4 ; EYY1 ; EYY3 as well as in the context of sample covariance matrices ESYY ; PY .

for large enough NN0(ε,D)N\geqslant N_{0}(\varepsilon,D) (independent of zz and ww). Using (4.19) and a union bound, we therefore get

Next, we use Lemma 4.5 with u=mRu=m_{R} and δ=Nε/2(Nη)1/2\delta=N^{\varepsilon/2}(N\eta)^{-1/2} to get

(Here we used the trivial observation that the conclusion of Lemma 4.5 is valid not only at zz but in the whole set L(z)L(z).) Using (4.13) and (4.30) we therefore get (4.34).

Since ε\varepsilon can be made arbitrarily small and DD arbitrarily large, Proposition 4.2 follows from (4.38).

2. Fluctuation averaging and conclusion of the proof of Theorem 4.1

In order to improve the negative power of (Nη)(N\eta) in Proposition 4.2, and hence prove the optimal bound in Theorem 4.1, we shall use the following result iteratively. Recall the definition of Θ\Theta from (4.6) and definition of [Z][Z] from (4.29).

Let τ(0,1)\tau\in(0,1) and suppose that we have

This result was given in a slightly different context in Theorem 4.7 in EKYY4 . However, it is a triviality that the proof of Theorem 4.7 in EKYY4 carries over word for word, provided one replaces Gij(T)G_{ij}^{(T)} there with Rμν[T]R_{\mu\nu}^{[T]}; see Remark B.3 in EKYY4 . The proof relies only on the identity (3.10), which is the analogue of Equation (4.6) in EKYY4 . ∎

The conclusion of Lemma 4.9 remains true under somewhat more general hypotheses, whereby Λ\Lambda is not required to be small. Indeed, (4.41) holds provided that Φo\Phi_{o} is as in Lemma 4.9 and that

The proof is the same as that of Theorem 4.7 in EKYY4 .

Using (3.6), it is easy to check that these definitions of Φo\Phi_{o} and Φ\Phi satisfy the assumptions of Lemma 4.9. Hence the claim follows from Lemma 4.9 and (4.42). ∎

where in the second step we used (3.6). Summarizing, we have proved the self-improving estimate

From Proposition 4.2 we know that Θ(Nη)1/4\Theta\prec(N\eta)^{-1/4}. Thus, for any ε>0\varepsilon>0, we iterate (4.43) an order CεC_{\varepsilon} times to get Θ(Nη)1+ε\Theta\prec(N\eta)^{-1+\varepsilon}. This concludes the proof of the first bound of (3.18). The second bound of (3.18) follows from the first one and the identity (3.13).

Using Lemma 3.10, we rewrite the sum in the second term according to N^{-1}\sum_{\mu}R_{\mu\mu}^{(i)}=m_{R}+O\bigl{(}{(N\eta)^{-1}}\bigr{)}. Moreover, the third term is estimated exactly as ZμZ_{\mu} in Lemma 4.4, using Lemma 3.1. Putting everything together yields

In particular, Giiϕ1/2\lvert G_{ii}\rvert\prec\phi^{-1/2}. The same argument applied to the matrix X(j)X^{(j)} instead of XX yields Gii(j)ϕ1/2\lvert G_{ii}^{(j)}\rvert\prec\phi^{-1/2}. Thus we get from (3.9) that for iji\neq j we have

where the last step follows using (3.3), exactly as in the proof of Lemma 4.4, and (4.1). This concludes the proof of (4.2), and hence of Theorem 4.1.

3. Proof of Theorem 2.10

The proof of Theorem 2.10 is similar to that of Theorem 2.2 in EYY3 and Theorem 3.3 in PY . We therefore only sketch the argument. First we observe that, since the nontrivial eigenvalues λ1,,λK\lambda_{1},\dots,\lambda_{K} of XXX^{*}X and XXXX^{*} coincide and

for all γ>0\gamma>0, it suffices to prove Theorem 2.10 for ϕ1\phi\geqslant 1, i.e. K=NK=N.

The proof relies on the following key estimates.

Finally, if ϕ1+c\phi\geqslant 1+c for some constant c>0c>0, then

Starting from Lemma 4.11, the proof of Theorem 2.10 is elementary. (The details are given e.g. on the last page of Section 5 in EYY3 .)

The estimate (4.44) is a standard consequence of (3.18), using Helffer-Sjöstrand functional calculus; see e.g. (EYY3, , Section 5).

What remains is the proof of (4.45) and (4.46). Here the argument from (PY, , Section 8) applies with trivial modifications. The key inputs in our case are (4.44), Lemma 4.5, (4.40), and Lemma 4.9 combined with Remark 4.10. We omit further details. ∎

The isotropic law: proof of Theorem 3.11

In this section we complete the proof of Theorem 3.11. Since (3.18) was proved in Section 4, we only need to prove (3.16) and (3.17). For definiteness, we give the details of the proof of (3.16); the proof of (3.17) is very similar, and the required modifications are outlined at the end of Section 5.15 below.

It is convenient to introduce the rescaled quantities

The motivation behind this definition is that

2. Reduction to off-diagonal entries

By linearity and polarization, in order to prove (3.16) it suffices to prove that

The rest of this section is devoted to the proof of (5.7).

3. Sketch of the proof

The basic reason why (5.7) holds is that GabG_{ab} can be expanded, to leading order, as a sum of independent random variables using the identity (3.9). To simplify the presentation in this sketch, we set M=NM=N, so that ϕ=1\phi=1 and the rescalings from Section 5.1 indicated by a tilde are trivial. Hence we drop all tildes. From (3.9) we get

If we could replace the diagonal entries by the deterministic value mϕm_{\phi}, it would suffice to estimate the sum abμ,νv ⁣aXaμRμν(ab)Xνbvb\sum_{a\neq b}\sum_{\mu,\nu}\overline{v}\!\,_{a}X_{a\mu}R^{(ab)}_{\mu\nu}X_{\nu b}^{*}v_{b}. By the independence of the entries of XX we have, using (3.3),

The error made in the approximation GaamϕG_{aa}\approx m_{\phi} is of order Ψ\Psi by (5.3), so that the corresponding error term on the right-hand side of (5.8) may be bounded using (3.3) by

We conclude that the simple replacement of GaaG_{aa} with its deterministic approximation in (5.8) is not affordable. Not only the leading term but also every error term has to be expanded in the entries of XX. This expansion is most effectively controlled if performed within a high-moment estimate. Thus, for large and even pp we shall estimate

Before explaining the general strategy, we sketch a second moment calculation. First, we write

Using (3.7), we maximally expand all resolvent entries in the indices a,b,c,da,b,c,d. This means that we use (3.7) repeatedly until each term in the expansion is independent of all Latin indices that do not explicitly appear among its lower indices; here an entry is independent of an index if that index is an upper index of the entry. This generates a series of maximally expanded terms, whereby a resolvent entry is by definition maximally expanded if we cannot add to it upper indices from the set a,b,c,da,b,c,d by using the identity (3.7). In other words, Gij(T)G_{ij}^{(T)} is maximally expanded if and only if T={a,b,c,d}{i,j}T=\{a,b,c,d\}\setminus\{i,j\}.

To illustrate this procedure, we assume temporarily that a,b,c,da,b,c,d are all distinct, and write, using (3.7),

We get a similar expression for GcdG_{cd}^{*}. We plug both of these expansions into (5.10) and multiply out the product. The leading term is

We now expand both resolvent entries using (3.9), which gives

The goal is to use the expectation to get a pairing (or a more general partition) of the entries of XX. In order to do that, we shall require all terms that are not entries of XX to be independent of the randomness in the rows a,b,c,da,b,c,d of XX. While the entries of RR satisfy this condition, the entries of GG do not. We shall hence have to perform a further expansion on them using the identities (3.7) and (3.9). In fact, these two types of expansions will have to be performed in tandem, using a two-step recursive procedure. The main reason behind this is that even if all entries of GG are maximally expanded, each application of (3.9) produces a diagonal entry that is not maximally expanded; for such terms the expansion using (3.7) has to be repeated. For the purposes of this sketch, however, we omit the details of the further expansion of the entries of GG, and replace them with their deterministic leading order, mϕm_{\phi} (see (5.3)). This approximation gives

Since all entries of RR are independent of all entries of XX, we can compute the expectation with respect to the rows a,b,c,da,b,c,d. Note that the only possible pairing is a=da=d, μ=β\mu=\beta, b=cb=c, and ν=α\nu=\alpha. This results in the expression

This calculation, while giving the right order, was in fact an oversimplification, since the expansion (5.11) required that bcb\neq c and dad\neq a. The correct argument requires first a decomposition of the summation over a,b,c,da,b,c,d into a finite number of terms, indexed by the partitions of four elements, according to the coincidences among these four indices. Thus, we write

where a star over the summation indicates that all summation indices that are not explicitly equal to each other have to be distinct. The above calculation leading to (5.12) is valid for the first summation of (5.13), whose contribution (up to leading order) is zero, since the only possible pairing contradicts the condition that the indices a,b,c,da,b,c,d be all distinct. It is not too hard to see that, among the sums in (5.13), only the last one gives a nonzero contribution (up to leading order), and it is, going back to (5.10), equal to

Next, we consider a subleading term from the first summation in (5.13), which has three off-diagonal entries:

We proceed as before, expanding all off-diagonal entries of GG using (3.9). Up to leading order, we get

The expectation again renders this term zero if a,b,c,da,b,c,d are distinct.

Based on these preliminary heuristics, we outline the main steps in estimating a high moment of Z\mathcal{Z}.

Partition the indices in (5.9) according to their coincidence structure: indices in the same block of the partition are required to coincide and indices in different blocks are required to be distinct. This leads to a reduced family, TT, of distinct indices.

Make all entries of GG maximally expanded by repeatedly applying the identity (3.7). Roughly, this entails adding upper indices from the family TT to each entry of GG using the identity (3.7). We stop the iteration if either (3.7) cannot be applied to any entry of GG or we have generated a sufficiently large number of off-diagonal entries of GG.

Apply (3.9) to each maximally expanded off-diagonal entry of GG. This yields factors of the form μ,νXaμRμν(T)Xνb\sum_{\mu,\nu}X_{a\mu}R^{(T)}_{\mu\nu}X_{\nu b}^{*} with a,bTa,b\in T and R(T)R^{(T)} is independent of all entries of XX by construction. In addition, this application of (3.9) produces new diagonal entries of GG that are not maximally expanded.

Repeat Steps 2 and 3 recursively in tandem until we only have a sum of terms whose factors consist of maximally expanded diagonal entries of GG, entries of R(T)R^{(T)}, and entries of XX from the rows indexed by TT.

Apply (3.8) to each maximally expanded diagonal entry of GG. We end up with factors consisting only of entries of R(T)R^{(T)} and entries of XX from the rows indexed by TT.

Using the fact that all entries of RR are independent of all entries of XX, take a partial expectation over the rows of XX indexed by the set TT; this only involves the entries of XX. Only those terms give a nonzero contribution whose Greek indices have substantial coincidences.

Steps 1–6 require a careful expansion algorithm and a meticulous bookkeeping of the resulting terms. We shall develop a graphical language that encodes the resulting monomials. Expansion steps will be recorded via operations on graphs such as merging certain vertices or replacing some vertex or edge by a small subgraph. Several ingredients of the graphical representation and the concept of graph operations are inspired by tools from EKY2 developed for random band matrices. Once the appropriate graphical language is in place and the expansion algorithm has been constructed, the observations in Steps 7 and 8 will yield the desired estimate by a power counting coupled with a parity argument.

4. The p𝑝p-th moment of 𝒵𝒵\mathcal{Z} and introduction of graphs

We shall estimate Z\mathcal{Z} with high probability by estimating its pp-th moment for a large but fixed pp. It is convenient to rename the summation variables in the definition of Z\mathcal{Z} as (a,b)=(b1,b2)(a,b)=(b_{1},b_{2}). Let pp be an even integer and write

where we recall the definition of Z\mathcal{Z} from (5.6).

We shall perform the summation by first fixing the partition PPpP\in\mathfrak{P}_{p} and by deriving an upper bound that is uniform in PP; at the very end we shall sum trivially over PPpP\in\mathfrak{P}_{p}.

In order to handle expressions of the form (5.15), as well as more complicated ones required in later stages of the proof, we shall need to develop a graphical notation. The basic idea is to associate matrix indices with vertices and resolvent entries with edges. The following definition introduces graphs suitable for our purposes.

By a graph we mean a finite, directed, edge-coloured, multigraph

Here VV is a finite set of vertices, EE a finite set of directed edges, and ξ\xi is a “colouring of EE”, i.e. a mapping from EE to some finite set of colours. The graph Γ\Gamma may have multiple edges and loops. More precisely, EE is some finite set with maps α,β:EV\alpha,\beta\mathrel{\mathop{:}}E\to V; here α(e)\alpha(e) and β(e)\beta(e) represent the source and target vertices of the edge eEe\in E. We denote by degΓ(i)\deg_{\Gamma}(i) the degree of the vertex iV(Γ)i\in V(\Gamma).

We may now express the right-hand side of (5.15) using graphs. Fix the partition PPpP\in\mathfrak{P}_{p}. We associate a graph ΔΔ(P)\Delta\equiv\Delta(P) with PP as follows. The vertex set V(Δ)V(\Delta) is given by the blocks of PP, i.e. V(Δ)=PV(\Delta)=P. The set of colours, i.e. the range of ξ\xi, is {G,G}\{G,G^{*}\} (we emphasize that these two colours are simply formal symbols whose name is supposed to evoke their meaning). The set of edges E(Δ)E(\Delta) is parametrized as follows by the resolvent entries on the right-hand side of (5.15). Each resolvent entry Gbk1bk2#G_{b_{k1}b_{k2}}^{\#} gives rise to an edge eE(Δ)e\in E(\Delta) with colour ξ(e)=G\xi(e)=G if #\# is nothing and ξ(e)=G\xi(e)=G^{*} if #\# is *. The source vertex α(e)\alpha(e) of this edge is the unique block of PP satisfying (k,1)α(e)(k,1)\in\alpha(e), and its target vertex β(e)\beta(e) the unique block of PP satisfying (k,2)β(e)(k,2)\in\beta(e). Figure 5.1 illustrates the construction of Δ(P)\Delta(P), where the two different types of line correspond to the two colours G,GG,G^{*}. The graph Δ\Delta has no loops.

We record the following basic properties of Δ\Delta.

Δ\Delta has no loops, i.e. α(e)β(e)\alpha(e)\neq\beta(e) for all eE(Δ)e\in E(\Delta).

1V(Δ)2p1\leqslant\lvert V(\Delta)\rvert\leqslant 2p.

5. Generalized colours and encoding

We shall need to enlarge the set of colours of edges, so as to be able to encode entries of not only G~\widetilde{G} and G~\widetilde{G}^{*}, but also entries of RR, RR^{*}, XX, and XX^{*}; in addition, we shall need to encode diagonal entries of G~\widetilde{G} and G~\widetilde{G}^{*} that are in the denominator, as in the formulas (3.7), as well as to keep track of upper indices. We need all of these factors, since our expansion relies on a repeated application of the identities (3.7), (3.8), and (3.9).

In order to define the graphs precisely, we consider graphs Γ\Gamma satisfying Definition 5.3 whose vertex set V(Γ)=Vb(Γ)Vw(Γ)V(\Gamma)=V_{b}(\Gamma)\cup V_{w}(\Gamma) is partitioned into black vertices Vb(Γ)V_{b}(\Gamma) and white vertices Vw(Γ)V_{w}(\Gamma). Informally, black vertices are incident to edges encoding entries of G~\widetilde{G} and G~\widetilde{G}^{*}, and white vertices to edges encoding entries of R~\widetilde{R} and R~\widetilde{R}^{*}. In other words, black vertices are associated with Latin summation indices in the population space {1,,M}\{1,\dots,M\}; white vertices are associated with Greek summation indices in the sample space {1,,N}\{1,\dots,N\}. See Remark 3.4. We shall only consider graphs Γ\Gamma satisfying

an assumption we make throughout the following. This means that only new Greek summation indices but no new Latin indices are generated, corresponding to the repeated applications of (3.8) and (3.9). In particular, the vertex colouring for our graph is very simple: the vertices of Δ\Delta are black and all other vertices are white.

We now list some properties of all graphs we shall consider. To that end, we call eE(Γ)e\in E(\Gamma) a GG-edge if ξ1(e){G,G}\xi_{1}(e)\in\{G,G^{*}\}, an RR-edge if ξ1(e){R,R}\xi_{1}(e)\in\{R,R^{*}\}, and an XX-edge if ξ1(e){X,X}\xi_{1}(e)\in\{X,X^{*}\}.

If ee is a GG-edge then α(e),β(e)Vb(Γ)\alpha(e),\beta(e)\in V_{b}(\Gamma).

If ee is an RR-edge then α(e),β(e)Vw(Γ)\alpha(e),\beta(e)\in V_{w}(\Gamma).

If ξ1(e)=X\xi_{1}(e)=X then α(e)Vb(Γ)\alpha(e)\in V_{b}(\Gamma) and β(e)Vw(Γ)\beta(e)\in V_{w}(\Gamma).

If ξ1(e)=X\xi_{1}(e)=X^{*} then α(e)Vw(Γ)\alpha(e)\in V_{w}(\Gamma) and β(e)Vb(Γ)\beta(e)\in V_{b}(\Gamma).

If ξ2(e)=\xi_{2}(e)=- then ξ1(e){G,G}\xi_{1}(e)\in\{G,G^{*}\} and α(e)=β(e)\alpha(e)=\beta(e).

If ξ3(e)\xi_{3}(e)\neq\emptyset then ξ1(e){G,G}\xi_{1}(e)\in\{G,G^{*}\} and ξ3(e)Vb(Γ){α(e),β(e)}\xi_{3}(e)\subset V_{b}(\Gamma)\setminus\{\alpha(e),\beta(e)\}.

Properties (i)–(iv) are straightforward compatibility conditions which are obvious in light of the type of matrix entry that the edge ee encodes. Property (v) states that only diagonal entries of G~\widetilde{G} and G~\widetilde{G}^{*} may be in the denominator. Property (vi) states that only entries of GG or G~\widetilde{G} may have a (nontrivial) upper index and the lower indices of an entry of G~\widetilde{G} or G~\widetilde{G}^{*} may not coincide with its upper indices (by definition of minors).

For eE(Γ)e\in E(\Gamma) with ξ1(e){G,G}\xi_{1}(e)\in\{G,G^{*}\} we define the resolvent entry encoded by ee in Γ\Gamma as

When drawing graphs, we represent a black vertex as a black dot and a white vertex as a white dot. An edge with ξ1=G\xi_{1}=G is represented as a solid directed line joining two black dots, and an edge with ξ1=G\xi_{1}=G^{*} as a dashed directed line joining two black dots. If ξ2=\xi_{2}=- we indicate this by decorating the edge with a white diamond (not to be confused with a white dot). Notice that such edges are always loops, according to property (v). Sometimes we also indicate the component ξ3(e)\xi_{3}(e) in our graphs, simply by writing it next to the edge ee. See Figure 5.2 for our graphical conventions when depicting edges with ξ1{G,G}\xi_{1}\in\{G,G^{*}\}.

For the other edges, eE(Γ)e\in E(\Gamma) with ξ1(e){R,R,X,X}\xi_{1}(e)\in\{R,R^{*},X,X^{*}\}, we set

When drawing graphs, we represent an edge with ξ1=R\xi_{1}=R as a solid directed line joining two white vertices, an edge with ξ1=R\xi_{1}=R^{*} as a dashed directed line joining two white vertices, an edge with ξ1=X\xi_{1}=X as a dotted directed line from a black to a white vertex, and an edge with ξ1=X\xi_{1}=X^{*} as a dotted directed line from a white to a black vertex. Note that we use the same line style to draw XX- and XX^{*}-edges, since the orientation of the edge together with the vertex colouring distinguishes them uniquely. See Figure 5.3 for an illustration of these conventions, and Figure 5.6 for an illustration of (5.28).

6. R𝑅R-groups

We define an RR-group to be an induced subgraph of Γ\Gamma consisting of three edges, e1e_{1}, e2e_{2}, e3e_{3}, such that e1e_{1} and e3e_{3} are XX-edges and e2e_{2} is an RR-edge, and they form a chain in the sense that β(e1)=α(e2)\beta(e_{1})=\alpha(e_{2}), β(e2)=α(e3)\beta(e_{2})=\alpha(e_{3}), and both of these vertices have degree two. We call e2e_{2} the centre of the RR-group and define A(e2):=α(e1)A(e_{2})\mathrel{\mathop{:}}=\alpha(e_{1}) and B(e2):=β(e3)B(e_{2})\mathrel{\mathop{:}}=\beta(e_{3}). If A(e2)=B(e2)A(e_{2})=B(e_{2}) we call the RR-group diagonal; otherwise we call it off-diagonal. See Figure 5.4 for an illustration.

We require that our graphs Γ\Gamma satisfy the following property.

Each XX-edge and RR-edge of Γ\Gamma belongs to some RR-group of Γ\Gamma. In particular, all white vertices have degree two, and an RR-group is uniquely determined by its centre.

The RR-groups constitute graphical representations of the monomials on the right-hand sides of (3.8) and (3.9). It is important to stress that there is no restriction on possible coincidences among the white-vertex indices (ai)iVw(a_{i})_{i\in V_{w}}; this means that even if two Greek summation indices arising from two different applications (3.8) or (3.9) coincide, they will nevertheless be encoded by distinct white vertices. This allows us to keep the graphical structure involving RR and XX edges very simple.

Note that the initial graph Δ=Δ(P)\Delta=\Delta(P) trivially satisfies the properties (i)–(vii).

7. Maximally expanded edges and sketch of the expansion

The following definition introduces a notion that underlies our entire expansion. Note that it only applies to GG-edges.

The expansion relies of three main operations:

make one of the GG-entries maximally expanded by adding upper indices using the identity (3.7);

expand all off-diagonal maximally expanded GG-entries of in terms of XX using the identity (3.9);

expand all diagonal maximally expanded GG-entries in terms of XX using (3.8).

We shall implement each ingredient by a graph surgery procedure. Operation (a) is the subject of Section 5.8; it creates two new graphs, τ0(Γ)\tau_{0}(\Gamma) and τ1(Γ)\tau_{1}(\Gamma), from an initial graph Γ\Gamma. Operation (b) is the subject of Section 5.9; it creates one new graph, ρ(Γ)\rho(\Gamma), from an initial graph Γ\Gamma. As it turns out, Operations (a) and (b) have to be performed in tandem using a coupled recursion, described by a tree T\mathcal{T}, which is the subject of Section 5.10. Once this recursion has terminated, Operation (c) may be performed (see Section 5.11).

In order to avoid trivial ambiguities, we choose and fix an arbitrary ordering of the vertices V(Δ)V(\Delta) and of the family of resolvent entries (G~ab(T):a,bT)({\widetilde{G}_{ab}^{(T)}\mathrel{\mathop{:}}a,b\notin T}). Hence we may speak of the first vertex of V(Δ)V(\Delta) and the first factor of a monomial in the entries G~ab(T)\widetilde{G}_{ab}^{(T)}.

We now describe Operation (a) of the expansion. It relies on the identities

Moreover, it follows immediately that the maps τ0\tau_{0} and τ1\tau_{1} do not change the vertices, so that we have

The procedure Γ(τ0(Γ),τ1(Γ))\Gamma\mapsto(\tau_{0}(\Gamma),\tau_{1}(\Gamma)) may also be explicitly described on the level graphs alone, but we shall neither need nor do this. Instead, we give a graphical depiction of this process in Figure 5.5.

If Γ\Gamma satisfies the properties (i)–(vii) from Sections 5.5 and 5.6 then so do τ0(Γ)\tau_{0}(\Gamma) and τ1(Γ)\tau_{1}(\Gamma).

9. Operation (b): construction of the graph ρ​(Γ)𝜌Γ\rho(\Gamma)

In this section we give the second operation, (b), outlined in Section 5.7. The idea is that Operation (a) from Section 5.5 generates off-diagonal GG-entries that are maximally expanded. They in turn will have to be expanded further using (3.9), so as to extract their explicit XX-dependence. Roughly, the map ρ\rho replaces each maximally expanded off-diagonal GG-edge by an off-diagonal RR-group.

where z~\widetilde{z}^{*} denotes the complex conjugate of z~\widetilde{z}. Note that the first diagonal term on the right-hand side is not maximally expanded (while the second one is).

Like τ0\tau_{0} and τ1\tau_{1}, the map ρ\rho may be explicitly defined on the level of graphs, which we shall however not do in order to avoid unnecessary and heavy notation. See Figure 5.6 for an illustration of ρ\rho.

The following result is an immediate corollary of the definition of ρ\rho.

If Γ\Gamma satisfies the properties (i)–(vii) from Sections 5.5 and 5.6 then so does ρ(Γ)\rho(\Gamma).

10. Constructing the tree 𝒯𝒯\mathcal{T}: recursion using (a) and (b)

The algorithm generates a family of graphs Θσ\Theta_{\sigma} which are indexed by finite binary strings σ\sigma, or, equivalently, by vertices of a rooted binary tree T=(V(T),E(T))\mathcal{T}=(V(\mathcal{T}),E(\mathcal{T})). We start the algorithm with Θ:=Δ\Theta_{\emptyset}\mathrel{\mathop{:}}=\Delta, corresponding to the empty string or the root of the tree. The tree is constructed recursively according to

and so on, until a stopping rule is satisfied (see Definition 5.7 below). See Figure 5.7 for an illustration of the resulting tree.

The construction of the tree T\mathcal{T} relies on the following stopping rule.

The tree T\mathcal{T}, along with the graphs (Θσ)σV(T)(\Theta_{\sigma})_{\sigma\in V(\mathcal{T})}, is constructed recursively from the trivial tree, consistsing of the single vertex \emptyset with Θ=Δ\Theta_{\emptyset}=\Delta, as follows. Let σ\sigma be a leaf of the tree such that Θσ\Theta_{\sigma} does not satisfy the stopping rule. We add the children of σ\sigma, i.e. 0σ0\sigma and 1σ1\sigma, to the tree, and set

We continue this recursion on each leaf until all leaves satisfy the stopping rule from Definition 5.7. By Lemma 5.9 below, the resulting tree T\mathcal{T} is finite, i.e. the recursion terminates after a finite number of steps.

The graphs Θσ\Theta_{\sigma} have the following two properties. First,

In particular, the set of black vertices remains unchanged throughout the recursion: Vb(Θσ)=Vb(Δ)V_{b}(\Theta_{\sigma})=V_{b}(\Delta). Second,

Moreover, each Θσ\Theta_{\sigma} for σV(T)\sigma\in V(\mathcal{T}) satisfies the properties (i)–(vii) from Sections 5.5 and 5.6.

The identity (5.32) follows immediately from (5.27) and (5.30). Similarly, (5.33) follows from (5.26) and (5.29). The final statement follows immediately from Lemmas 5.5 and 5.6. ∎

The interpretation of (5.33) is that the value of any graph Θσ\Theta_{\sigma} is equal to the sum of the values of its two children, Θ0σ\Theta_{0\sigma} and Θ1σ\Theta_{1\sigma}.

Next, we express Y(Δ)Y(\Delta) from (5.17) in terms of the graphs we just introduced. By Lemma 5.8 and the fact that Vb(Δ)=V(Δ)V_{b}(\Delta)=V(\Delta), we have for all σV(T)\sigma\in V(\mathcal{T}) that

Let L(T)V(T)L(\mathcal{T})\subset V(\mathcal{T}) denote the leaves of T\mathcal{T}. The identity (5.33) states that if σV(T)\sigma\in V(\mathcal{T}) is not a leaf of T\mathcal{T}, we may replace the value of Θσ\Theta_{\sigma} by the sum of the values of its two children. Starting from the root \emptyset and the graph Θ=Δ\Theta_{\emptyset}=\Delta, we may propagate this identity recursively from the root down to the leaves. We conclude that

Recalling the definition (5.17) of Y(Δ)Y(\Delta), we get the following result.

The quantity Y(Δ)Y(\Delta) defined in (5.17) may be written in terms of the tree T\mathcal{T} as

By Definition 5.7, if σL1(T)\sigma\in L_{1}(\mathcal{T}) is a nontrivial leaf then all GG-edges of Θσ\Theta_{\sigma} are diagonal and maximally expanded. The estimate of the nontrivial leaves will be performed in Sections 5.12–5.14.

11. The trivial leaves

In this section we estimate the contribution of Θσ\Theta_{\sigma} for a trivial leaf σL0(T)\sigma\in L_{0}(\mathcal{T}). Thus, fix σL0(T)\sigma\in L_{0}(\mathcal{T}). From (5.28) and Lemma 5.2 we get for aba\neq b

We therefore conclude that each off-diagonal RR-group of Γ\Gamma yields a contribution of size O(ϕ1/2Ψ)O_{\prec}(\phi^{-1/2}\Psi) after summation over the indices associated with the vertices incident to its centre. Moreover, by definition of T\mathcal{T}, each RR-group of Θσ\Theta_{\sigma} is off-diagonal. In addition, each off-diagonal GG-edge yields a contribution of size ϕ1/2Ψ\phi^{-1/2}\Psi by Lemma 5.2. Thus we get, summing out all indices associated with white vertices (i.e. inner vertices of RR-groups),

Hence the contribution of Θσ\Theta_{\sigma} to the right-hand side of (5.36) may be bounded by

For any pp there exists a constant CpC_{p} such that for any graph Γ\Gamma and any eE(Γ)e\in E(\Gamma) we have

The cases ξ1(e){G,G,R,R}\xi_{1}(e)\in\{G,G^{*},R,R^{*}\} and ξ2(e)=+\xi_{2}(e)=+ are dealt with the deterministic estimates

which follows from Gij(T),Rij(T)η1N\lvert G_{ij}^{(T)}\rvert,\lvert R_{ij}^{(T)}\rvert\leqslant\eta^{-1}\leqslant N. The cases ξ1(e){X,X}\xi_{1}(e)\in\{X,X^{*}\} follow immediately from (2.3). Finally, the cases ξ1(e){G,G}\xi_{1}(e)\in\{G,G^{*}\} and ξ2(e)=\xi_{2}(e)=- follow easily from (3.8). ∎

Using Lemma 5.9, we therefore conclude that the contribution of all trivial leaves to the right-hand side of (5.36) is bounded by

Here we used the bound ΨCNω/2\Psi\leqslant CN^{-\omega/2}, which follows from the definitions (3.15), (2.11), and (3.5).

12. The nontrivial leaves I: Operation (c)

From now on we focus on the nontrivial leaves, σL1(T)\sigma\in L_{1}(\mathcal{T}). Our goal is to prove the following estimate, which is analogous to (5.38). Its proof will be the content of this and the two following subsections, and will be completed at the end of Section 5.14.

In order to handle entries in the numerator, we rewrite this identity in the form

This is our main expansion for the diagonal GG-entries in the numerator. Both formulas (5.40) and (5.42) have trivial analogues for the Hermitian conjugate G^aa\widehat{G}_{aa}^{*}.

Exactly as in Section 5.11, we may brutally estimate the contribution of the rest term on the right-hand side of (5.43) by

Hence, in order to complete the proof of Proposition 5.12, it suffices to prove that for all σL1(T)\sigma\in L_{1}(\mathcal{T}) and all ΓG(Θσ)\Gamma\in\mathfrak{G}(\Theta_{\sigma}) we have

As before, the map ΘσG(Θσ)\Theta_{\sigma}\mapsto\mathfrak{G}(\Theta_{\sigma}) may be explicitly given on the level of graphs, but we refrain from doing so. Instead, we illustrate this process for some simple cases in Figure 5.8.

13. The nontrivial leaves II: taking the expectation

Let us now consider a nontrivial leaf σL1(T)\sigma\in L_{1}(\mathcal{T}). By definition of L1(T)L_{1}(\mathcal{T}), all GG-edges of Θσ\Theta_{\sigma} are diagonal and maximally expanded. Therefore, any ΓG(Θσ)\Gamma\in\mathfrak{G}(\Theta_{\sigma}) does not contain any GG-edges. This was the goal of the expansion generated by Operations (a)–(c). Hence, each ΓG(Θσ)\Gamma\in\mathfrak{G}(\Theta_{\sigma}) consists solely of RR-groups.

We recall from Property (vii) in Section 5.6 that each white vertex jVw(Γ)j\in V_{w}(\Gamma) is adjacent in Γ\Gamma to a unique black vertex π(j)πΓ(j)\pi(j)\equiv\pi_{\Gamma}(j). For each iVb(Γ)i\in V_{b}(\Gamma) we introduce a partition ζi\zeta_{i} of the subset of white vertices π1({i})\pi^{-1}(\{i\}), and constrain the values of the indices (aj:π(j)=i)(a_{j}\mathrel{\mathop{:}}\pi(j)=i) to be compatible with ζi\zeta_{i}. On the level of graphs, such a partition amounts to merging vertices in π1({i})\pi^{-1}(\{i\}). Abbreviate ζ=(ζi)iVb(Γ)\zeta=(\zeta_{i})_{i\in V_{b}(\Gamma)}, and denote by Γζ\Gamma_{\zeta} the graph obtained from Γ\Gamma by merging, for each iVb(Γ)i\in V_{b}(\Gamma), the vertices adjacent to ii according to ζi\zeta_{i}. Note that, like Γ\Gamma, each Γζ\Gamma_{\zeta} satisfies the properties (i)–(vi) from Section 5.5, but, unlike Γ\Gamma, in general Γζ\Gamma_{\zeta} does not satisfy the property (vii) from Section 5.6. See Figure 5.9 for an illustration of the mapping ΓΓζ\Gamma\mapsto\Gamma_{\zeta}.

which constrains the summation indices associated with different white vertices adjacent to the same black vertex to have different values. By definition of Γζ\Gamma_{\zeta}, we therefore have

and abbreviated ER()E_{R}(\cdot) for the set of RR-edges and Ei()E_{i}(\cdot) for the set of XX-edges incident to ii. Here we used the independence described above.

Note that by definition of Γζ\Gamma_{\zeta} we have Vb(Γζ)=Vb(Γ)V_{b}(\Gamma_{\zeta})=V_{b}(\Gamma) and degΓ(i)=degΓζ(i)\deg_{\Gamma}(i)=\deg_{\Gamma_{\zeta}}(i) for all iVb(Γ)i\in V_{b}(\Gamma). For the following we therefore drop the argument of VbV_{b}. Define the subset

For iVb(Γ)i\in V_{b}(\Gamma) let nζ(i)n_{\zeta}(i) denote the number of vertices of Γζ\Gamma_{\zeta} adjacent to ii (these are all white since there are no GG-edges in Γζ\Gamma_{\zeta}, which are the only edges that join two black vertices).

Recalling (5.46), we assume without loss of generality that, for each iVbi\in V_{b}, eack block of ζi\zeta_{i} has size at least two; in particular, we assume that for each iVbi\in V_{b} we have degΓ(i)2\deg_{\Gamma}(i)\geqslant 2. From (5.46) we get

By definition, τ0\tau_{0} and ρ\rho leave deg(i)\deg(i) invariant, and τ1\tau_{1} increases deg(i)\deg(i) by or 44. In particular, they all leave the parity of deg(i)\deg(i) invariant for iVbi\in V_{b}. We conclude that degΓ(i)\deg_{\Gamma}(i) is odd for each iVbi\in V_{b}^{*}. Since each block of ζi\zeta_{i} has size at least two, we find that

The proof is then completed by the following claim.

If degΓ(i)2\deg_{\Gamma}(i)\geqslant 2 for all iVbi\in V_{b}^{*} then

What remains is to prove (5.47). Since Δ\Delta has pp edges, we find

As observed above, τ0\tau_{0} and ρ\rho leave deg(i)\deg(i) invariant, and τ1\tau_{1} increases deg(i)\deg(i) by or 44. Let iVbi\in V_{b}^{*}. Since by assumption degΓ(i)2\deg_{\Gamma}(i)\geqslant 2, we find that in fact degΓ(i)5\deg_{\Gamma}(i)\geqslant 5. This yields

Recall the quantity dd from (5.31), defined as the number of off-diagonal GG-edges plus the number of off-diagonal RR-groups. By definition of Δ\Delta, d(Δ)=pd(\Delta)=p. Moreover, τ0\tau_{0}, τ1\tau_{1}, and ρ\rho do not decrease dd. Since by construction Γ\Gamma has no GG-entries, we conclude that Γ\Gamma has at least pp off-diagonal RR-groups. We may therefore choose a set Eo(Γ)ER(Γ)E_{o}(\Gamma)\subset E_{R}(\Gamma) of size at least pp, such that each eEo(Γ)e\in E_{o}(\Gamma) is the centre of an off-diagonal RR-group (see Section 5.6). The set EoE_{o} is naturally mapped into ER(Γζ)E_{R}(\Gamma_{\zeta}), and is denoted by Eo(Γζ)E_{o}(\Gamma_{\zeta}). We denote by α(e)\alpha(e) and β(e)\beta(e) the end points of ee in Γζ\Gamma_{\zeta}. By (5.4), we have

As in Section 5.11, it is easy to take the expectation using Lemma 3.2 to get

where the last step follows from the fact that, by definition of Δ\Delta, degΔ(i)1\deg_{\Delta}(i)\geqslant 1 for all iVbi\in V_{b}, as well as the estimate

Summing over σL1(T)\sigma\in L_{1}(\mathcal{T}) concludes the proof of Proposition 5.12.

15. Conclusion of the proof of Theorem 3.11

Combining (5.38) and Proposition 5.12, and recalling (5.36) and (5.16), yieds

Now (5.7), and hence (3.16), follows by a simple application of Chebyshev’s inequality. Let ε>0\varepsilon>0 and DD be given. Then

for pε1D+2p\geqslant\varepsilon^{-1}D+2. This, together with Remark 2.2 which allows us to interchange MM and NN in Definition 2.1, concludes the proof of (3.16).

Finally, we outline the proof of (3.17), which is very similar to that of (3.16). The expansion of Sections 5.4–5.12 may be taken over by swapping the roles of RR and G~\widetilde{G}. In other words, we use Lemma 3.8 instead of Lemma 3.6. The arguments from Sections 5.13 and 5.14 carry over with straightforward adjustments in the power counting of Section 5.14. We leave the details to the interested reader. This concludes the proof of Theorem 3.11.

Proof of Theorems 3.12 and 3.13

We only prove (3.19); the proof of (3.20) is the same, using (3.17) instead of (3.16). Moreover, to simplify notation, we assume that Eγ++N2/3+ωE\geqslant\gamma_{+}+N^{-2/3+\omega}; the case EγN2/3+ωE\leqslant\gamma_{-}-N^{-2/3+\omega} is handled in exactly the same way.

Note first that if ηκ\eta\geqslant\kappa then it is easy to see that (3.19) follows from (3.16), (3.6), and the lower bound ηκN2/3\eta\geqslant\kappa\geqslant N^{-2/3}. For the following we therefore assume that ηκ\eta\leqslant\kappa. By Lemma 3.3, for ηκ\eta\leqslant\kappa we have

By polarization and linearity, it therefore suffices to prove that

Since (6.1) holds at z0z_{0} by (3.16), it is enough to prove the estimates

What remains is to prove (6.3). By Theorem 2.10 we have Eλ1+η0E\geqslant\lambda_{1}+\eta_{0} with high probability since η0N2/3+ω/4\eta_{0}\geqslant N^{-2/3+\omega/4}. Therefore, since ηη0Eλ1Eλα\eta\leqslant\eta_{0}\leqslant E-\lambda_{1}\leqslant E-\lambda_{\alpha} for all α1\alpha\geqslant 1, we get

as follows from (3.6) and the estimate z02ϕ\lvert z_{0}\rvert^{2}\asymp\phi.

Finally, we estimate the real part of the error in (6.3) using

with high probability, where in the last step we used that η0Eλ1\eta_{0}\leqslant E-\lambda_{1} with high probability. Combining (6.4) and (6.5) completes the proof of (6.3), and hence of Theorem 3.12. ∎

Proofs for generalized Wigner matrices

In this section we explain how to modify the arguments of Sections 3–6 to the case of generalized Wigner matrices, and hence how to complete the proof of the results from Section 2.2. Since we are dealing with generalized Wigner matrices, in this section we consistently use the notations from Section 2.2 instead of Section 2.1.

the distance from EE to the spectral edges ±2\pm 2.

For z2ω1\lvert z\rvert\leqslant 2\omega^{-1} we have

(All implicit constants depend on ω\omega.)

The proof is an elementary calculation; see Lemma 4.2 in EYY2 . ∎

The following definition is the analogue of Definitions 3.5 and 3.7. (Note that for generalized Wigner matrices we always simultaneously remove a row and the corresponding column.)

For T{1,,N}T\subset\{1,\dots,N\} we define H(T)H^{(T)} by

Moreover, for i,jTi,j\notin T we define the resolvent of the minor through

When T={a}T=\{a\}, we abbreviate ({a})(\{a\}) by (a)(a) in the above definitions; similarly, we write (ab)(ab) instead of ({a,b})(\{a,b\}).

We shall also need the following resolvent identities, proved in Lemma 4.2 of EYY1 and Lemma 6.10 of EKYY2 .

For i,jki,j\neq k and i,j,kTi,j,k\notin T the identity (3.7) holds. Moreover, for iji\neq j and i,jTi,j\notin T we have

Finally, for iTi\notin T we have Schur’s formula

From (7.3) we find for iji\neq j and i,jTi,j\notin T that

2. The isotropic law: proof of Theorem 2.12

As in (5.7), and using (2.28) instead of (4.2), it suffices to prove that

The quadratic term in the expansion of 1/Gii1/G_{ii} and GijG_{ij} contains an entry of GG and not of RR. (The matrix RR is not even defined for generalized Wigner matrices.)

Both (7.4) and (7.5) contain an additional term, an entry of HH.

Of these differences, the first one is minor. In order to mimic the bookkepping of Section 5, we still speak of black and white vertices. The simplest definition of our colouring is that the vertices of Δ\Delta are black and any other vertices that were added to Δ\Delta are white; see the explanation below (5.20). The second difference leads to a slightly larger class of graphs, but the new graphs will always be of subleading order. An alternative viewpoint is that the additional entry of HH on the right-hand side of (7.4) and (7.5) should be regarded as a negligible error term. Almost all of the differences highlighted below stem from this additional term.

We now sketch the argument for each of the eight steps, by highlighting the changes as compared to Section 5.

This concludes sketch of how the argument of Section 5 is to be modified for the proof of Theorem 2.12. We omit further details. Finally, Theorems 2.15 and 2.16 follow from Theorem 2.12 by repeating the arguments of Section 6 almost to the letter.

References