Random matrices: Sharp concentration of eigenvalues

Terence Tao, Van Vu

Introduction

The purpose of this paper is to sharpen the existing bounds on the eigenvalue counting function $N_{I}=N_{I}(W_{n})$ of a (normalized) Wigner matrix $W_{n}=\frac{1}{\sqrt{n}}M_{n}$ , and related quantities such as the Stieltjes transform $s_{W_{n}}(z)$ and individual eigenvalues $\lambda_{i}(W_{n})$ . Let us first state the Wigner random matrix model which we will use.

Let $n\geq 1$ be an integer (which we view as a parameter going off to infinity; in particular, $n$ is understood to be large enough that quantities such as $\log\log n$ are well-defined and positive). An $n\times n$ Wigner matrix $M_{n}$ is defined to be a random Hermitian $n\times n$ matrix $M_{n}=(\xi_{ij})_{1\leq i,j\leq n}$ , in which the $\xi_{ij}$ for $1\leq i\leq j\leq n$ are jointly independent with $\xi_{ji}=\overline{\xi_{ij}}$ (in particular, the $\xi_{ii}$ are real-valued). For $1\leq i<j\leq n$ , we require that the $\xi_{ij}$ have mean zero and variance one, while for $1\leq i=j\leq n$ we require that the $\xi_{ij}$ (which are necessarily real) have mean zero and variance $\sigma^{2}$ for some $\sigma^{2}>0$ independent of $i,j,n$ . For simplicity, we will also assume that for each $1\leq i<j\leq n$ , the real and imaginary parts ${\operatorname{Re}}\xi_{ij}$ , ${\operatorname{Im}}\xi_{ij}$ are independent. We refer to the distributions ${\operatorname{Re}}\xi_{ij}$ , ${\operatorname{Im}}\xi_{ij}$ for $1\leq i<j\leq n$ and $\xi_{ii}$ for $1\leq i\leq n$ as the atom distributions of $M_{n}$ , and view them as fixed while $n$ goes off to infinity.
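As a concrete illustration of the model just defined, the following minimal sketch samples a real symmetric Wigner matrix and forms the normalization $W_{n}=\frac{1}{\sqrt{n}}M_{n}$ . The choice of $\pm 1$ atoms, the function names, and the seed are assumptions made for the example only; the definition above allows far more general atom distributions.

```python
import random

def sample_real_wigner(n, sigma=1.0, seed=0):
    """Sample a real symmetric Wigner matrix M_n with +1/-1 atoms (illustrative choice).

    Off-diagonal entries have mean 0 and variance 1; diagonal entries are
    sigma * (+1/-1), so they have mean 0 and variance sigma**2.
    """
    rng = random.Random(seed)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = sigma * rng.choice((-1.0, 1.0))
        for j in range(i + 1, n):
            m[i][j] = rng.choice((-1.0, 1.0))
            m[j][i] = m[i][j]  # xi_ji is the conjugate of xi_ij (real case)
    return m

def normalize(m):
    """Return W_n = M_n / sqrt(n)."""
    scale = len(m) ** -0.5
    return [[scale * x for x in row] for row in m]

M = sample_real_wigner(4)
W = normalize(M)
```

Only the entries with $i\leq j$ are sampled independently; the remaining entries are forced by the Hermitian constraint.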

We say that the Wigner matrix ensemble obeys Condition C0 if we have the exponential decay condition

$$\mathbf{P}(|\xi_{ij}|\geq t^{C})\leq e^{-t}\qquad(1)$$

for all $1\leq i,j\leq n$ and $t\geq C^{\prime}$ , and some constants $C,C^{\prime}$ (independent of $i,j,n$ ).

Two Wigner matrices $M_{n}=(\xi_{ij})_{1\leq i,j\leq n}$ and $M^{\prime}_{n}=(\xi^{\prime}_{ij})_{1\leq i,j\leq n}$ are said to have matching moments to order $m$ for some $m\geq 0$ if one has

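$${\mathbf{E}}{\operatorname{Re}}(\xi_{ij})^{k}{\operatorname{Im}}(\xi_{ij})^{l}={\mathbf{E}}{\operatorname{Re}}(\xi^{\prime}_{ij})^{k}{\operatorname{Im}}(\xi^{\prime}_{ij})^{l}\qquad(2)$$

for all $1\leq i,j\leq n$ and all natural numbers $k,l\geq 0$ with $k+l\leq m$ . As we are assuming the real and imaginary parts to be independent, this condition simplifies to the conditions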

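$${\mathbf{E}}{\operatorname{Re}}(\xi_{ij})^{k}={\mathbf{E}}{\operatorname{Re}}(\xi^{\prime}_{ij})^{k},\qquad{\mathbf{E}}{\operatorname{Im}}(\xi_{ij})^{k}={\mathbf{E}}{\operatorname{Im}}(\xi^{\prime}_{ij})^{k}\qquad(3)$$

for all $1\leq i,j\leq n$ and all $0\leq k\leq m$ . If we only require (2) or (3) to hold in the off-diagonal case $i\neq j$ (resp. in the diagonal case $i=j$ ), we say that $M_{n}$ and $M^{\prime}_{n}$ match moments to order $m$ off the diagonal (resp. on the diagonal).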
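To make the notion of matching moments concrete, the sketch below compares a two-point law with a centred Gaussian of the same variance using exact rational arithmetic. The variance $1/2$ is chosen because that is the variance of the real and imaginary parts of an off-diagonal GUE entry; the helper names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def gaussian_moment(k, variance):
    """k-th moment of a centred real Gaussian: 0 for odd k, (k-1)!! * variance^(k/2) for even k."""
    if k % 2 == 1:
        return Fraction(0)
    double_factorial = 1
    for odd in range(1, k, 2):
        double_factorial *= odd
    return double_factorial * Fraction(variance) ** (k // 2)

def two_point_moment(k, variance):
    """k-th moment of a variable equal to +a or -a with probability 1/2 each, where a^2 = variance."""
    return Fraction(0) if k % 2 == 1 else Fraction(variance) ** (k // 2)

def matching_order(moment_a, moment_b, max_order=10):
    """Largest m <= max_order such that both laws agree on all moments of order 0..m."""
    m = -1
    for k in range(max_order + 1):
        if moment_a(k) != moment_b(k):
            break
        m = k
    return m

half = Fraction(1, 2)
order = matching_order(lambda k: two_point_moment(k, half),
                       lambda k: gaussian_moment(k, half))
```

Here `order` comes out as 3: the first three moments agree, while the fourth moments are $1/4$ and $3/4$ respectively. A two-point law of this kind therefore matches the Gaussian to third order but not to fourth order.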

We observe four basic examples of Wigner matrices:

In the symmetric Bernoulli ensemble, $\xi_{ij}$ equals $+1$ with probability $1/2$ and $-1$ with probability $1/2$ for all $1\leq i,j\leq n$ , and $\sigma^{2}=1$ .

In the complex Hermitian Bernoulli ensemble, ${\operatorname{Re}}\xi_{ij},{\operatorname{Im}}\xi_{ij}$ for $1\leq i<j\leq n$ and $\xi_{ii}$ for $1\leq i\leq n$ all equal $+1$ with probability $1/2$ and $-1$ with probability $1/2$ , and $\sigma^{2}=1$ .

Note that we do not require the off-diagonal $\xi_{ij}$ , $1\leq i<j\leq n$ (or the diagonal $\xi_{ii}$ , $1\leq i\leq n$ ) to be identically distributed. This lack of an identical distribution hypothesis will be convenient when we apply the Lindeberg exchange strategy, in which one Wigner matrix is compared to another one by exchanging the entries of the former matrix with the latter one at a time. (More precisely, we exchange the diagonal entries one at a time, and the off-diagonal entries two at a time, in order to preserve the Hermitian property throughout.) As such, the intermediate stages of this exchange process need not have identically distributed entries, even if the initial and final matrices do.

The hypothesis of independence of real and imaginary parts is imposed purely to simplify the exposition, and can easily be removed at the cost of some more complicated notation; in particular, the simpler moment matching condition (3) must be replaced by the more complicated condition (2). See Remark 23.

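In this paper, we will mostly deal with the (coarse-scale) normalization $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ of the Wigner matrix $M_{n}$ , and more specifically with the eigenvalue counting function

$$N_{I}(W_{n}):=|\{1\leq i\leq n:\lambda_{i}(W_{n})\in I\}|$$

of an interval $I\subset{\mathbf{R}}$ , where $\lambda_{1}(W_{n}),\ldots,\lambda_{n}(W_{n})$ denote the eigenvalues of $W_{n}$ .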

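Let $M_{n}$ be a Wigner Hermitian matrix obeying Condition C0. Then for any fixed interval $I$ (independent of $n$ ), one has

$$\lim_{n\to\infty}\frac{1}{n}N_{I}(W_{n})=\int_{I}\rho_{\operatorname{sc}}(y)\ dy$$

in the sense of convergence in probability, where $\rho_{\operatorname{sc}}(y):=\frac{1}{2\pi}(4-y^{2})_{+}^{1/2}$ is the semicircular density.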

See for instance for a proof of this theorem and for historical background. Condition C0 can be omitted from this law, but we retain the hypothesis as it will be needed for the subsequent results discussed below.

If we use $o(x)$ to denote a quantity that goes to zero as $n\to\infty$ after dividing by $x$ , we can reformulate Theorem 3 as the assertion that the asymptotic

$$N_{I}(W_{n})=n\int_{I}\rho_{\operatorname{sc}}(y)\ dy+o(n)\qquad(4)$$

holds with probability $1-o(1)$ for each fixed $I$ .

One can also phrase the semicircular law in terms of the individual eigenvalues $\lambda_{i}(W_{n})$ . If for each $1\leq i\leq n$ we define the classical location $\gamma_{i}$ of the normalised $i^{\operatorname{th}}$ eigenvalue by the formula

then the Wigner semicircular law (combined with an almost sure bound of $(2+o(1))\sqrt{n}$ for the operator norm of $M_{n}$ , due to Bai and Yin ) is equivalent to the assertion that one has

$$\lambda_{i}(W_{n})=\gamma_{i}+o(1)\qquad(6)$$

for any given $1\leq i\leq n$ , with probability $1-o(1)$ .
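The classical locations can be computed numerically as quantiles of the semicircular distribution. The sketch below is a minimal illustration under a stated assumption: it takes $\gamma_{i}$ to solve $\int_{-2}^{\gamma_{i}}\rho_{\operatorname{sc}}(y)\ dy=i/n$ with $\rho_{\operatorname{sc}}(y)=\frac{1}{2\pi}(4-y^{2})_{+}^{1/2}$ , i.e. eigenvalues indexed in increasing order. If the eigenvalues are indexed in decreasing order, $i$ should be replaced by $n-i$ . The function names and the bisection tolerance are choices made for the example.

```python
import math

def semicircle_cdf(x):
    """Closed form of the integral of rho_sc over (-infinity, x]."""
    x = max(-2.0, min(2.0, x))
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def classical_location(i, n, tol=1e-12):
    """Solve semicircle_cdf(gamma) = i/n by bisection (increasing-order convention, an assumption)."""
    lo, hi = -2.0, 2.0
    target = i / n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Classical locations for n = 100 under the convention above.
gammas = [classical_location(i, 100) for i in range(1, 101)]
```

Near the edges $\pm 2$ the density vanishes like a square root, so consecutive classical locations are spaced at scale $n^{-2/3}$ there, compared with scale $n^{-1}$ in the bulk.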

In this paper we investigate sharper versions of the semicircular law (known in the literature as local semicircular laws), which improve upon the error terms and failure probabilities in (4) and (6), and in which the interval $I$ is now allowed to depend on $n$ .

We first discuss the case of the Gaussian Unitary Ensemble (GUE), which is the most well-understood case, as the joint distribution of the eigenvalues is given by a determinantal point process. Because of this, it is known that for any interval $I$ , the random variable $N_{I}(W_{n})$ in the GUE case obeys a law of the form

$$N_{I}(W_{n})\equiv\sum_{i=1}^{n}\eta_{i}\qquad(7)$$

(with $\equiv$ denoting equality in distribution), where the $\eta_{i}=\eta_{i,n,I}$ are jointly independent indicator random variables (i.e. they take values in $\{0,1\}$ ); see e.g. [3, Corollary 4.2.24]. The mean and variance of $N_{I}(W_{n})$ can also be computed in the GUE case with a high degree of accuracy:

Let $M_{n}$ be drawn from GUE, let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ , and let $I=(-\infty,x]$ for some real number $x$ (which may depend on $n$ ). Let ${\varepsilon}>0$ be independent of $n$ .

(Bulk case) If $x\in[-2+{\varepsilon},2-{\varepsilon}]$ , then

(Variance bound) If one has $x\in[-2,2-{\varepsilon}]$ and $n^{2/3}(2+x)\to\infty$ as $n\to\infty$ , one has

In particular, one has $\mathbf{Var}N_{I}(W_{n})=O(\log n)$ in this regime.

Here of course we use $X=O(Y)$ , $X\ll Y$ or $Y\gg X$ to denote the estimate $|X|\leq CY$ for some quantity $C$ independent of $n$ . We will also use $c$ to denote various small positive constants $c>0$ independent of $n$ (but possibly depending on the constants in Condition C0).

See [21, Lemmas 2.1, 2.2, 2.3]. Note that the normalization conventions in [21] differ by a factor of $\sqrt{2}$ from the ones used here. (There is a slight inaccuracy in the statement of [21, Lemma 2.2], namely that the main term of $\frac{4\sqrt{2}}{3\pi}n(1-t)^{3/2}$ in that lemma should be replaced with the more accurate main term $\frac{2n}{\pi}\int_{t}^{1}\sqrt{1-x^{2}}\ dx$ , which is what actually comes out of the proof of [21, Lemma 2.2]. These two main terms differ by $O(1)$ in the regime $t=1-O(n^{-2/5})$ , as can be seen from a Taylor expansion, but they differ by more than $O(1)$ outside of this regime.) ∎

By combining these estimates with a well-known inequality of Bennett , we obtain a concentration estimate for $N_{I}(W_{n})$ in the GUE case:

Let $M_{n}$ be drawn from GUE, let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ , and let $I$ be an interval. Then one has

By the triangle inequality we may take $I=(-\infty,x]$ for some real number $x$ . As $\rho_{\operatorname{sc}}$ is supported on $[-2,2]$ and has total mass $1$ , we see (using the trivial bounds $0\leq N_{I}(W_{n})\leq n$ and $N_{I}(W_{n})\leq N_{J}(W_{n})$ whenever $I\subset J$ ) that without loss of generality we may assume $x\in[-2,2]$ . By (7) and Theorem 4, $N_{I}(W_{n})$ is then the sum of independent indicator functions, and the mean $\mu$ and variance $\sigma^{2}$ of this sum are given by

and $\sigma^{2}=O(\log n)$ respectively. Bennett’s inequality (see , or [23, p.29]) then asserts that

where $\phi(x):=(1+x)\log(1+x)-x$ . Since $\phi(x)\gg x$ when $x\gg 1$ , the claim follows. (Indeed, this argument shows a slightly better bound than $\exp(-cT)$ . One can also use Bernstein’s inequality to obtain the $\exp(-cT)$ bound if desired.) ∎
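The following sketch evaluates the function $\phi$ and the resulting tail bound numerically. It assumes the standard two-sided form of Bennett's inequality, $2\exp(-\sigma^{2}\phi(t/\sigma^{2}))$ , for a sum of independent variables each within $1$ of its mean and with total variance $\sigma^{2}$ ; the display used in the proof above may be normalized differently. The sample values $t=100$ and $\sigma^{2}=\log 10^{6}$ are arbitrary.

```python
import math

def phi(x):
    """phi(x) = (1 + x) * log(1 + x) - x, as in the proof above."""
    return (1.0 + x) * math.log1p(x) - x

def bennett_tail(t, variance):
    """Assumed standard form: 2 * exp(-variance * phi(t / variance))."""
    return 2.0 * math.exp(-variance * phi(t / variance))

# With variance of size log n, deviations t much larger than log n are
# penalized slightly faster than exp(-t), since phi(x)/x grows like log x.
sample_bound = bennett_tail(100.0, math.log(1e6))
```

Since $\phi(x)/x$ grows logarithmically in $x$ , the bound decays slightly faster than $\exp(-cT)$ once $T$ is much larger than the variance.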

Let us say that an event holds with overwhelming probability if it occurs with probability $1-O(n^{-A})$ for each fixed $A$ . From the above corollary we see in particular that in the GUE case, one has

with overwhelming probability for each fixed $I$ , and an easy union bound argument (ranging over all intervals $I$ in, say, $[-3,3]$ whose endpoints are a multiple of $n^{-100}$ (say)) then shows that this is also true uniformly in $I$ as well.

By using a general result of Costin and Lebowitz , one can also obtain a central limit theorem for $N_{I}(W_{n})$ as long as $I$ is not too small; see . Such results have also recently been extended to more general Wigner matrices in . However, such theorems will not be the focus of the current paper.

Now we turn from the GUE case to more general Wigner ensembles. There has been much interest in recent years in obtaining concentration results for $N_{I}(W_{n})$ (and for closely related objects, such as the Stieltjes transform $s_{W_{n}}(z):=\frac{1}{n}\operatorname{trace}(W_{n}-z)^{-1}$ of $W_{n}$ ) for short intervals $I$ , due to the applicability of such results to establishing various universality properties of such matrices; see . The previous best result in this direction was by Erdős, Yau, and Yin (see also for a variant):

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then, for any interval $I$ , one has

for all $T\geq\log^{A\log\log n}n$ , and some constant $A>0$ .

One can reformulate (8) equivalently as the assertion that

In particular, this theorem asserts that with overwhelming probability one has

for all intervals $I$ . The proof of the above theorem is somewhat lengthy, requiring a delicate analysis of the self-consistent equation of the Stieltjes transform of $W_{n}$ . A recent preprint of Götze and Tikhomirov has claimed an improvement to this result (at the current time of writing, the preprint is being revised to address some gaps in the proofs of some lemmas in that paper, specifically Lemmas 5.2 and 5.3; private communication), namely that

with probability $1-O(\exp(-c\log n(\log\log n)^{\alpha}))$ for certain explicit exponents $C,\alpha$ . This claim would imply as a consequence that for any interval $I$ , $N_{I}(W_{n})$ has variance $O(\log^{O(1)}n)$ .

Comparing Theorem 7 with the previous results for the GUE case, we see that there is a loss of a double logarithm $\log\log n$ in the exponent. The first main result of this paper is to remove this double logarithmic loss, at least under an additional vanishing moment assumption. (We would like to thank M. Ledoux for a private conversation that led to this question.)

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Assume that $M_{n}$ matches moments with GUE to third order off the diagonal (i.e. ${\operatorname{Re}}\xi_{ij},{\operatorname{Im}}\xi_{ij}$ have variance $1/2$ and third moment zero). Then, for any interval $I$ , one has

This estimate is phrased for any $T$ , but the bound only becomes non-trivial when $T\gg\log^{C}n$ for some sufficiently large $C$ . In that regime, we see that this result removes the double-logarithmic factor from Theorem 7; it is also comparable to the result (9) from when $T=\log^{O(1)}n$ (though not with as sharp a set of exponents as , and one also needs an additional moment matching hypothesis), but gives additional large deviation bounds when $T$ is much larger than $\log^{O(1)}n$ .

As we are assuming ${\operatorname{Re}}(\xi_{ij})$ and ${\operatorname{Im}}(\xi_{ij})$ to be independent, the moment matching condition simplifies to the constraints that ${\mathbf{E}}{\operatorname{Re}}(\xi_{ij})^{2}={\mathbf{E}}{\operatorname{Im}}(\xi_{ij})^{2}=\frac{1}{2}$ and ${\mathbf{E}}{\operatorname{Re}}(\xi_{ij})^{3}={\mathbf{E}}{\operatorname{Im}}(\xi_{ij})^{3}=0$ . However, it is possible to extend this theorem to the case when the real and imaginary parts of $\xi_{ij}$ are not independent; see Remark 23.

The constant $c$ in the bound in Theorem 8 is quite decent in several cases. For instance, if the atom variables of $M_{n}$ are Bernoulli or have sub-gaussian tail, then we can set $c=2/5-o(1)$ by optimizing our arguments (details omitted). If we assume 4 matching moments rather than 3, then we can set $c=1$ (see Remark 26), matching the bound in Corollary 5. It is an interesting question to determine the best value of $c$ . The value of $c$ in is implicit and rather small.

We prove Theorem 8 in Sections 2-4. Our argument differs from that in in that it only uses a relatively crude analysis of the self-consistent equation to obtain some preliminary bounds on the Stieltjes transform and on $N_{I}$ (which were also essentially implicit in previous literature). Instead, the bulk of the argument relies on using the Lindeberg swapping strategy to deduce concentration of $N_{I}(W_{n})$ in the non-GUE case from the concentration results in the GUE case provided by Corollary 5. In order to keep the error terms in this swapping under control, three matching moments are needed. (Compare with the “four moment theorem”. We need one less moment here because we are working at “mesoscopic” scales, in which the number of eigenvalues involved is much larger than $1$ , rather than at “microscopic” scales. However, in Theorem 14 below, only one eigenvalue is involved, making the problem microscopic enough to require four moments instead of three.)

Very roughly speaking, the main idea of the argument is to show that high moments such as

are quite stable (in a multiplicative sense) if one swaps (the real or imaginary part of) one of the entries of $W_{n}$ (and its adjoint) with another random variable that matches the moments of the original entry to third order. For technical reasons, however, we do not quite manipulate $N_{I}(W_{n})$ directly, but instead work with a proxy for this quantity, namely a certain integral of the Stieltjes transform of $W_{n}$ . As observed in , the Lindeberg swapping argument is quite simple to implement at the level of the Stieltjes transform (due to the simplicity of the resolvent identities, when compared against the rather complicated Taylor expansions of individual eigenvalues used in ).

The result in Theorem 8 is well suited for controlling eigenvalues in the bulk of the spectrum, but is not sufficient by itself to control eigenvalues at the edge, and in particular the largest eigenvalue $\lambda_{1}(W_{n})$ and the smallest eigenvalue $\lambda_{n}(W_{n})$ . However, it is known that these eigenvalues are highly concentrated around $+2$ and $-2$ respectively. In the GUE case, we have the following concentration result of Aubrun :

Let $M_{n}$ be drawn from GUE, let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then one has

As is well known, the random variable $n^{2/3}(\lambda_{1}(W_{n})-2)$ in fact converges in distribution to the Tracy-Widom law . However, we will not focus on this law here. The exponent $3/2$ on the right-hand side cannot be improved (indeed, it matches the decay rate of the Tracy-Widom law); see for further discussion.

This result was partially extended to the Wigner case in :

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then one has

for all $T\geq\log^{A\log\log n}n$ , for some $A>0$ independent of $n$ . By symmetry, one also has

As before, we can reformulate (10) equivalently as the assertion that

Our second main result is to remove the double logarithm from Theorem 13, at the cost of requiring matching GUE to fourth order rather than to third order:

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Assume that $M_{n}$ matches moments with GUE to fourth order off the diagonal and second order on the diagonal (i.e. $\sigma^{2}=1$ ). Then one has

for any $T>0$ . By symmetry, one then also has

We will derive Theorem 14 from Theorem 11 in Section 5 using the same techniques used to derive Theorem 8 from Corollary 5.

By combining Theorem 8 and Theorem 14 one can “solve” for individual eigenvalues $\lambda_{i}(W_{n})$ to obtain an appropriate concentration (localization) result:

If we assume only three matching moments, then the above estimate still holds provided that we have the additional hypothesis

for some fixed $c^{\prime}>0$ (where the constant $c$ above is allowed to depend on $c^{\prime}$ ).

The second part of this corollary significantly improves [30, Theorem 29]. (As a matter of fact, the original proof of this theorem has a gap in it; see [33, Appendix A] for a further discussion.)

First assume four matching moments. By Theorems 8, 14 and the union bound, we see that outside of an event of probability $n^{O(1)}\exp(-cT^{c})$ , we have

for all intervals $I$ , as well as the bounds

Some elementary estimation of the semicircular density $\rho_{\operatorname{sc}}$ and its integrals $\int_{I}\rho_{\operatorname{sc}}(y)\ dy$ (cf. [17, §5]) then gives

for all $1\leq i\leq n$ . The claim follows (possibly after adjusting $T$ by a multiplicative factor).

Now suppose we only have three matching moments. Then by Theorem 8 and the union bound, we may assume that

for all $I$ . In particular (setting $I$ equal to $[2,+\infty)$ or $(-\infty,-2]$ ) this implies that $-2\leq\lambda_{i}(W_{n})\leq 2$ whenever $\min(i,n+1-i)\geq T^{c^{\prime}}$ . One can then argue as before. ∎

The results in this paper also hold if one replaces the GUE ensemble by the GOE ensemble, in which case one considers real symmetric Wigner matrices instead of Hermitian Wigner matrices, with the off-diagonal $\xi_{ij}$ having mean zero, variance one, and third moment zero (if there are three matching moments) and fourth moment equal to $3$ (if there are four matching moments). To do this, one needs to replace Theorem 4 and Theorem 11 by their GOE counterparts. The GOE version of Theorem 4 was established by O’Rourke . The GOE version of Theorem 11 follows from the results in . In principle, one might be able to use other ensembles (such as the gaussian divisible matrices ) to match moments with, which would allow one to remove the moment conditions almost entirely. We will not pursue these matters here.

We are indebted to the anonymous referees for several suggestions and corrections.

Reduction to the Stieltjes transform

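We now begin the proof of Theorem 8. The first step is to replace the counting function $N_{I}=N_{I}(W_{n})$ with the Stieltjes transform $s_{W_{n}}$ , defined by the formula

$$s_{W_{n}}(z):=\frac{1}{n}\operatorname{trace}(W_{n}-z)^{-1}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\lambda_{i}(W_{n})-z}$$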

for any complex number $z$ with positive imaginary part. We can express this Stieltjes transform as a Riemann-Stieltjes integral

$$s_{W_{n}}(z)=\frac{1}{n}\int_{{\mathbf{R}}}\frac{1}{x-z}\ dN_{(-\infty,x)}\qquad(14)$$

which gives a clear connection between the Stieltjes transform and the counting function; in the converse direction, we have the identity

whenever $E$ is not an eigenvalue of $W_{n}$ , showing that (in principle at least) we can reconstruct the eigenvalue counting function from the Stieltjes transform.
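For a finite list of eigenvalues the Stieltjes transform $\frac{1}{n}\operatorname{trace}(W_{n}-z)^{-1}$ is simply an average of the terms $1/(\lambda_{i}-z)$ , and its imaginary part at height $\eta$ is the eigenvalue density smoothed by a Poisson kernel of width $\eta$ . The sketch below illustrates both facts; the function names and the three sample eigenvalues are hypothetical.

```python
import math

def stieltjes(eigenvalues, z):
    """s(z) = (1/n) * sum_i 1 / (lambda_i - z), for Im z > 0."""
    n = len(eigenvalues)
    return sum(1.0 / (lam - z) for lam in eigenvalues) / n

def smoothed_count_density(eigenvalues, energy, eta):
    """(n / pi) * Im s(E + i*eta): number of eigenvalues per unit length near E, smoothed at scale eta."""
    n = len(eigenvalues)
    return n * stieltjes(eigenvalues, complex(energy, eta)).imag / math.pi

sample_eigenvalues = [-1.0, 0.0, 1.0]
s_value = stieltjes(sample_eigenvalues, 1j)
density_at_zero = smoothed_count_density(sample_eigenvalues, 0.0, 0.5)
```

For the three sample eigenvalues one can check by hand that $s(\sqrt{-1})=\frac{2}{3}\sqrt{-1}$ . Taking $\eta$ smaller resolves finer spectral scales, which is why control of $s_{W_{n}}$ down to $\eta$ slightly above $1/n$ translates into control of $N_{I}$ on short intervals.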

Using the heuristic $dN_{(-\infty,x)}\approx n\rho_{\operatorname{sc}}(x)\ dx$ from (4), we thus expect from (14) to have $s_{W_{n}}\approx s_{\operatorname{sc}}$ , where

$$s_{\operatorname{sc}}(z):=\int_{{\mathbf{R}}}\frac{\rho_{\operatorname{sc}}(x)}{x-z}\ dx.$$

As is well known, $s_{\operatorname{sc}}$ can be evaluated explicitly via contour integration (for instance, one can observe that $\frac{1}{\pi}{\operatorname{Im}}s_{\operatorname{sc}}(x\pm\sqrt{-1}{\varepsilon})$ converges to $\pm\rho_{\operatorname{sc}}(x)$ as ${\varepsilon}\to 0^{+}$ , and then apply the Cauchy integral formula to $s_{\operatorname{sc}}$ around the slit $[-2,2]$ ) as

$$s_{\operatorname{sc}}(z)=\frac{-z+\sqrt{z^{2}-4}}{2},$$

where $\sqrt{z^{2}-4}$ is the branch of the square root that is asymptotic to $z$ at infinity. In particular, $s_{\operatorname{sc}}$ exactly obeys the self-consistent equation

$$s_{\operatorname{sc}}(z)=-\frac{1}{z+s_{\operatorname{sc}}(z)}.$$
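The closed form $s_{\operatorname{sc}}(z)=\frac{-z+\sqrt{z^{2}-4}}{2}$ and the self-consistent equation $s_{\operatorname{sc}}(z)=-1/(z+s_{\operatorname{sc}}(z))$ can be checked numerically. In the sketch below, the branch of $\sqrt{z^{2}-4}$ asymptotic to $z$ is realized, for $z$ in the upper half-plane, as the product of the principal square roots of $z-2$ and $z+2$ ; this particular implementation of the branch is a choice made for the example, and the sample points are arbitrary.

```python
import cmath
import math

def s_sc(z):
    """Stieltjes transform of the semicircular law for Im z > 0."""
    # Product of principal roots: analytic on the upper half-plane and ~ z at infinity.
    root = cmath.sqrt(z - 2.0) * cmath.sqrt(z + 2.0)
    return (-z + root) / 2.0

def rho_sc(x):
    """Semicircular density (1 / (2*pi)) * sqrt((4 - x^2)_+)."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)

z = complex(1.0, 0.5)
s = s_sc(z)
self_consistency_residual = abs(s + 1.0 / (z + s))       # should vanish up to rounding
boundary_density = s_sc(complex(0.5, 1e-6)).imag / math.pi  # should be close to rho_sc(0.5)
```

The second quantity illustrates that $\frac{1}{\pi}{\operatorname{Im}}s_{\operatorname{sc}}(x+\sqrt{-1}{\varepsilon})$ recovers $\rho_{\operatorname{sc}}(x)$ as ${\varepsilon}\to 0^{+}$ .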

In the case of GUE, we may easily formalize this heuristic with the assistance of Corollary 5:

Let $M_{n}$ be drawn from GUE, and $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then for any $T>0$ and any complex number $z=E+\sqrt{-1}\eta$ with $\eta>0$ , one has

We may assume that $T\gg\log n$ , as the claim is trivial otherwise. Let $T_{1}\gg\log n$ be chosen later. From Corollary 5 and the union bound, we see that with probability $1-O(n^{O(1)}\exp(-cT_{1}))$ , one has

for all intervals $I$ in $[-3,3]$ whose endpoints are multiples of $n^{-100}$ , and hence for all intervals $I$ . In particular,

for all $x$ . On the other hand, from (14) and integration by parts, one has

The error term on the right-hand side evaluates to $O(\frac{T_{1}}{n\eta})$ . The claim then follows by choosing $T_{1}$ to be a small multiple of $T$ . ∎

We will use this proposition to obtain a similar concentration result for Wigner matrices:

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Assume that $M_{n}$ matches moments with GUE to third order off the diagonal. Then for any $T>0$ and any complex number $z=E+\sqrt{-1}\eta$ with $E\in$ and $0<\eta\ll n^{100}$ , one has

We prove this theorem in later sections. Let us assume it for now, and use it to establish Theorem 8. The basic idea (which is standard in the Stieltjes transform approach to the local semicircle law) is to use a truncated form of (15). Let $M_{n},W_{n},T,K$ be as in the above theorem. By the triangle inequality, we may take $I=(-\infty,E)$ for some real number $E$ ; from the support of $\rho_{\operatorname{sc}}$ , we may assume that $E\in[-2,2]$ . We may also take $T\gg\log^{100}n$ (say), as the claim is trivial otherwise.

Let $T_{1}\gg T/\log n\gg\log^{99}n$ be a quantity to be chosen later, and set $\eta_{0}:=T_{1}/n$ . Applying Theorem 18 and the union bound, we see that outside of an event of probability at most

for all values of $\eta$ between $\eta_{0}$ and $n^{100}$ which are integer multiples of $n^{-1000}$ . On the other hand, in this range one easily verifies that the functions $\eta\mapsto s_{W_{n}}(E+\sqrt{-1}\eta)$ and $\eta\mapsto s_{\operatorname{sc}}(E+\sqrt{-1}\eta)$ are Lipschitz with Lipschitz norm at most $O(n^{200})$ (say). As a consequence, we conclude (after conditioning outside of the above exceptional event) that (19) holds for all $\eta$ between $\eta_{0}$ and $n^{100}$ .

By conditioning on another event of probability at most (18), we may assume that all entries of $M_{n}$ are of size at most $O(n)$ (say). Among other things, this implies that all eigenvalues $\lambda_{i}(W_{n})$ are (very crudely) of size at most $O(n^{20})$ .

Since $\eta\geq\eta_{0}=T_{1}/n$ , we conclude from (19) and (16) that

We conclude that (one could also have used Proposition 30 at this juncture)

for all $\eta\geq\eta_{0}$ (note that this claim is trivial for $\eta\geq n^{100}$ ).

Next, if we integrate (19) and use the triangle inequality, we observe that

Let us now evaluate the left-hand side. From the definition of the Stieltjes transform, we may rewrite it as

where ${\operatorname{Arg}}$ is the standard branch of the argument on the upper half-plane.

Since $E\in[-2,2]$ and $\lambda_{i}(W_{n})=O(n^{20})$ , we have

(say). Also, from elementary trigonometry one has

We may therefore write the left-hand side of (21) as

(compare with (15)). On the other hand, from (20) and dyadic decomposition (recalling that $\lambda_{i}(W_{n})=O(n^{20})$ ) one has

Choosing $T_{1}$ to be a small multiple of $T/\log n$ (and bounding $T_{1}^{c}$ from below by $T^{c^{\prime}}-O(\log n)$ for some sufficiently small $c^{\prime}>0$ ), we obtain Theorem 8 as desired.

It remains to deduce Theorem 18 from Proposition 17. This will be the objective of the next few sections.

The moment method, and the Lindeberg strategy

Given a matrix $W_{n}=\frac{1}{\sqrt{n}}M_{n}$ and a complex number $z=E+\sqrt{-1}\eta$ , define the quantity $A(W_{n})=A(W_{n},z)$ by the formula

This quantity describes the normalised deviation of the Stieltjes transform of $W_{n}$ from the semicircular law at $z$ . In this notation, Proposition 17 becomes the assertion that

whenever $T>0$ , $E\in$ , and $0<\eta\ll n^{100}$ , when $M_{n}$ is drawn from a Wigner matrix obeying Condition C0, and with ${\operatorname{Re}}\xi_{ij}$ and ${\operatorname{Im}}\xi_{ij}$ having variance $1/2$ and third moment zero for $1\leq i<j\leq n$ .

To deduce (23) from (22) we will use the moment method combined with the Lindeberg exchange strategy; more specifically, we will show that a high moment ${\mathbf{E}}A(W_{n})^{k}$ for some large even number $k$ (which one should think of, in practice, as comparable to $T$ ) is stable under the operation of replacing (the real or imaginary part of) one entry of $M_{n}$ (and its transpose) with another entry with a number of matching moments. The Lindeberg exchange strategy is by now a standard tool in establishing universality properties for Wigner matrices; the main novelty here is the application of that strategy to a high moment ${\mathbf{E}}A(W_{n})^{k}$ (as opposed to a quantity such as ${\mathbf{E}}G(A(W_{n}))$ for some smooth test function $G$ ). (Very recently, a similar application of the Lindeberg exchange strategy to a high moment of a spectral statistic was used to establish some related concentration results. We thank Antti Knowles for bringing this preprint to our attention.)

Let us now make the strategy more precise. Let us call two Wigner matrices $M_{n},M^{\prime}_{n}$ real-adjacent, or adjacent for short, if their respective atom variables $\xi_{ij},\xi^{\prime}_{ij}$ are equal except for a single choice of $(i,j)=(a,b)$ and its transpose $(i,j)=(b,a)$ , and such that $\xi_{ab},\xi^{\prime}_{ab}$ either have identical real parts, or identical imaginary parts. Thus, a Wigner matrix $M^{\prime}_{n}$ adjacent to $M_{n}$ is formed by changing the real or imaginary part of a single entry of $M_{n}$ and its adjoint, leaving the other components of $M_{n}$ unchanged. The main technical step is then to establish the following proposition.

Let $M_{n},M^{\prime}_{n}$ be two adjacent Wigner matrices obeying Condition C0, whose moments match to order $m$ for some fixed $m=O(1)$ . Let $z=E+\sqrt{-1}\eta$ for some $E\in$ and $0<\eta\ll n^{100}$ , and set $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ and $W^{\prime}_{n}:=\frac{1}{\sqrt{n}}M^{\prime}_{n}$ . Then for any even integer $k\geq\log n$ , one has

Let us assume this proposition for now and establish Theorem 18. Let $n,M_{n},W_{n},E,\eta,z,T$ be as in that theorem. We may assume that $T\geq\log^{C_{0}}n$ (say) for some sufficiently large absolute constant $C_{0}$ , as the claim is trivial otherwise; we may also assume that $T\leq\eta n$ , since otherwise the claim follows from existing local semicircle laws (in particular, Corollary 32). In particular, we may now assume that $T\leq n^{O(1)}$ and $\eta\geq\log^{C_{0}}n/n$ . Our task is now to show that

On the other hand, if $M^{\prime}_{n}$ is drawn from GUE and $W^{\prime}_{n}:=\frac{1}{\sqrt{n}}M^{\prime}_{n}$ , then from Proposition 17 one has

for all $T>0$ . In particular, for any $k\geq\log n$ , one has

We can replace $M^{\prime}_{n}$ with $M_{n}$ in a sequence of $n^{2}$ exchanges from one Wigner matrix to a real-adjacent one; $n^{2}-n$ of these exchanges arise by swapping the real or imaginary part of an off-diagonal entry $\xi_{ij}$ of $M^{\prime}_{n}$ (and its transpose $\xi_{ji}$ ) with the corresponding component of $M_{n}$ , and $n$ of these exchanges arise by swapping a diagonal entry $\xi_{ii}$ of $M^{\prime}_{n}$ with the corresponding entry of $M_{n}$ . We perform these exchanges in an arbitrary order. By hypothesis, for the $n^{2}-n$ off-diagonal exchanges one has matching moments to order $m=3$ , while for the diagonal exchanges one has matching moments to order $m=1$ . Let $M_{n}=M^{0}_{n},M^{1}_{n},\ldots,M^{n^{2}}_{n}=M^{\prime}_{n}$ denote the sequence of exchanges from $M_{n}$ to $M^{n^{2}}_{n}$ , and let $W^{0}_{n},\ldots,W^{n^{2}}_{n}$ be the associated rescaled Wigner matrices. By Proposition 19 one has

for $0\leq a<n^{2}$ , where $m_{a}$ is equal to $3$ for $n^{2}-n$ choices of $a$ and equal to $1$ for $n$ choices of $a$ . Concatenating these bounds, we conclude that for any $k\geq\log n$ one has

If we set $k$ to be the largest even integer less than $T^{c_{0}}$ for some absolute constant $c_{0}$ , and if $C_{0}$ is sufficiently large depending on $c_{0}$ , we obtain (25) as desired, thanks to the assumptions $\log^{C_{0}}n\leq T\leq n\eta$ .
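The bookkeeping of the exchange procedure used above can be sketched as follows: one passes from one Hermitian matrix to another through $n^{2}$ real-adjacent steps, namely $n$ diagonal swaps and $n^{2}-n$ swaps of the real or imaginary part of an off-diagonal entry together with its transpose. The function names, the order of the swaps, and the two $3\times 3$ sample matrices are assumptions made for the illustration; the argument above allows an arbitrary order.

```python
def is_hermitian(m):
    n = len(m)
    return all(m[i][j] == complex(m[j][i]).conjugate() for i in range(n) for j in range(n))

def exchange_sequence(m_start, m_target):
    """Yield Hermitian matrices from m_start to m_target; consecutive ones are real-adjacent."""
    n = len(m_start)
    current = [row[:] for row in m_start]
    yield [row[:] for row in current]
    for i in range(n):
        current[i][i] = m_target[i][i]  # diagonal swap (one real entry)
        yield [row[:] for row in current]
        for j in range(i + 1, n):
            # Swap the real part of entry (i, j), and of its transpose (j, i).
            new = complex(m_target[i][j].real, current[i][j].imag)
            current[i][j], current[j][i] = new, new.conjugate()
            yield [row[:] for row in current]
            # Swap the imaginary part of entry (i, j), and of its transpose.
            new = complex(current[i][j].real, m_target[i][j].imag)
            current[i][j], current[j][i] = new, new.conjugate()
            yield [row[:] for row in current]

A = [[1, 1 + 1j, -1 + 1j], [1 - 1j, -1, 1 - 1j], [-1 - 1j, 1 + 1j, 1]]
B = [[0.5, 0.3 - 0.2j, 0.1 + 0.4j], [0.3 + 0.2j, -1.5, 2j], [0.1 - 0.4j, -2j, 0.7]]
steps = list(exchange_sequence(A, B))
```

For $n=3$ the list `steps` has $n^{2}+1=10$ matrices, each of them Hermitian, which is the property that allows a stability estimate for adjacent matrices to be applied at every intermediate stage.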

An inspection of the above argument reveals that we in fact have the slight refinement

in the regime $T\leq(n\eta)^{c}$ , since in this regime we may take $k$ to be a small multiple of $T$ (rounded off to the nearest even integer, of course). Unfortunately, this refinement does not appear to immediately offer any significant improvement to the conclusion of Theorem 8.

It remains to establish Proposition 19. This will be achieved in the next section.

Stability of high moments

We now prove Proposition 19. We introduce a definition:

An elementary matrix is a matrix which has one of the following forms

As $M_{n},M^{\prime}_{n}$ are real-adjacent, one can write

for all $t\geq C^{\prime}$ and some $C,C^{\prime}>0$ .

We now recall some (deterministic) resolvent stability results concerning matrices of the form $M_{n}^{0}+tV$ . Define the matrix norm $\|R\|_{(\infty,1)}$ of an $n\times n$ matrix $R=(R_{ij})_{1\leq i,j\leq n}$ by the formula

$$\|R\|_{(\infty,1)}:=\sup_{1\leq i,j\leq n}|R_{ij}|.$$

Let $M_{n}^{0}$ be a Hermitian matrix, let $V$ be an elementary matrix, and let $t$ be a real number. Let $z:=E+\sqrt{-1}\eta$ be a complex number with $\eta>0$ . Write

Furthermore, if we set $s_{t}:=\frac{1}{n}\operatorname{trace}R_{t}$ , then we have the Taylor expansion

for any fixed nonnegative $m=O(1)$ , where the coefficients $c_{j}$ are independent of $t$ and obey the bounds

See [32, Lemma 12] and [32, Proposition 13]. ∎
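Expansions of this type are typically obtained by iterating the exact first-order resolvent identity $R_{t}-R_{0}=-tR_{0}VR_{t}$ , where $R_{t}=(M+tV-z)^{-1}$ . The sketch below checks this identity on a $2\times 2$ example. The scaling of the perturbation, the choice $V=e_{1}e_{2}^{*}+e_{2}e_{1}^{*}$ , and all numerical values are assumptions made for illustration and need not coincide with the normalization used in the proposition.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def resolvent(m, v, t, z):
    """R_t = (M + t*V - z)^(-1) for 2x2 matrices."""
    shifted = [[m[i][j] + t * v[i][j] - (z if i == j else 0) for j in range(2)] for i in range(2)]
    return inverse_2x2(shifted)

M0 = [[0.3, 1.0 - 0.5j], [1.0 + 0.5j, -0.7]]  # Hermitian sample matrix
V = [[0, 1], [1, 0]]                          # perturbs the real part of the off-diagonal entry
z = 0.2 + 0.1j
t = 0.05

R0 = resolvent(M0, V, 0.0, z)
Rt = resolvent(M0, V, t, z)
first_order = mat_mul(mat_mul(R0, V), Rt)
residual = max(abs(Rt[i][j] - R0[i][j] + t * first_order[i][j])
               for i in range(2) for j in range(2))
max_entry = max(abs(R0[i][j]) for i in range(2) for j in range(2))
```

Substituting the identity into itself repeatedly produces a power series in $t$ whose coefficients are controlled by entrywise bounds on $R_{0}$ . The crude bound that every entry of $R_{0}$ is at most $1/\eta$ in magnitude holds for any Hermitian $M$ .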

Our objective is to establish (24). From Corollary 33 we see that

with probability $1-O(n^{O(1)}\exp(-(n\eta)^{c}))$ , while from (28) we certainly have $\xi=o(\sqrt{n})$ with probability $1-O(n^{O(1)}\exp(-(n\eta)^{c}))$ . Hence by the first conclusion of Proposition 22 (with $M_{n}^{0}$ and $V$ replaced with $M_{n}^{0}+\xi V$ , and setting $t$ equal to $-\xi$ ) we have

with probability $1-O(n^{O(1)}\exp(-(n\eta)^{c}))$ . Using the crude bound $A(W_{n})=O(n^{O(1)})$ , we may thus condition $M_{n}^{0}$ to be fixed and obeying (30), since the contribution of the event where (30) fails to ${\mathbf{E}}A(W_{n})^{k}$ is $O(n^{O(k)}\exp(-(n\eta)^{c}))$ .

By Proposition 22, we thus see that whenever $\xi=o(\sqrt{n})$ , one has

where the coefficients $A_{0},a_{j}$ are deterministic (and in particular independent of $\xi,\xi^{\prime}$ , though they can depend on $\eta,n$ ), and $a_{j}$ obeys the bound $a_{j}=O(1)$ .

Suppose first that $|A_{0}|\leq k$ . Then one has

whenever $\xi=o(\sqrt{n})$ , which gives a net contribution of $O(k)^{k}$ to ${\mathbf{E}}|A(W_{n})|^{k}$ ; meanwhile, from (28), the case when $\xi\gg\sqrt{n}$ contributes at most $O(n^{O(k)}\exp(-(n\eta)^{c}))$ . Thus we may assume that $|A_{0}|>k$ . Thus we have

for some deterministic coefficients $b_{1},\ldots,b_{m}=O(1)$ , and assuming that $\xi=o(\sqrt{n})$ . Raising this to the $k^{\operatorname{th}}$ power (after using Taylor’s theorem with remainder to expand $(1+\frac{1}{k}x)^{k}$ to $m^{\operatorname{th}}$ order in the regime $x=o(1)$ ), we conclude that

for some deterministic coefficients $d_{1},\ldots,d_{m}=O(1)$ (which are allowed to depend on $k$ ), whenever $\xi=o(\sqrt{n})$ . Taking (conditional) expectations in $\xi$ (using (28) and the trivial bound $A(W_{n})=O(n^{O(1)})$ to handle the tail event when $|\xi|\gg\sqrt{n}$ ) we conclude that

Since $\xi$ and $\xi^{\prime}$ match to order $m$ , we obtain the claim. This concludes the proof of Proposition 19 and hence Theorem 8.
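The matching moments hypothesis used in the last step can be illustrated numerically. The following is a minimal sketch, not part of the argument: the helper names `gaussian_moment`, `discrete_moment` and `matching_order` are hypothetical, and we only consider real atom variables compared against a standard real Gaussian. It finds the largest order up to which a given discrete distribution matches the Gaussian moments.

```python
import math

def gaussian_moment(j):
    """E[g^j] for a standard real Gaussian g: 0 for odd j, (j-1)!! for even j."""
    if j % 2 == 1:
        return 0.0
    return float(math.prod(range(1, j, 2)))

def discrete_moment(atoms, j):
    """E[xi^j] for a discrete variable given as a list of (value, probability) pairs."""
    return sum(p * v ** j for v, p in atoms)

def matching_order(atoms, max_order=10, tol=1e-9):
    """Largest m <= max_order such that the first m moments agree with the Gaussian ones."""
    m = 0
    for j in range(1, max_order + 1):
        if abs(discrete_moment(atoms, j) - gaussian_moment(j)) > tol:
            break
        m = j
    return m

# Random signs: mean zero, variance one, but fourth moment 1 instead of 3.
rademacher = [(-1.0, 0.5), (1.0, 0.5)]
# A three-point law chosen so that the fourth moment is also 3.
s = math.sqrt(3.0)
three_point = [(-s, 1.0 / 6.0), (0.0, 2.0 / 3.0), (s, 1.0 / 6.0)]

print(matching_order(rademacher), matching_order(three_point))
```

In this sketch the random sign distribution matches the Gaussian only to third order, while the three-point law matches to fifth order (its odd moments vanish by symmetry), so the latter would satisfy a four-moment hypothesis.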

It is possible to adapt the above arguments to the case when ${\operatorname{Re}}\xi_{ij}$ and ${\operatorname{Im}}\xi_{ij}$ are not assumed to be independent. The main new difficulty is that instead of swapping the real and imaginary parts of a single entry $\xi_{ab}$ of $M_{n}$ (and its transpose $\xi_{ba}$ ) separately, one has to swap them together. This requires one to consider perturbations of the form

where $V_{1},V_{2}$ are two distinct elementary matrices, and $\xi_{1},\xi_{2}$ are real random variables that are not necessarily independent and which obey the exponential decay hypothesis (28). However, it is possible to extend Proposition 22 without much difficulty to the case of two-parameter perturbations and perform a similar argument to that given above. We omit the details.

Extreme eigenvalues

We now prove Theorem 14, by combining the arguments in previous sections with some ideas from (and in particular, demonstrating a concentration of ${\operatorname{Im}}s_{W_{n}}(E+\sqrt{-1}\eta)$ that is better than $1/n\eta$ for some energy $E>2$ ). By symmetry, it suffices to prove the bound for $\lambda_{1}(W_{n})$ . We may of course assume that $n$ is large.

By standard large deviation estimates, one has

for any $E\geq 3$ ; see [16, Lemma 7.2]. (One could also use earlier estimates from the literature.) This already deals with the case when $n^{2/3}\leq T\leq n^{100}$ (say), and the case $T>n^{100}$ can be handled by crudely bounding $\lambda_{1}(W_{n})$ by, say, the Frobenius norm of $W_{n}$ and using Condition C0. Thus we may restrict attention to the regime $T\leq n^{2/3}$ , and show that

We may assume that $T\geq\log^{C_{0}}n$ for some suitably large absolute constant $C_{0}$ , as the claim is trivial otherwise.
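To get a feel for the scales involved, one can sample a normalized GUE matrix and look at its largest eigenvalue. The following is a rough numerical sketch under our own assumptions: the function names are hypothetical, NumPy is assumed to be available, and $\lambda_{1}$ is taken to be the largest eigenvalue. It computes the rescaled deviation $n^{2/3}(\lambda_{1}(W_{n})-2)$ together with the crude Frobenius norm bound mentioned above.

```python
import numpy as np

def sample_gue_normalized(n, rng):
    """Return W = M / sqrt(n), with M an n x n GUE matrix (off-diagonal variance one)."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    m = (g + g.conj().T) / np.sqrt(2)  # Hermitian; the diagonal is real with variance one
    return m / np.sqrt(n)

def edge_statistics(w):
    """Largest eigenvalue, rescaled deviation n^{2/3}(lambda - 2), and Frobenius norm."""
    n = w.shape[0]
    lam_max = float(np.linalg.eigvalsh(w)[-1])  # eigvalsh returns ascending eigenvalues
    t = n ** (2.0 / 3.0) * (lam_max - 2.0)
    frob = float(np.linalg.norm(w, "fro"))
    return lam_max, t, frob

rng = np.random.default_rng(0)
lam_max, t, frob = edge_statistics(sample_gue_normalized(200, rng))
print(f"lambda_max = {lam_max:.4f}, rescaled deviation = {t:.3f}, Frobenius norm = {frob:.2f}")
```

The Frobenius norm always dominates $|\lambda_{1}(W_{n})|$ but is of size about $\sqrt{n}$, which is why it is only useful in the regime of very large $T$.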

Suppose that $\lambda_{1}(W_{n})$ was in the interval $[2+n^{-2/3}T,3]$ . Set $\eta:=n^{-2/3}$ , and let $B(W_{n})$ denote the quantity

where $E$ is the closest multiple of $n^{-2/3}$ in $[2+n^{-2/3}T,3]$ to $\lambda_{1}(W_{n})$ . Thus, by the union bound, it will suffice to show that

Let $M^{\prime}_{n}$ be drawn from GUE, and set $W^{\prime}_{n}:=\frac{1}{\sqrt{n}}M^{\prime}_{n}$ . By Theorem 11, we have

outside of an event of probability $O(\exp(-cT^{3/2}))$ ; in particular, we have

Also, from Corollary 5 and the union bound we see that outside of an event of probability $O(n^{O(1)}\exp(-cT))$ , one has

(say) for all intervals $I$ . In particular, outside of this event, we have

for all $k\geq 1$ , using the bound $\rho_{sc}(y)=O((2-y)^{1/2})$ when $y<2$ .

From (34), (35), (32), and dyadic decomposition one easily establishes that

outside of an event of probability $O(n^{O(1)}\exp(-cT^{c}))$ .

Let $\log n\leq k\leq n^{0.01}$ be an integer to be chosen later. Since we may trivially bound ${\operatorname{Im}}s_{W_{n}}(E+\sqrt{-1}\eta)$ by $n^{O(1)}$ , we conclude that

We claim the following stability result for ${\mathbf{E}}B(W_{n})^{k}$ , analogous to Proposition 19:

Let $M_{n},M^{\prime}_{n}$ be two adjacent Wigner matrices obeying Condition C0, whose moments match to order $m$ for some fixed $m=O(1)$ . Set $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ and $W^{\prime}_{n}:=\frac{1}{\sqrt{n}}M^{\prime}_{n}$ . Then for any integer $\log n\leq k\leq n^{0.01}$ , one has

Applying this proposition $n^{2}-n$ times with $m=4$ and $n$ times with $m=2$ we conclude that

and thus (using (36) and the hypothesis $k\leq n^{0.01}$ )

The desired claim (33) then follows from Markov’s inequality by taking $k=T^{c_{0}}$ for some sufficiently small $c_{0}>0$ (and assuming $C_{0}$ sufficiently large depending on $c_{0}>0$ ).

It remains to establish Proposition 24. As in the previous section, we write

for some elementary matrix $V$ , some random matrix $M_{n}^{0}$ , and some real random variables $\xi,\xi^{\prime}$ independent of $M_{n}^{0}$ that match moments to $m^{\operatorname{th}}$ order and obey the exponential decay condition (28). Arguing exactly as before, we may condition $M_{n}^{0}$ to be a deterministic matrix for which

Using Proposition 22 as before, we see that

for some deterministic coefficients $B_{0}$ and $a_{j}=O(1)$ , whenever $\xi=o(\sqrt{n})$ .

Suppose first that $|B_{0}|\leq 1/200$ . Then one has $|B(W_{n})|\leq 1/100$ whenever $\xi=o(\sqrt{n})$ , and so this case contributes $O(100^{-k})+O(n^{O(k)}\exp(-cn^{c}))$ to (37), which is acceptable. Thus we may restrict attention to the case when $|B_{0}|>1/200$ . Then we may write

whenever $\xi=o(\sqrt{n})$ , where the $b_{j}=O(1)$ are deterministic coefficients.

Suppose now that $\xi=O(n^{0.3})$ . Since $k\leq n^{0.01}$ , we may perform a Taylor expansion of $(1+x)^{k}$ to order $m$ for $x=O(n^{-0.2})$ and conclude that

in this regime, where the $c_{j}=O(1)$ are deterministic coefficients (which are allowed to depend on $k$ ). Taking expectations as in the preceding section, and using (28) to handle those $\xi$ with $|\xi|\geq n^{0.3}$ , we conclude that

and similarly for ${\mathbf{E}}B(W^{\prime}_{n})^{k}$ ; and the claim follows from the matching moments hypothesis.
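The Taylor expansion step in the above proof relies on $kx$ being small, so that truncating $(1+x)^{k}$ at order $m$ costs little. The following is a small numerical sketch of this; the parameter values are illustrative choices of ours and not those of the proof, and `truncated_binomial` is a hypothetical helper. It compares the truncation error with the first omitted term.

```python
import math

def truncated_binomial(x, k, m):
    """Taylor polynomial of (1 + x)^k about x = 0, up to and including order m."""
    return sum(math.comb(k, j) * x ** j for j in range(m + 1))

k, m, x = 20, 4, 1e-3  # illustrative values with k * x small
exact = (1 + x) ** k
error = abs(exact - truncated_binomial(x, k, m))
first_omitted = math.comb(k, m + 1) * abs(x) ** (m + 1)

print(f"truncation error = {error:.3e}, first omitted term = {first_omitted:.3e}")
```

In this regime the error is comparable to the first omitted term, which is of size about $(kx)^{m+1}$.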

As in Remark 23, it is possible to extend these arguments to the case when ${\operatorname{Re}}(\xi_{ij})$ and ${\operatorname{Im}}(\xi_{ij})$ are not independent; we leave the details to the interested reader.

Note that when one has four matching moments rather than three, the error terms are more favorable by a factor of $\sqrt{n}$ , giving some additional room to vary the parameters of the argument by small powers of $n$ . Because of this, it is possible to modify the proof of Theorem 18 to conclude in this case that

in the regime $0<T\leq n^{c}$ for a sufficiently small $c$ . This is achieved by arguing as in this section, except that one allows the resolvent $\|R_{0}\|_{(\infty,1)}$ to be as large as $O(n^{c})$ rather than $O(1)$ in order to keep the failure probability bounded by $O(n^{O(1)}\exp(-n^{c}))$ rather than $O(n^{O(1)}\exp(-(n\eta)^{c}))$ . We omit the details. As a consequence, we can sharpen the conclusion of Theorem 8 to

when $0<T\leq n^{c}$ and $M_{n}$ matches moments with GUE to fourth order off the diagonal and second order on the diagonal.

Appendix A Local semicircle law

In this appendix we establish some preliminary local semicircle law estimates, following the treatment in and . As the methods used here are now standard, and the results very close to those in and , we shall be somewhat brief in our treatment.

We first recall a concentration estimate of Hanson and Wright .

for all $t\geq C^{\prime}$ and $1\leq i\leq n$ , and some $C,C^{\prime}>0$ independent of $n$ . Let $A$ be an $n\times n$ matrix. Then for any $T>0$ , one has

outside of an event of probability $O(\exp(-cT^{c}))$ .

See [16, Lemma B.1]. (Note that a factor of $\sigma$ is missing from the statement of the exponential decay hypothesis in the lemma as stated in [16], which is needed in order to reduce to the $\sigma=1$ case.) ∎

outside of an event of probability $O(\exp(-cd^{c}))$ .

Apply the preceding proposition with $A:=\pi_{V}$ (so $\operatorname{trace}A=\operatorname{trace}A^{*}A=d$ ) and $T:=d^{1/2}/10$ . ∎
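The choice $A:=\pi_{V}$ can be checked numerically: for a vector $X$ with independent mean zero, variance one entries and a $d$-dimensional subspace $V$, the quantity $|\pi_{V}(X)|^{2}=X^{*}\pi_{V}X$ has mean $\operatorname{trace}\pi_{V}=d$. The following is a minimal sketch; the function name is hypothetical, NumPy is assumed, random signs are used as the atom distribution, and the subspace is drawn at random independently of $X$.

```python
import numpy as np

def projected_norm_squared(n, d, rng):
    """Return |pi_V(X)|^2 for a random d-dimensional subspace V of C^n and a sign vector X."""
    # Orthonormal basis (columns of q) of a d-dimensional subspace V.
    g = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
    q, _ = np.linalg.qr(g)
    # X has independent entries with mean zero and variance one (random signs).
    x = rng.choice([-1.0, 1.0], size=n)
    coeffs = q.conj().T @ x  # coordinates of pi_V(X) in the basis q
    return float(np.sum(np.abs(coeffs) ** 2))

rng = np.random.default_rng(1)
n, d = 400, 100
value = projected_norm_squared(n, d, rng)
print(f"|pi_V(X)|^2 = {value:.2f}, d = {d}")
```

One typically observes a value within a few multiples of $\sqrt{d}$ of $d$.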

We can also use Talagrand’s inequality as in , combined with a truncation argument (to bound each entry by some properly chosen quantity $K$ ). In the case when the atom variables have very fast decay (such as sub-gaussian) or are bounded (such as Bernoulli), this calculation will actually lead to a decent bound on the value of $c$ in Theorem 8.

We can now establish a crude upper bound on the counting function $N_{I}$ of a Wigner matrix.

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then for any interval $I$ , one has

outside of an event of probability $O(n^{O(1)}\exp(-c(n|I|)^{c}))$ .

Fix $I$ , which we write as $I=[E-\eta,E+\eta]$ . Suppose that

for some sufficiently large absolute constant $C$ to be chosen later. We will show that this leads to a contradiction outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ .

On the other hand, we can write the Stieltjes transform $s_{W_{n}}$ in terms of the coefficients $R_{ij}$ of the resolvent as

Thus, by the pigeonhole principle, we have

for some $1\leq i\leq n$ . By symmetry (and conceding a factor of $n$ in the failure probability estimates) we may take $i=n$ .

Now, a standard Schur complement computation (see e.g. [30, Lemma 42]) shows that

where $R^{(n)}(z)=(W_{n}^{(n)}-z)^{-1}$ is the resolvent corresponding to the $(n-1)\times(n-1)$ matrix $W_{n}^{(n)}$ formed by removing the $n^{\operatorname{th}}$ row and column from $W_{n}$ , $\xi_{nn}$ is the bottom right entry of $M_{n}$ , and $X$ is the rightmost column of $W_{n}$ (after removing the bottom entry $\frac{1}{\sqrt{n}}\xi_{nn}$ ). In particular, using the trivial bound $|{\operatorname{Im}}\frac{1}{z}|\leq\frac{1}{|{\operatorname{Im}}z|}$ , we conclude that

Now, by the Cauchy interlacing law, $W_{n}^{(n)}$ has $\gg Cn\eta$ consecutive eigenvalues in $I$ . There are $O(n^{2})$ possibilities for the starting and ending index of these eigenvalues. If we let $V$ be the space spanned by the corresponding eigenvectors, then $\dim(V)\gg Cn\eta$ , and from the spectral theorem we see that

outside of an event of probability $O(\exp(-c(n\eta)^{c}))$ . If $C$ is sufficiently large, the claim follows. ∎
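The two algebraic identities used in this proof can be verified numerically. The following is a sketch under our own assumptions: we take the resolvent to be $R(z)=(W_{n}-z)^{-1}$, write the Schur complement identity in its standard form $R_{nn}=(\frac{1}{\sqrt{n}}\xi_{nn}-z-X^{*}R^{(n)}X)^{-1}$, use random signs as atom variables, and introduce hypothetical helper names.

```python
import numpy as np

def wigner(n, rng):
    """Real symmetric Wigner matrix with random sign entries, normalized by sqrt(n)."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    return (np.triu(a) + np.triu(a, 1).T) / np.sqrt(n)

def resolvent(w, z):
    return np.linalg.inv(w - z * np.eye(w.shape[0]))

def stieltjes(w, z):
    """s_W(z) = (1/n) sum_i 1 / (lambda_i - z)."""
    return complex(np.mean(1.0 / (np.linalg.eigvalsh(w) - z)))

def schur_rhs(w, z):
    """Right-hand side of the Schur complement identity for the bottom right entry."""
    x = w[:-1, -1]                       # last column with the bottom entry removed
    r_minor = resolvent(w[:-1, :-1], z)  # resolvent of the (n-1) x (n-1) minor
    return 1.0 / (w[-1, -1] - z - x.conj() @ r_minor @ x)

rng = np.random.default_rng(2)
w, z = wigner(50, rng), 0.3 + 0.1j
r = resolvent(w, z)
print(abs(np.trace(r) / 50 - stieltjes(w, z)), abs(r[-1, -1] - schur_rhs(w, z)))
```

Both printed differences should be at the level of floating point rounding.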

This gives rise to a self-consistent equation:

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then for any $z=E+\sqrt{-1}\eta$ with $E=O(1)$ and $0<\eta\ll n^{100}$ , and all $1\leq i\leq n$ , one has

outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ . In particular, by the union bound, we have

outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ .

We can assume that $n\eta\geq\log^{100}n$ (say), as the claim is trivial otherwise. By symmetry, it will suffice to establish

outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ . By (39), this statement is equivalent to

By Condition C0, one has $\frac{1}{\sqrt{n}}\xi_{nn}=o(1)$ outside of an event of probability $O(\exp(-cn^{c}))$ , which is certainly acceptable; so our task is now to show that

outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ .

From the Cauchy interlacing law (cf. [30, §5.2]) we know that

By Proposition 30 and the union bound, we may assume outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ , one has

for all intervals $I$ of width at least $\eta$ centered at $E$ . By interlacing, we may also conclude

for such intervals. Inserting this bound into (42), we conclude that

If we then apply Proposition 27 with $T:=(n\eta)^{1/4}$ (say), using the hypothesis that $n\eta\geq\log^{100}n$ (so that $1/(n\eta)^{c}=o(1)$ for any $c>0$ ) we conclude (41) outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ , as required. ∎
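The interlacing step can be illustrated as follows. This is a small sketch with hypothetical helper names, assuming NumPy and a random sign Wigner matrix: it checks that the eigenvalues of the minor $W_{n}^{(n)}$ interlace those of $W_{n}$, so that the eigenvalue counts of the two matrices in any interval differ by at most one.

```python
import numpy as np

def interlaces(w, tol=1e-12):
    """Check lambda_i(W) <= lambda_i(W^(n)) <= lambda_{i+1}(W), eigenvalues ascending."""
    lam = np.linalg.eigvalsh(w)
    mu = np.linalg.eigvalsh(w[:-1, :-1])
    return bool(np.all(lam[:-1] <= mu + tol) and np.all(mu <= lam[1:] + tol))

def count_in_interval(eigs, a, b):
    """Number of eigenvalues in the closed interval [a, b]."""
    return int(np.sum((eigs >= a) & (eigs <= b)))

rng = np.random.default_rng(3)
n = 60
a = rng.choice([-1.0, 1.0], size=(n, n))
w = (np.triu(a) + np.triu(a, 1).T) / np.sqrt(n)

full = count_in_interval(np.linalg.eigvalsh(w), -0.5, 0.5)
minor = count_in_interval(np.linalg.eigvalsh(w[:-1, :-1]), -0.5, 0.5)
print(interlaces(w), full, minor)
```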

We can combine this proposition with a standard stability analysis of the self-consistent equation (40) to conclude a crude version of the local semicircle law:

outside of an event with probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ .

We note that this corollary is essentially [17, Theorem 3.1]; in the statement of the result in [17] the additional constraint $\eta\geq\log^{C\log\log n}n/n$ for some constant $C$ is imposed, but this constraint is not actually used in the proof, at least if one is not concerned with obtaining the best possible bounds for the $o(1)$ error terms. For the convenience of the reader, we sketch the proof of this corollary below.

As before we may assume that $\eta\geq\log^{100}n/n$ ; we may also assume that $n$ is large. By Proposition 31, we may assume that (40) holds.

Let us first dispose of the case when $\eta$ is large, say $\eta\geq 100$ . In this case, the imaginary part of $s_{W_{n}}(z)+z+o(1)$ is at least $100-o(1)$ , and hence by (40) one has $|s_{W_{n}}(z)|\leq 1/100+o(1)$ ; inserting this back into (40) (and using (16)) one obtains $|s_{W_{n}}(z)-s_{\operatorname{sc}}(z)|\leq 1/10$ (say). One can then deduce (44) from (40) (and (17)) by a routine application of the contraction mapping theorem.

Henceforth we assume that $\eta<100$ , so that $z=O(1)$ . Then equation (40) already implies that $s_{W_{n}}(z)=O(1)$ , since (40) cannot hold if $|s_{W_{n}}(z)|$ is too large. We may thus multiply out the denominator and conclude that

Since the two solutions to the quadratic equation $s^{2}+zs+1=0$ are $s=s_{\operatorname{sc}}(z)$ and $s=-z-s_{\operatorname{sc}}(z)$ , we conclude that

outside of an event with probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ .

We apply this fact with $z$ replaced by an arbitrary complex number $\zeta$ with ${\operatorname{Re}}(\zeta)=O(1)$ and $\eta\leq{\operatorname{Im}}(\zeta)\ll 1$ , and whose real and imaginary parts are multiples of $n^{-100}$ (say). By the union bound, the probability of the failure event is still $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ . We may then remove the latter hypotheses using the fact that $s_{W_{n}}$ and $s_{\operatorname{sc}}$ have Lipschitz constant $O(n)$ in this region, and conclude that outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ , one has

for all $\zeta$ with ${\operatorname{Re}}(\zeta)=O(1)$ and $\eta\leq{\operatorname{Im}}(\zeta)\ll 1$ . On the other hand, if one has ${\operatorname{Im}}(\zeta)\geq c$ for some absolute constant $c>0$ , then the second possibility in (46) cannot occur for $n$ large enough, because $s_{W_{n}}(\zeta)$ necessarily has positive imaginary part. A continuity argument then shows that the first option in (46) holds for all $\zeta$ in the indicated region. (When $\zeta$ approaches the edges $\pm 2$ of the spectrum, so that $\zeta=\pm 2+o(1)$ , the two options in (46) begin to overlap, but in that regime one can deduce the first option from the second (with a slightly worse $o(1)$ error), and so the claim made in the text is still valid.) This gives (44). Among other things, this shows that $|s_{W_{n}}(z)+z|\gg 1$ (thanks to (17)), and then from (17) and the second part of Proposition 31 we obtain (45). ∎
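The dichotomy between the two roots of $s^{2}+zs+1=0$ can be made concrete. The following is a numerical sketch under our own assumptions: the helper names are hypothetical, NumPy is assumed, random signs are used as atom variables, and $s_{\operatorname{sc}}(z)$ is selected as the root with positive imaginary part. It computes both roots and compares $s_{\operatorname{sc}}(z)$ with the empirical Stieltjes transform of a sampled Wigner matrix.

```python
import cmath
import numpy as np

def s_sc(z):
    """Root of s^2 + z s + 1 = 0 with positive imaginary part (for Im z > 0)."""
    root = cmath.sqrt(z * z - 4)
    s_plus, s_minus = (-z + root) / 2, (-z - root) / 2
    return s_plus if s_plus.imag > 0 else s_minus

def empirical_stieltjes(n, z, rng):
    """s_W(z) for a random sign Wigner matrix W of size n, normalized by sqrt(n)."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    w = (np.triu(a) + np.triu(a, 1).T) / np.sqrt(n)
    return complex(np.mean(1.0 / (np.linalg.eigvalsh(w) - z)))

z = 0.5 + 0.5j
s = s_sc(z)
other = -z - s  # the second root; the product of the two roots is 1
s_emp = empirical_stieltjes(400, z, np.random.default_rng(4))

print(f"s_sc = {s:.4f}, other root = {other:.4f}, empirical = {s_emp:.4f}")
```

Since the second root has negative imaginary part while $s_{W_{n}}$ always has positive imaginary part, the empirical value can only be close to the first root once ${\operatorname{Im}}(z)$ is bounded away from zero.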

For our applications, we will also need bounds on the coefficient norm

Let $M_{n}$ be a Wigner matrix obeying Condition C0, and let $W_{n}:=\frac{1}{\sqrt{n}}M_{n}$ . Then for any $z=E+\sqrt{-1}\eta$ with $E=O(1)$ and $0<\eta\ll n^{100}$ , one has

outside of an event with probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ .

Again, we may assume $\eta>\log^{100}n/n$ . By the union bound, it suffices to show for each $1\leq i,j\leq n$ that

outside of an event with probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ . In the diagonal case $i=j$ , this follows directly from (45), so suppose that $i\neq j$ . In this case, we may use the Schur complement identity

where $R^{(i)}(z)$ is the resolvent associated to the $(n-1)\times(n-1)$ matrix $W_{n}^{(i)}$ formed by removing the $i^{\operatorname{th}}$ row and column from $W_{n}$ , and $K_{ij}^{(ij)}$ is the quantity

outside of an event of probability $O(n^{O(1)}\exp(-c(n\eta)^{c}))$ . But by Proposition 27 (viewing the $(n-2)\times(n-2)$ matrix $(W_{n}^{(ij)}-z)^{-1}$ as the upper-right block of a nilpotent $2(n-2)\times 2(n-2)$ matrix, and concatenating $X_{i}$ and $X_{j}$ together), one has

outside of an event of probability $O(\exp(-cT^{c}))$ , for any $T>0$ . But by repeating the derivation of (43), one has

If one then sets $T=O(\sqrt{n\eta})$ , one obtains the claim. ∎

We remark that the above argument in fact shows that we may improve the bound $R(z)_{ij}=O(1)$ to $R(z)_{ij}=O(\frac{1}{(n\eta)^{1/2-\delta}})$ for any fixed $\delta>0$ ; compare with [17, Theorem 3.1]. However, this improvement is not used in this paper.
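The sizes of the resolvent coefficients can be inspected directly. The following is an illustrative sketch and not a verification of the bounds: the function name is hypothetical, NumPy is assumed, and the choices $n=400$ and $z=0.2\sqrt{-1}$ are ours. It computes the largest diagonal and off-diagonal coefficients of $R(z)=(W_{n}-z)^{-1}$ for a random sign Wigner matrix.

```python
import numpy as np

def resolvent_entry_sizes(n, z, rng):
    """Return (max_i |R_ii|, max_{i != j} |R_ij|) for R = (W - z)^{-1}."""
    a = rng.choice([-1.0, 1.0], size=(n, n))
    w = (np.triu(a) + np.triu(a, 1).T) / np.sqrt(n)
    abs_r = np.abs(np.linalg.inv(w - z * np.eye(n)))
    diag_max = float(np.max(np.diag(abs_r)))
    off_max = float(np.max(abs_r - np.diag(np.diag(abs_r))))  # diagonal zeroed out
    return diag_max, off_max

eta = 0.2
diag_max, off_max = resolvent_entry_sizes(400, 0.0 + 1j * eta, np.random.default_rng(5))
print(f"max |R_ii| = {diag_max:.3f}, max off-diagonal |R_ij| = {off_max:.3f}")
```

Every coefficient is trivially bounded by the operator norm bound $1/\eta$; in experiments of this kind the off-diagonal coefficients are typically noticeably smaller than the diagonal ones, in line with the improvement mentioned in the remark.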
